Price impact versus bid-ask spreads in the index option market

We investigate the puzzle of why bid-ask spreads of options are so large by focusing on the price impact component of the spread. We propose a structural vector autoregressive model for trades in the option market to analyze whether they move the underlying price and/or the underlying's volatility. Our model captures cross-option strategies by pooling order flows across contracts after a decomposition into exposure to the underlying asset and its volatility. While our estimates confirm that S&P 500 option trades indeed significantly move the underlying and the volatility, the economic magnitudes are very small. Hence, large bid-ask spreads of options remain a puzzle.

[...] every put/call pair with identical strike price and identical maturity date and use market mid quotes to solve put-call parity for the unobservable futures prices. At a given time t and for a fixed option maturity T, we average the implied futures prices over all available strikes for which a put and a call price is available to obtain the option-implied futures price F_{t,T}. We use linearly interpolated rates from the OptionMetrics zero-curve file as a proxy for the risk-free rate of interest. We then average the changes over all short-term futures in our sample to calculate du_t.

[...] We show statistics for the subsamples of in-the-money ([...]), [...] and out-of-the-money (0.35 > |∆|) options, with ∆ denoting the [...] of the option delta. Panel A shows equally weighted observations and Panel B dollar volume weighted observations (price times [...]). We define [...] as the [...] before [...]. The effective half-spread is defined as [...], which [...] then [...] into a [...] (p [...] m) [...] and an adverse selection component (m_{t+[...]} − m) based on a [...]. We report the Muravyev-Pearson corrected effective spread (MP), which replaces the quoted midpoint by the predicted value of a regression of the midpoint on Black-Scholes price minus midpoint, delta times lagged underlying price differences, and lagged price changes. All variables are reported in dollars and in percentages as a fraction of the midpoint.
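The put-call parity step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes European options and the parity relation C − P = e^{−rτ}(F − K), and the function names and sample quotes are hypothetical.

```python
import math

def implied_forward(call_mid, put_mid, strike, rate, tau):
    """Solve put-call parity C - P = exp(-r*tau) * (F - K) for F."""
    return strike + math.exp(rate * tau) * (call_mid - put_mid)

def average_implied_forward(pairs, rate, tau):
    """Average the parity-implied forward over all strikes with both quotes.

    `pairs` holds (strike, call_mid, put_mid) tuples for one maturity.
    """
    forwards = [implied_forward(c, p, k, rate, tau) for k, c, p in pairs]
    if not forwards:
        raise ValueError("need at least one put/call pair")
    return sum(forwards) / len(forwards)

# Hypothetical mid quotes, constructed to be consistent with F = 1930.
rate, tau = 0.01, 30 / 365
discount = math.exp(-rate * tau)
quotes = [
    (1900.0, 45.0, 45.0 - discount * 30.0),
    (1950.0, 20.0, 20.0 + discount * 20.0),
]
forward = average_implied_forward(quotes, rate, tau)
print(round(forward, 6))
```

Averaging across strikes dampens the effect of quote noise in any single put/call pair.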


Introduction
An important puzzle in the option market literature is why trading costs, as represented by bid-ask spreads, are so large. These spreads are large both relative to the value of the option and relative to the liquidity of the underlying. For example, in our sample of SPX options written on the S&P 500 index in 2014, the average effective option spread is $0.59 or 1.69% of its value (Table 1). Muravyev and Pearson (2020) study this puzzle and find that equity option effective spreads are 2.2% relative to option value, a result that holds after a clever adjustment to the midpoint to reflect that trades are more likely to happen on the ask (bid) when the unobserved fundamental value is higher (lower) than the quoted midpoint.
In this paper, we further investigate this puzzle with a thorough analysis of price impact, which is an important component of the bid-ask spread. We propose a novel methodology to estimate price impact that addresses: i) that options have price impact on both the underlying asset and its volatility; and ii) that trading is spread across a large cross-section of sometimes hundreds of options, which are all traded simultaneously, but differ in strike price, expiration date, and option type (put or call).
Our main finding is that the price impacts in the SPX option market are surprisingly small.
We pool the delta component of option trades in the whole cross-section at the hourly level to obtain a net dollar exposure comparable to trading the underlying asset directly, and find that a trade shock of $823 million (one standard deviation in the aggregate SPX option market) increases the underlying by only 5.1 bps. 1 This price impact is truly small relative to the massive order flow shock generating $823 million exposure to the S&P 500. While small, the impact is nevertheless fairly precisely estimated with a standard error of 1.15 bps. The small price impact contrasts with the average effective half spread in SPX options, which is more than thirty times larger at 169 bps (see Table 1). We also analyze the impact of option order flow exposure to the volatility generated by their vega, and find that a one standard deviation vega flow shock moves the volatility level by 0.07 percentage points, increasing the sample average volatility (VIX) from 11.50% to 11.57%. This impact, too, is very small for such a massive market-wide volatility exposure shock.
1 We proxy the underlying S&P 500 with the SPY ETF, and find an estimated dollar impact of $0.98, relative to the sample average value of the SPY of $1,930.
How can bid-ask spreads be so large yet the price impact be so small? According to economic theory, there is a strong link between price impact and bid-ask spreads because both reflect frictions due to informed trading and inventory effects. In fact, Back and Baruch (2004) show that the price impact of Kyle (1985) can be mapped directly to the bid-ask spread in Glosten and Milgrom (1985). A large empirical literature decomposes equity bid-ask spreads into these components, together with fixed order processing costs and oligopolistic rents. 2 Our results seem to make the option bid-ask spread puzzle even more puzzling. Indeed, it is unlikely that order processing costs explain the puzzle in the current electronic era. Also, the SPX option market is highly competitive so oligopolistic market maker rents are an unlikely explanation. Further, explanations based on option hedging costs are also unsatisfying as the underlying can be hedged by the SPY ETF, which is the world's most liquid asset.
The main contribution of this paper is a methodological framework that analyzes the price impacts of the order flows of all option contracts jointly. The model first recognizes that any option trade provides exposure to the underlying asset (through the option delta) and to its volatility (through the option vega). Accordingly, we disentangle the effect of an option trade on the underlying's price or volatility by constructing two order flow exposures which multiply option net order flow (defined as buyer-originated minus seller-originated volume) by the option delta or vega, which we coin "delta order flow" and "vega order flow." These order flows can then be meaningfully aggregated across options with different characteristics. We next relate the two option order flows to changes in the underlying price and its volatility in a vector autoregressive (VAR) model. We further add a fifth equation with the net order flow in the underlying asset, which allows for a direct comparison of price impact in the underlying asset and the aggregate option market. We address the challenge that volatility is unobservable by linking a structural option pricing model to the model-free VIX framework. This allows us to link the unobservable volatility to an observable volatility index, which yields consistent delta and vega estimates and neatly integrates into the VAR system. Thus, our framework extends the seminal work of Hasbrouck (1991), who proposes a two-equation VAR model relating order flows in a stock to its price changes.
Our framework yields several advantages over previous studies that typically examine price impact in a single option contract on either the underlying or the volatility. 3 First, by pooling order flow exposures across option contracts, our results represent the magnitude of price impact in the aggregate market. Our results indicate that aggregate price impacts are economically small. In contrast, previous studies typically document the presence of price impact (and therefore informed trading) in a single contract, but the magnitude is not easily translated into an aggregate effect. We effectively apply a data reduction approach by imposing the economic structure from a theoretical option pricing model to summarize the information content in a large number of order flows in just two variables: delta and vega order flow. Second, cross-option strategies are extremely common and may account for more than 75% of volume (Fahlenbrach and Sandås, 2010). We show that the portfolio price impact may be severely reduced when incorporating cross-option price impact, as compared to a naïve approach, which sums the price impacts of the individual option trades. For example, when considering straddle and strangle strategies, we predict a twenty-fold reduction in overall price impact. This reduction is due to the fact that these strategies are close to being delta neutral, and, therefore, the price impact on the underlying largely cancels out. And third, theoretical models with strategic informed traders show that they typically trade in several correlated assets simultaneously (Biais and Hillion, 1994; Boulatov et al., 2012). These theoretical models further motivate our joint analysis of option order flows to measure price impact.
Identifying whether price impact originates in the underlying asset or the volatility process is challenging due to the strong negative contemporaneous correlation between the two processes [known as the leverage effect (Black, 1976)]. This correlation is -0.858 in our sample at the hourly frequency, and we tackle the issue by re-estimating our main VAR model after filtering out the common variation in the two processes. This analysis reveals that while delta and SPY flow predict overall volatility, which is unexpected, this prediction only goes through the leverage effect. That is, delta and SPY flow increase the underlying and therefore indirectly decrease the volatility. Vega flow, however, still predicts variation in volatility after filtering out the variation explained contemporaneously by the underlying.
Our model imposes a strong economic structure on the data through three crucial assumptions, which we validate with several tests. First, using a Wald test, we examine whether the delta and vega price impacts vary across options with different expiration dates or strike prices.
We do not reject equality for vega flows, meaning the vega component of order flows can indeed be pooled across options. The price impact of delta flow does differ significantly across groups of options, and we find it to be higher for options with a short time to maturity and low moneyness.
Second, we test the information loss of our model, which uses a two-factor option pricing model (underlying and volatility) to summarize the price changes across all options with different strike prices and maturities. The analysis, based on Bakshi et al. (2000), indicates that our model explains 99.2% of the variation in individual option price changes. Together, the two tests seem to validate our imposed structure, and provide statistical support for a five-equation system to capture the information content of hundreds of option prices and order flows. And third, we test the assumption of the model stemming from the ordering of the equations in the VAR model. In the robustness section, we consider several different orderings of the equations and confirm that all results hold in alternative model specifications.
We are certainly not the first to study the impact of option order flows on the direction of the underlying or volatility (see Footnote 3). In particular, Bollen and Whaley (2004) run similar regressions of changes in volatility on net buying pressure of ATM call and put options.
Their results, and interpretation, are consistent with inventory models in which net order flow has a temporary impact on option implied volatility. This mechanism, together with institutional buying pressure in put index options, is, for example, one of the explanations for the volatility smile. Gârleanu et al. (2009) formalize this point in a model with risk-averse option market makers who charge (cross-option) price pressures, which in equilibrium are proportional to the unhedgeable component of the option inventory position. These authors do not allow option order flows to move the level of the underlying or volatility, however, which is conceptually important in models of informed trading. Nevertheless, our results do not rule out predictions based on inventory models, because transitory price pressures may exist in the underlying and volatility as well. Rourke (2014) uses a similar price impact VAR model with delta and vega flows estimated using order flows of a single option contract. He uses the returns of a straddle, a portfolio with only vega exposure, to proxy for VIX returns, which makes it difficult to gauge the economic magnitude of the impact of volatility. We contribute to this line of investigation by analyzing order flows in the cross-section of option contracts to address cross-option strategies. We find that cross-option price impact is of first order importance and may explain why the aggregate price impacts turn out to be very small.
This paper proceeds as follows. Section 2 introduces how we extend the Hasbrouck (1991) price impact model to take option markets into account. In Section 3 we describe the empirical setup of the analysis and Section 4 presents the main results of the study. Sections 5, 6, and 7 give results for higher data frequencies, robustness results, and the conclusion, respectively.

A price impact model for option markets
In this section, we present a novel methodology to measure the price impact of option order flows. We extend the VAR model of Hasbrouck (1991) to option markets and explicitly take into account (i) the cross-option correlations in order flows, (ii) the cross-option price impacts, and (iii) the fact that options can be used to speculate on both the underlying and its volatility.
Consider an asset on which N different option contracts are traded. Individual options are indexed by n = 1, ..., N according to the following characteristics: strike price, expiration date, and put/call identifier. We recognize that the main determinants of option price changes are the changes in the value of the underlying and its volatility (e.g., Bakshi et al. (1997)). 4 The option price change can therefore be written as:

$$do_{n,t+1} = \Delta_{n,t}\, du_{t+1} + \nu_{n,t}\, dv_{t+1} + \varepsilon_{n,t}, \qquad (1)$$

where $do_{n,t+1} = P_{n,t+1} - P_{n,t}$ is the price change of option n at time t+1, $\Delta_{n,t}$ and $\nu_{n,t}$ are the delta and vega of the option, respectively, and $du$ and $dv$ denote changes in the price of the underlying and its volatility, i.e., $du_{t+1} = u_{t+1} - u_t$ and $dv_{t+1} = v_{t+1} - v_t$. The error $\varepsilon_{n,t}$ contains terms of order dt of the option pricing model. We define vega as the first partial derivative of the option price with respect to volatility. One of the key challenges is that volatility is not directly observable, and therefore needs to be estimated from market data. We provide further details on the link between volatility and observable volatility indices in Subsection 3.2.
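To illustrate how well a first-order expansion in the underlying and its volatility can track an option price change, the sketch below compares the exact repricing of a call with the delta-vega approximation. It is only an illustration under stated assumptions: it swaps in the Black-Scholes model (zero interest rate, no dividends) for the Heston (1993) model used in the paper, holds time to maturity fixed, and uses hypothetical contract terms; the shock sizes are the hourly standard deviations reported in the text ($3.92 and 0.58 volatility points).

```python
import math

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bs_call(spot, strike, sigma, tau):
    """Black-Scholes call price with zero interest rate and no dividends."""
    s = sigma * math.sqrt(tau)
    d1 = (math.log(spot / strike) + 0.5 * s * s) / s
    return spot * norm_cdf(d1) - strike * norm_cdf(d1 - s)

def bs_delta_vega(spot, strike, sigma, tau):
    """Analytic delta and vega of the call under the same assumptions."""
    s = sigma * math.sqrt(tau)
    d1 = (math.log(spot / strike) + 0.5 * s * s) / s
    return norm_cdf(d1), spot * norm_pdf(d1) * math.sqrt(tau)

# Hypothetical contract; volatility is expressed as a decimal (0.115 = 11.5%).
spot, strike, sigma, tau = 1930.0, 1950.0, 0.115, 30 / 365
du, dv = 3.92, 0.0058

delta, vega = bs_delta_vega(spot, strike, sigma, tau)
approx = delta * du + vega * dv
exact = bs_call(spot + du, strike, sigma + dv, tau) - bs_call(spot, strike, sigma, tau)
rel_error = abs(exact - approx) / abs(exact)
print(f"approx={approx:.4f} exact={exact:.4f} rel_error={rel_error:.2%}")
```

The remaining gap comes from second-order terms (gamma, vanna, volga), which the linear decomposition leaves in the error term.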
We disaggregate an option trade into its exposure to the underlying asset (through the option delta) and to its volatility (through the option vega). The key advantage of this linear transformation is that option order flows are now expressed in the same units, and can be meaningfully aggregated across options. We define the time-t aggregate net dollar exposure to the underlying by $x^{\Delta}_t$ and to the volatility by $x^{\nu}_t$:

$$x^{\Delta}_t = \sum_i \Delta_{i,t}\, Q_i\, \mathrm{BuySell}_i, \qquad (2)$$

$$x^{\nu}_t = \sum_i \nu_{i,t}\, Q_i\, \mathrm{BuySell}_i. \qquad (3)$$

We sum over all option trades (indexed by i) in the fixed time interval between t − 1 and t. With a slight abuse of notation, we denote by $\Delta_{i,t}$ and $\nu_{i,t}$ the delta and vega of the particular option trade i prevailing at the start of the interval (time t − 1). 6 Further, $Q_i$ is the trade volume in number of options, and $\mathrm{BuySell}_i$ is a binary variable that equals 1 for a buyer-originated trade and −1 for a seller-originated trade. In the remainder of the paper, $x^{\Delta}_t$ and $x^{\nu}_t$ are called delta order flow and vega order flow, respectively. The delta order flow is denoted in U.S. dollars, which allows for a meaningful comparison to the order flow in the underlying asset.
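The aggregation of signed trades into delta and vega order flow can be sketched as below. This is a minimal illustration, not the authors' implementation: the `Trade` container and the `delta_scale`/`vega_scale` arguments are hypothetical, and the dollar conversion (contract multiplier of 100 times an underlying value of 1,930) is an assumption taken from the back-of-the-envelope calculation in the paper's footnote on the average trade.

```python
from dataclasses import dataclass

@dataclass
class Trade:
    delta: float     # option delta at the start of the interval
    vega: float      # option vega at the start of the interval
    quantity: float  # Q_i, number of options traded
    buy_sell: int    # +1 buyer-originated, -1 seller-originated

def order_flows(trades, delta_scale=1.0, vega_scale=1.0):
    """Sum signed delta and vega exposure over all trades in one interval.

    The scale arguments are an assumption of this sketch: they convert
    per-option exposure into dollars (e.g. multiplier times spot for delta).
    """
    x_delta = sum(t.delta * t.quantity * t.buy_sell for t in trades) * delta_scale
    x_vega = sum(t.vega * t.quantity * t.buy_sell for t in trades) * vega_scale
    return x_delta, x_vega

# A bought straddle: long call and long put with offsetting deltas.
straddle = [
    Trade(delta=0.5, vega=2.0, quantity=10, buy_sell=1),
    Trade(delta=-0.5, vega=2.0, quantity=10, buy_sell=1),
]
straddle_delta, straddle_vega = order_flows(straddle)

# A single buy of 51 options with delta 0.25, scaled into dollars.
single = [Trade(delta=0.25, vega=1.0, quantity=51, buy_sell=1)]
single_delta, _ = order_flows(single, delta_scale=100 * 1930.0)
print(straddle_delta, straddle_vega, single_delta)
```

The straddle shows why pooling matters: its delta flows net to zero while its vega flows add up, so it carries volatility exposure but essentially no exposure to the underlying.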
In the next step, we relate the delta and vega order flows to the price changes of the underlying (d u t ) and volatility (d v t ) in a VAR framework. We include the dollar order flow of the SPY ETF (denoted x spy t ) as an additional equation, as it is a proxy for the trading volume in the underlying asset. This approach also accounts for trading strategies that involve simultaneous trading in options and the underlying. Note that the order flow x spy t is measured in the same unit as x ∆ t , as both represent dollar order flow exposure to the underlying. The difference, however, is that the latter is constructed from option order flows only.
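The empirical model can be written as follows:

$$
\begin{aligned}
du_t &= \sum_{l=1}^{L} A_{1,l}\, du_{t-l} + \sum_{l=0}^{L} B_{1,l}\, dv_{t-l} + \sum_{l=0}^{L} C_{1,l}\, x^{spy}_{t-l} + \sum_{l=0}^{L} D_{1,l}\, x^{\Delta}_{t-l} + \sum_{l=0}^{L} E_{1,l}\, x^{\nu}_{t-l} + \varepsilon_{1,t}, \\
dv_t &= \sum_{l=1}^{L} A_{2,l}\, du_{t-l} + \sum_{l=1}^{L} B_{2,l}\, dv_{t-l} + \sum_{l=0}^{L} C_{2,l}\, x^{spy}_{t-l} + \sum_{l=0}^{L} D_{2,l}\, x^{\Delta}_{t-l} + \sum_{l=0}^{L} E_{2,l}\, x^{\nu}_{t-l} + \varepsilon_{2,t}, \\
x^{spy}_t &= \sum_{l=1}^{L} A_{3,l}\, du_{t-l} + \sum_{l=1}^{L} B_{3,l}\, dv_{t-l} + \sum_{l=1}^{L} C_{3,l}\, x^{spy}_{t-l} + \sum_{l=0}^{L} D_{3,l}\, x^{\Delta}_{t-l} + \sum_{l=0}^{L} E_{3,l}\, x^{\nu}_{t-l} + \varepsilon_{3,t}, \\
x^{\Delta}_t &= \sum_{l=1}^{L} A_{4,l}\, du_{t-l} + \sum_{l=1}^{L} B_{4,l}\, dv_{t-l} + \sum_{l=1}^{L} C_{4,l}\, x^{spy}_{t-l} + \sum_{l=1}^{L} D_{4,l}\, x^{\Delta}_{t-l} + \sum_{l=0}^{L} E_{4,l}\, x^{\nu}_{t-l} + \varepsilon_{4,t}, \\
x^{\nu}_t &= \sum_{l=1}^{L} A_{5,l}\, du_{t-l} + \sum_{l=1}^{L} B_{5,l}\, dv_{t-l} + \sum_{l=1}^{L} C_{5,l}\, x^{spy}_{t-l} + \sum_{l=1}^{L} D_{5,l}\, x^{\Delta}_{t-l} + \sum_{l=1}^{L} E_{5,l}\, x^{\nu}_{t-l} + \varepsilon_{5,t},
\end{aligned} \qquad (4)
$$

where $A_{i,l}, \ldots, E_{i,l}$ are constant coefficients and $\varepsilon_{i,t}$ are the structural innovations. Equation (4) emphasizes the ordering of the equations in the structural VAR. This ordering identifies the orthogonal structural innovations, as it allows the contemporaneous innovation in one variable to affect another variable, but not the reverse. 7 In particular, in the first equation of the system, price changes $du_t$ are affected by all other variables contemporaneously but do not affect any other variables. In the last equation, the vega order flow $x^{\nu}_t$ contemporaneously affects all other variables but is not affected by others.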
While in general the ordering is arbitrary, we believe that the model in equation (4) is the most natural choice. First, we follow Hasbrouck (1991) by placing price changes first and order flows second. This is motivated by sequential trade models, where order flows cause returns because of asymmetric information and informed trading, whereas returns do not directly cause order flows. With regard to the ordering of the price and volatility changes d u and d v , we note that the former is a traded asset whereas the latter (the volatility) is not. Being a traded asset, d u should respond fast and efficiently to new information, such that it is affected by contemporaneous innovations in volatility. The reverse does not hold, since volatility moves slower as it is only indirectly traded through a wide range of options. We also believe our model represents the most conservative ordering. Our focus is on measuring the information content in option order flows after controlling for any information captured by trading in the underlying itself. Accordingly, our proposed ordering attributes any contemporaneous information captured in, for example, both x spy t and x ∆ t to the former when predicting d u t . While this ordering may create a bias by reducing the explanatory power of x ∆ on d u , it is the most conservative approach for our purposes. A number of alternative specifications are studied in the robustness section.
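The recursive ordering discussed above pins down which contemporaneous terms may enter each equation. The sketch below lists the regressors equation by equation; it is a schematic aid under stated assumptions, not an estimation routine: the variable labels are hypothetical shorthand, and it assumes that variables ordered after the dependent variable enter from lag 0 while the dependent variable itself and those ordered before it enter from lag 1.

```python
# Ordering as described in the text: price changes first, order flows last.
ORDER = ["du", "dv", "x_spy", "x_delta", "x_vega"]

def equation_regressors(order, n_lags):
    """Map each dependent variable to its list of (variable, lag) regressors."""
    spec = {}
    for pos, dep in enumerate(order):
        terms = []
        for other_pos, var in enumerate(order):
            # Later-ordered variables act contemporaneously (lag 0);
            # own and earlier-ordered variables only enter with a lag.
            first_lag = 0 if other_pos > pos else 1
            terms.extend((var, lag) for lag in range(first_lag, n_lags + 1))
        spec[dep] = terms
    return spec

spec = equation_regressors(ORDER, n_lags=2)
for dep, terms in spec.items():
    contemporaneous = [var for var, lag in terms if lag == 0]
    print(f"{dep}: {len(terms)} regressors, contemporaneous = {contemporaneous}")
```

Reordering `ORDER` and rerunning the function gives the regressor sets for alternative specifications of the kind examined in a robustness analysis.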
The proposed model yields several practical advantages. First, the lags in the VAR system naturally account for any autocorrelation in the volatility process. Second, with separate equations for the underlying and volatility, we obtain price impact estimates in both components. Third, the analysis of option markets typically requires a data reduction technique as the number of actively traded options is large (considering put and call options, as well as different strike prices and expiration dates). 8 To reduce the dimensionality of the data, some papers suggest modeling option prices as a function of time to maturity and moneyness (e.g., Aït-Sahalia and Lo (2000)).
In this paper, we apply a data reduction motivated by the two-factor option pricing model of Heston (1993).

Data
Our trade and quote data are from the Refinitiv Tick History database. 9 The data set consists of European-style SPX options written on the S&P 500 Index and covers the time period from January 2, 2014 to December 31, 2014. Option contracts differ in strike price, put/call identifier, and expiration date and are part of the third-Friday expiration cycle, which includes the most liquid option contracts written on the S&P 500 Index. For each option trade, we observe the exact time of the trade (to the millisecond), as well as the trade price and volume. Quote data consist of the best bid and ask prices, which are used to classify each trade as a market buy or sell order. 10 We also collect corresponding trade and quote data for the SPY ETF, which we use as a tradeable proxy for the underlying asset and to construct the price changes d u t .
We apply a range of standard filters to our data. First, we discard options with fewer than seven or more than 180 calendar days to maturity. Long-term options are very infrequently traded, and short-term options are adversely affected by changes in the expiration cycle. Second, we remove individual trades for which the dollar volume exceeds $10 million, as these are likely negotiated over the counter (OTC). 11 And third, we avoid opening and closing auctions by restricting our analysis to trades between 8:45 am and 2:45 pm CT, leaving a sample of six full trading hours.
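The three filters can be expressed as a single predicate per trade. The sketch below is illustrative only: the function name and argument layout are hypothetical, the boundary values are treated as inclusive (an assumption, since the text does not say), and the caller is assumed to supply dollar volume and a trade time already expressed in Central Time.

```python
from datetime import time

MIN_DAYS, MAX_DAYS = 7, 180          # calendar days to maturity
MAX_DOLLAR_VOLUME = 10_000_000       # trades above this are treated as likely OTC
SESSION_START, SESSION_END = time(8, 45), time(14, 45)  # Central Time

def keep_trade(days_to_maturity, dollar_volume, trade_time):
    """Return True if an option trade passes all three sample filters."""
    if days_to_maturity < MIN_DAYS or days_to_maturity > MAX_DAYS:
        return False
    if dollar_volume > MAX_DOLLAR_VOLUME:
        return False
    return SESSION_START <= trade_time <= SESSION_END

# Hypothetical trades: (days to maturity, dollar volume, time of day).
trades = [
    (30, 250_000.0, time(10, 15)),     # kept
    (5, 250_000.0, time(10, 15)),      # too close to expiration
    (30, 12_000_000.0, time(10, 15)),  # too large
    (30, 250_000.0, time(8, 31)),      # before the sampling window
]
kept = [t for t in trades if keep_trade(*t)]
print(len(kept))
```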

Model design choices and implementation
The empirical implementation of the proposed model requires several design choices, such as the sample frequency and the construction of the order flow variables x ∆ t and x ν t , as well as volatility changes d v t . We construct these as follows.
Sampling frequency: We sample data at the hourly frequency. While our data set and methodological framework allow us to conduct the empirical analysis at even higher frequencies, a careful analysis in Section 5 shows that the hourly frequency provides the best compromise between a high-frequency analysis and avoiding problems related to microstructure noise in option prices. That is, option prices are characterized by relatively large bid-ask spreads and tick sizes, which imply that prices adjust infrequently, that is, only after sufficiently large changes in the underlying or volatility. Further, the volatility process is not directly observed and must be estimated, and the resulting measurement error becomes problematic at too high sample frequencies.
9 The database used to operate under the name Thomson Reuters Tick History database.
10 There have been some issues with signing option trades using Lee-Ready because trades can occur within the quoted prices and are often timed strategically (see Muravyev and Pearson (2020)). This adds measurement error to the order flow variables, but will otherwise not affect the analyses.
11 This filter drops less than 0.09% of the trades.
Delta and vega: A second important design choice relates to the calculation of trade deltas and vegas, which are required for the order flow variables in equations (2) and (3). We use the smile-consistent option pricing model of Heston (1993) to calculate deltas and vegas for each individual option trade. Our methodology can be summarized as follows. First, we use the theoretical functional relationship between the (observable) VIX index and the (unobservable) spot volatility to simultaneously calibrate the Heston model to intra-daily option data and the VIX index on the first day of our sample. Our approach avoids filtering techniques and high-dimensional optimization (Broadie et al. (2007), Christoffersen et al. (2010)) and is easy to extend to other pricing models or option markets. Second, we use the calibrated parameters from the first day of the sample and VIX index values on the next trading day to obtain the out-of-sample S&P 500 spot volatility. Third, equipped with estimates of the spot volatility and the structural parameters of the model, we then calculate deltas and vegas for all option trades on the second day using standard Fourier inversion techniques. The procedure is then repeated for all trading days in our sample. Our methodology allows for a conceptually easy construction of the trading exposure to the underlying S&P 500 ($x^{\Delta}$) and the S&P 500 volatility ($x^{\nu}$) using the definitions in equations (2) and (3). We provide a more detailed description of our methodology in the Appendix.
Volatility process: We use the VIX index values to estimate the volatility process of the S&P 500 at the one minute frequency. 12 We follow a standard procedure, and define the time-t VIX index with maturity τ by:

$$\mathrm{VIX}_t^2(\tau) = \frac{2 e^{r\tau}}{\tau} \int_0^{\infty} \frac{O(t, K, T)}{K^2}\, dK, \qquad (5)$$

where $O(t, K, T)$ denotes the time-t quoted midpoint of an out-of-the-money (OTM) option (with strike K and maturity T), with $T = t + \tau$ and $r$ the risk-free rate. 13 To approximate the integral in equation (5), we first construct an option pricing function that is continuous in the strike K. We follow previous studies and interpolate the implied volatilities of OTM options by a cubic polynomial and extrapolate the curve by fixing the implied volatilities of options beyond the traded strike range to the nearest available market implied volatility [for these procedures, see Broadie et al. (2007) and Carr and Wu (2009)]. We then use a simple adaptive Gauss-Kronrod quadrature method to calculate the integral in equation (5). In the Appendix, we describe the link between the VIX and spot volatility in more detail.
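The numerical integration over OTM option prices can be illustrated with a self-contained sketch. Several simplifications are assumptions of this example and not the paper's procedure: it uses the standard model-free form VIX² = (2e^{rτ}/τ)∫ O(K)/K² dK, replaces the adaptive Gauss-Kronrod quadrature with a plain trapezoid rule, and generates option prices from a flat Black implied volatility with a zero interest rate instead of a cubic interpolation of market smiles. With a flat smile the integral should return the input volatility, which makes the sketch easy to check.

```python
import math

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def otm_price(forward, strike, sigma, tau):
    """Undiscounted Black price of the OTM option (put below the forward, call above)."""
    s = sigma * math.sqrt(tau)
    d1 = (math.log(forward / strike) + 0.5 * s * s) / s
    d2 = d1 - s
    if strike < forward:
        return strike * norm_cdf(-d2) - forward * norm_cdf(-d1)  # OTM put
    return forward * norm_cdf(d1) - strike * norm_cdf(d2)        # OTM call

def vix_from_otm_prices(price_fn, forward, tau, rate=0.0, n_steps=4000):
    """Trapezoid-rule approximation of sqrt((2 e^{r tau} / tau) * int O(K)/K^2 dK).

    The strike range is truncated to [0.5 F, 1.5 F], an assumption that is
    harmless for short maturities and moderate volatility.
    """
    lo, hi = 0.5 * forward, 1.5 * forward
    h = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        k = lo + i * h
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * price_fn(k) / (k * k)
    variance = 2.0 * math.exp(rate * tau) / tau * total * h
    return math.sqrt(variance)

forward, sigma, tau = 1930.0, 0.115, 30 / 365
vix = vix_from_otm_prices(lambda k: otm_price(forward, k, sigma, tau), forward, tau)
print(f"input vol = {sigma:.4f}, recovered VIX = {vix:.4f}")
```

In practice the flat-volatility price function would be replaced by one built from interpolated market implied volatilities, and the truncation range would need to reflect the available strikes.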

Summary statistics
Option bid-ask spreads: We first show implicit trading costs in terms of option bid-ask spreads and next compare these to estimated price impacts. Table 2 contains detailed spread results for our sample (including those reported in Table 1). The nature of option markets is such that percentage and dollar spreads are not easily comparable between options with different degrees of moneyness.
In particular, deep out-of-the-money options have a low value and therefore the percentage spread is typically large compared to its dollar spread. The opposite holds for deep in-the-money options.
Further, we show both value-weighted and equal-weighted spreads, because the former overweights high-priced ITM options and the latter overweights low-priced OTM options.
The equal-weighted average effective quoted half spread, i.e., the quoted half-spread just before each trade, is 12.8%, while the effective spread is only 5.6%. The large difference is explained by the many trades negotiated off-exchange or directly with dealers that occur inside the quoted bid and ask prices. The dollar value-weighted percentage spreads are about one-third of the equal-weighted values, reflecting that high-priced ITM options have relatively small percentage spreads. We further see that the sample average spread reduces to 5.58% after applying the correction of Muravyev and Pearson (2020), who show that the use of the midpoint in the effective spread calculation is inappropriate when investors are more likely to buy (sell) when the unobserved fundamental is closer to the ask (bid). 14 The correction does not change results as much for SPX options as it does for equity options, likely because many SPX trades already occur within the quoted bid and ask prices.
13 Moneyness is defined relative to the forward price of the underlying.
The table also decomposes the effective spread into ten- or thirty-minute realized spread and adverse selection components. The equal-weighted adverse selection component is large, 3.12% at the ten-minute level, compared to a realized spread of 2.53%. In contrast, the dollar volume-weighted average is only 0.05%, compared to the realized spread of 1.66%. This difference is mainly caused by deep OTM option trades, which have low dollar volumes and relatively large adverse selection components.

Individual option trades: [...] the underlying and volatility. Column (1) also reveals that out-of-the-money options are traded relatively frequently, as the average moneyness (0.945) is less than one. Moneyness is defined as $K/F_{t,T}$ for puts and $F_{t,T}/K$ for calls, where $F_{t,T}$ denotes the time-t forward price of the underlying, so that values below one correspond to out-of-the-money options.
The average time to maturity is 0.118 years, or about 30 trading days.
The remaining columns in Table 3 report summary statistics for nine subsets of the data, double sorted by time to maturity (ttm) and moneyness. We create three buckets by ttm using the cutoff values of 30 and 90 days. We use cutoff values of 0.95 and 1.0 to group options into deep out-of-the-money, out-of-the-money, or in-the-money. While these cutoff values are arbitrary, this choice ensures that each subset includes a reasonable number of trades. The same subsamples are used throughout the remainder of the analyses. Each subset can be identified in the table by the corresponding average moneyness and ttm. As expected, the subgroups that contain near-the-money options yield the highest exposure to volatility (those in columns (3), (6), and (9)). Similarly, the subsets with highest moneyness, in columns (4), (7), and (10), yield the highest exposure to the underlying. 15

Aggregated order flow variables: Table 4 provides the summary statistics of all variables used in the estimation of equation (4). The risk that changes hands through a one standard deviation delta order flow trade is about twice as large as that of a vega order flow trade. To see this, note that the standard deviation of delta order flow is much larger than the standard deviation of vega order flow ($864M vs. $16M).
However, as a pricing component, the volatility returns are much riskier than the underlying returns. 18 Together, the hourly risk transfer of a one standard deviation delta flow shock is 864 × 0.2% = $1.75 million, while that of a vega order flow trade is 16 × 5% = $0.8 million.
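The risk-transfer comparison can be reproduced directly from the figures quoted in the text and in the accompanying footnote (volatility level 11.5%, hourly volatility change of 0.58 percentage points, spot of $1,930, hourly price change of $3.92). The snippet below is only a worked check of that arithmetic; the variable names are hypothetical.

```python
# Inputs as reported in the text (flows in $ million).
delta_flow_sd = 864.0
vega_flow_sd = 16.0
avg_vol_level = 11.5     # percent
vol_change_sd = 0.58     # percentage points per hour
spot = 1930.0            # dollars
price_change_sd = 3.92   # dollars per hour

# Hourly return standard deviations of the two pricing components.
underlying_return_sd = price_change_sd / spot     # about 0.2%
vol_return_sd = vol_change_sd / avg_vol_level     # about 5%

# Risk transferred by a one standard deviation flow shock, in $ million.
delta_risk = delta_flow_sd * underlying_return_sd
vega_risk = vega_flow_sd * vol_return_sd
ratio = delta_risk / vega_risk

print(f"delta: ${delta_risk:.2f}M, vega: ${vega_risk:.2f}M, ratio: {ratio:.2f}")
```

Using the unrounded return standard deviations gives values close to the rounded $1.75 million and $0.8 million in the text, and a ratio of roughly two.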

Price impact model
We estimate the VAR model in equation (4) with two lags as suggested by the Akaike information criterion. Following Hasbrouck (1991), we assume the trading process restarts at the beginning of each trading day, and therefore we set all initial lagged values to zero. Table 5 provides the long-run impulse response functions (IRFs) of the model. The table contains the five-by-five matrix of IRFs, which shows the cumulative impact of an impulse in one variable (in columns) on all other variables (in rows). We allow each shock to iterate through the system for four periods. The table also shows the size of each shock in the first column. Table 5 offers two main results. First, column (6) shows that a one standard deviation shock in $x^{\nu}$ increases $dv$ by 0.07. This translates into a volatility increase from the sample average of 11.5% to 11.57%, and corresponds to 0.14 standard deviations of $dv$. This magnitude can be interpreted as the price impact of volatility speculation, and is economically relevant despite very low levels of volatility in 2014. However, this impact results from a one standard deviation vega-flow shock based on the aggregate SPX option market, which is a massive transfer of risk with an hourly return standard deviation of $0.8 million (see Footnote 18). Related studies typically analyze the volatility impact of trading in one or two options (e.g., Bollen and Whaley (2004); Rourke (2013)), but these results are not representative of informed trading in the general market and may be biased by not adjusting for cross-option trading strategies. 19 The remaining variation in volatility is driven by public information arrival and the other order flows.
17 The hourly standard deviation of $864 million delta flow can be constructed from the individual option trade data in Table 3 as follows. The average option trade has a delta flow of about $2.5 million (a delta of 0.25 times 51 contracts times a 100 multiplier, times 1,930 (value underlying)). A given trading hour has on average about 1,000 trades, so if the hourly standard deviation of net flow (buys minus sells) is about 40%, then the standard deviation of hourly delta flow is $1,000 million (40% of 1,000 times 2.5 million), which is fairly close to the reported standard deviation of $823 million.
18 The annualized volatility level is on average 11.5% and its difference has an hourly standard deviation of 0.58 percentage points, which gives a volatility return standard deviation of 5.0% = 0.58/11.5 × 100. The underlying has a spot price of $1,930 on average and its difference has a standard deviation of $3.92 per hour, giving a return standard deviation of 0.2% = 3.92/1,930 × 100.

Main results
The second main result is shown in column (5), where the long-run impact of a shock in $x^{\Delta}$ on $du$ is 98.13 cents. This can be interpreted as a change in the average value of the underlying from $1,930 to $1,930.98, which is an increase of 5.1 bps. This price impact is very small compared to the average option effective spread of 169 bps. Further, even though $0.98 represents about 25% of the hourly standard deviation ($3.92) of changes in the underlying, note that it requires a massive order flow exposure shock of $823 million (Table 5). 20 This makes the dollar price impact very small. We do note that these low price impacts in SPX options are consistent with the low but positive price impacts for equity options in Muravyev (2016). Interestingly, the impact of an order flow shock in the ETF ($x^{spy}$) is larger, at $1.71 per standard deviation (see column (4)).
Also note that the size of ETF shocks is much smaller than the size of delta flow shocks: $420 million vs. $823 million. Within the context of our model, this would suggest that the per-dollar price impact of an ETF shock is 3.4 times larger than when the same exposure is obtained through options. Of course, our model does not include order flows in other (near-perfectly) correlated assets like the E-mini futures, and thus does not control for any correlation with those order flows. Table 5 provides several additional findings. The impact of x spy on d v is negative and large, and in Subsection 4.3 we show this is fully explained by the leverage effect (i.e., the negative correlation between d v and d u ). The mechanism is that x spy increases d u and simultaneously decreases d v ; this contemporaneous correlation cannot be disentangled by the VAR model. The same argument explains why x ∆ negatively affects d v , and x ν negatively affects d u . 21 Further, the coefficients on the diagonal of Table 5 reveal the long-term impact of a shock on the variable itself.
All variables are positively autocorrelated, because the long-term impacts are greater than the size of the structural shocks (shown in the first column). In addition, x ∆ and x ν are uncorrelated in the long run, as a shock in one does not affect the other. Indeed, x ν and x ∆ mechanically have a positive correlation for call option trades and a negative correlation for put option trades; on average, the two opposing effects seem to cancel out. Lastly, ETF flow x spy significantly causes delta flow in the long run, but not the other way around. This suggests that x spy is leading and delta flow is following.
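The per-dollar comparison between ETF flow and delta flow follows directly from the shock sizes and long-run impacts quoted above. The sketch below redoes that arithmetic; the helper name `per_dollar_impact` is ours and the inputs are the figures reported in the text.

```python
def per_dollar_impact(price_change_usd: float, shock_size_usd_mn: float) -> float:
    """Long-run price change per $1 million of order-flow shock."""
    return price_change_usd / shock_size_usd_mn

# Figures quoted in the text: impact in dollars, shock size in $ millions.
etf_impact = per_dollar_impact(1.71, 420.0)       # SPY ETF flow
delta_impact = per_dollar_impact(0.9813, 823.0)   # option delta flow

ratio = etf_impact / delta_impact                 # ETF vs. option per-dollar impact
delta_impact_bps = 0.9813 / 1930.0 * 1e4          # delta-flow shock in bps of the underlying

print(f"ETF / delta per-dollar impact ratio: {ratio:.1f}")
print(f"Delta-flow shock impact: {delta_impact_bps:.1f} bps")
```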

Economic implications
Individual and option portfolio price impact. An option trade affects the underlying and volatility through its delta and vega flow, which in turn affects the option price because it linearly depends on the underlying and volatility. We now calculate these option price impacts using the estimated VAR results for several hypothetical trades in individual options and option portfolios.
These price impacts are a component of total trading costs and can be meaningfully compared to the bid-ask spreads of Table 1. Table 7 shows the numerical results for hypothetical portfolio trades based on data of June 27, 2014 at 12:00 (the middle of our sample period). The first rows correspond to a straddle, with a long ATM call and put. We consider a very large trade of 1,000 contracts (about 20 times the sample average; see column (6) in Table 3), where each contract has a multiplier of 100, at a price of $16.88 (call) and $16.90 (put) per option. The dollar cost is $1.69 million for each leg, but due to the embedded leverage, the delta flow is $124 million for the call and -$74 million for the put. 23 These massive exposures generate only a tiny price impact on the underlying of $0.104 and -$0.131 for the call and put, respectively. 24 In basis points, these values are only 0.53 and -0.67.
We next convert these impacts on the underlying and volatility to obtain the price impact in the option. Column (10) shows the price impact of the individual option trades, thus ignoring the cross-option price impact of the straddle. These impacts are 29.9 bps for the call and 42.1 bps for the put, yielding a value-weighted portfolio average of 36 bps. As the price impact is linear, the impact for the average trade size is about 20 times smaller. The main result is in column (12), which shows that the portfolio cost of 36 bps reduces to only 1.5 bps after the cross-asset price impact is accounted for. This twenty-fold reduction is a consequence of the straddle being close to delta neutral, meaning that the price impact on the underlying of the two option trades largely cancels out.

23 A single ATM call option, at a price of only $16.88, offers a linearized exposure of about $1,200 to the underlying, because its delta is 0.63 relative to the value of the SPY ETF of $1,954.

24 The put option has a larger price impact on the underlying because of the leverage effect: its vega exposure increases volatility, which in turn reduces the underlying. The reverse holds for the call option, where the positive impact of delta flow on the underlying is partially reversed through the option's positive impact on volatility.
The results for the strangle, the second panel in Table 7, show a similarly large reduction in portfolio costs due to cross-option effects: the portfolio price impact is 0.3 bps instead of 16 bps when the cross-option impact is ignored. For put and call spreads the cross-option effects are still relevant, but less so. The reason is that for these strategies the delta and vega flows do not cancel out as much, meaning the portfolio trade still generates a significant impact on the underlying and the volatility.
Summarizing, the first result is that even the massive trades we consider in individual ATM options have a price impact of at most 42 bps, which is small relative to the average ATM option bid-ask spread of 152 bps (Table 1, column (5)). Second, when looking at option portfolio trades, this price impact may shrink significantly depending on the extent to which exposures to the underlying and volatility cancel out. The implication is that price impact costs of individual option trades are low, but may still severely overstate the price impact costs of portfolio trades.
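The straddle numbers can be traced with a few lines of arithmetic. The sketch below uses only figures quoted in the text and footnotes; the function names are ours, and the put delta is not stated in the text, so only the call leg's delta flow is recomputed.

```python
def delta_flow(contracts: int, multiplier: int, delta: float, spot: float) -> float:
    """Dollar exposure to the underlying created by an option trade."""
    return contracts * multiplier * delta * spot

def to_bps(change: float, level: float) -> float:
    """Express a dollar change as basis points of a price level."""
    return change / level * 1e4

def value_weighted(values, weights) -> float:
    """Weighted average with weights proportional to the leg prices."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

spot = 1954.0                          # underlying level on the example date
call_price, put_price = 16.88, 16.90   # option prices per unit

leg_cost = call_price * 100 * 1000                 # premium paid for the call leg
call_delta_flow = delta_flow(1000, 100, 0.63, spot)
call_impact_bps = to_bps(0.104, spot)              # impact on the underlying, call leg
put_impact_bps = to_bps(-0.131, spot)              # impact on the underlying, put leg
standalone_bps = value_weighted([29.9, 42.1], [call_price, put_price])

print(f"call leg cost:       ${leg_cost / 1e6:.2f} million")
print(f"call delta flow:     ${call_delta_flow / 1e6:.0f} million")
print(f"underlying impact:   {call_impact_bps:.2f} and {put_impact_bps:.2f} bps")
print(f"stand-alone average: {standalone_bps:.0f} bps")
```

The recomputed call delta flow is roughly $1 million below the $124 million in the text, which is what one would expect if the quoted delta of 0.63 is rounded.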
Learning and informed trading. To the extent that the estimated price impacts are permanent, the results are consistent with standard theories of informed trading (e.g., Kyle (1985); Glosten and Milgrom (1985)), with the extension that some investors are endowed with private information on the underlying or its volatility. They trade options to speculate on both information signals. While the market does not have this private information, it is aware of the general presence of informed traders, and rationally uses the observed order flows to update beliefs about the fundamental value and its volatility.
The estimated price impacts of delta and vega flow reflect this updating (or learning) process.
Under this interpretation, we have estimated the price impact of volatility speculation, which is novel: a standard deviation shock to vega order flow ($16 million) increases the volatility by 0.14 standard deviations (0.07 percentage points). This price impact parameter can also be interpreted as the illiquidity of volatility speculation. The low volatility price impact we find means that volatility-related information is highly valuable because it can be exploited without moving the price much.
Price pressures. The price impacts may also reflect slow-moving transitory price pressures, that is, the analysis will capture temporary price changes that need more than three hours to die out. Under this interpretation, the results are consistent with inventory models of risk-averse market makers (see e.g., Ho and Stoll (1981)). For example, a market maker may hedge a trade in a given option by trading other options or the underlying, and accordingly create price pressures in those assets. If we assume that option prices depend only on changes in the underlying and volatility, then each market maker will decompose an option inventory position into a delta and vega exposure and calculate its risk position in the exact same fashion as in our model. In reality, option prices depend on other factors as well, including some noise, but our model should offer a good description of market maker inventory risks.
Market segmentation. Lastly, our finding that ETF order flow has a 3.4 times higher per dollar price impact than delta flow suggests a degree of market segmentation between SPX option markets and the SPY ETF. Option trades are much larger in terms of dollar volume and risk, which suggests that investors who prefer to trade in size use options. In contrast, the ETF market attracts relatively small but informed trades.

The common component in the underlying and the volatility process
The main analysis shows that delta and SPY order flow affect the volatility process, which is surprising because one would expect that volatility is not affected by order flow exposures to the underlying. We now investigate whether these results can be explained by the leverage effect, i.e., the strong negative contemporaneous correlation between the underlying and the volatility process (Black, 1976). The Heston model specifically recognizes this correlation, but does not specify the direction of causality: Is it a shock from volatility to the underlying, or vice versa? While the causality does not matter for the pricing of an option, it is important to know where information originates when analyzing informed trading.
In this subsection, we take a simple approach to account for the leverage effect. We decompose the time series d u t and d v t into two components, the variation explained by the other variable and a residual, which we identify by regressing each series on the other. These equations are simultaneously determined. The predicted value of each regression captures the variation that is common to both variables, which we call d c,u t and d c,v t . We interpret these as the variation in d u t or d v t explained by the leverage effect. 25 The residuals, then, capture the variation that is not caused by the leverage effect, which we call d r,u t and d r,v t .
Accordingly, we re-estimate the VAR model of equation (4) using these components. Columns (1) and (2) show the VAR specification using the common variations, d c,u and d c,v . We see that x ν , x ∆ , and x spy all significantly predict d c,v . In column (2), x ∆ and x spy no longer significantly predict d c,u , because all their explanatory power has already been captured through the contemporaneous regressor d c,v in that equation. To see this, we repeat regression (2) but omit the contemporaneous d c,v , and in this case x ∆ and x spy do turn significant, as shown in column (3). In fact, their coefficients are more than twice as large as those in column (5) of Table 6, because in that regression we controlled for d v , which is a noisy proxy of the leverage effect.
Columns (4) and (5) show the results using the residual variations, d r,u and d r,v . In this case, x ∆ and x spy do not predict variation in volatility (column (4)), but do predict changes in the underlying (column (5)).
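The decomposition into common and residual variation can be illustrated on simulated data. The sketch below is an assumption about the form of the regressions (each series regressed on the other with an intercept, no lags); the paper's exact specification is not reproduced in this text, and the data and the helper `ols_fit` are purely illustrative.

```python
import numpy as np

def ols_fit(y: np.ndarray, x: np.ndarray):
    """Regress y on a constant and x; return (fitted values, residuals)."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    return fitted, y - fitted

# Simulated series with a negative common component (a stylized leverage effect).
rng = np.random.default_rng(0)
n = 2000
common = rng.standard_normal(n)
d_u = 3.0 * common + rng.standard_normal(n)          # change in the underlying
d_v = -0.5 * common + 0.2 * rng.standard_normal(n)   # change in volatility

d_cu, d_ru = ols_fit(d_u, d_v)   # common and residual part of d_u
d_cv, d_rv = ols_fit(d_v, d_u)   # common and residual part of d_v

print("corr(d_u, d_v):   ", round(float(np.corrcoef(d_u, d_v)[0, 1]), 2))
print("corr(d_ru, d_v):  ", round(float(np.corrcoef(d_ru, d_v)[0, 1]), 6))
print("share of var(d_u) in common part:", round(float(d_cu.var() / d_u.var()), 2))
```

By construction each residual is orthogonal to the other series, so any order-flow coefficient estimated on the residual series cannot be picking up the contemporaneous co-movement between the underlying and volatility.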

Testing the imposed structure
In this subsection we evaluate two crucial assumptions our structural model imposes on the data. First, we test whether the price impacts of delta and vega flow are the same when the order flow variables are constructed from trades in different option subsets. Second, we investigate whether summarizing the cross-section of option price changes with a two-dimensional process leads to a significant loss in information.

Information loss using a two-dimensional process: The five-equation VAR model is designed to summarize the information captured in the large state space of option prices and order flows across strike prices, expiration dates, and puts and calls. To evaluate the performance of this data reduction technique, we follow Bakshi et al. (2000), and test the model in equation (1) by regressing the hourly price change of individual options, with characteristics n={Strike, Expiration date, Put-call identifier}, on the price and volatility changes according to the following specification:

\[ d^{o}_{n,t} = \beta_0 + \sum_{l=0}^{2} \beta_{l,1}\, \Delta_{n,t-l}\, d^{s}_{t-l} + \sum_{l=0}^{2} \beta_{l,2}\, \nu_{n,t-l}\, d^{v}_{t-l} + \varepsilon_{n,t}, \]

where d o n,t is the change in the mid quote of option n, and the terms ∆ n,t d s t and ν n,t d v t represent the option's exposure (delta or vega) multiplied by the factor (change in underlying price or change in volatility). This analysis uses data of all individual options (one observation per option-day-hour), whereas the VAR used one observation per day-hour. For consistency with the VAR model, we include two lags in the regression. If the structure imposed by a two-factor option pricing model is correct, the regression should yield a) an R-squared of one, and b) coefficients β 0 = 0, \(\sum_{l=0}^{2} \beta_{l,1} = 1\), and \(\sum_{l=0}^{2} \beta_{l,2} = 1\). Indeed, in this theoretical case the empirical specification would perfectly explain the actual changes in option prices. Table 9 provides our estimation results. Column (1) shows the full sample regression results.
We obtain an R-squared of 99.2%, which suggests that our model does an excellent job at summarizing the information content in option price changes. We also find that \(\sum_{l=0}^{2} \beta_{l,1} = 0.997\) and \(\sum_{l=0}^{2} \beta_{l,2} = 0.899\), which are both close to one from an economic point of view. This also confirms that option returns are strongly affected by changes in volatility, and that options are useful assets to speculate on changes in volatility. The t-statistic on coefficient ∆ × d s is extremely large (5,551) due to the high R-squared and the sheer size of the data (851,051 observations). For this reason, we do reject equality to one for both \(\sum_{l=0}^{2} \beta_{l,1}\) and \(\sum_{l=0}^{2} \beta_{l,2}\) (the t-statistics are 14.6 and 37.3, respectively). This is easily explained by a small model misspecification.
Column (1) further shows that the lagged coefficients are smaller, but still statistically significant. This motivates the use of the lags in the system. Compared to Bakshi et al. (2000), the two-factor model works much better with more recent data because markets have become more efficient. Using SPX option data from 1994, they find coefficients of β 1 = 0.80, β 2 = 0.41, and an R-squared of 59%.
We repeat the exercise for the nine subsets of options sorted by ttm and moneyness. In general, the model works very well. We see it performs slightly weaker for options with a low moneyness and short ttm (column (2)). In this case, \(\sum_{l=0}^{2} \beta_{l,1} = 0.612\) and \(\sum_{l=0}^{2} \beta_{l,2} = 1.35\).
These coefficients are likely affected by the model misspecification of the Heston model, which, for example, does not allow for discontinuous jump moves in the underlying price equation. In particular, it has been shown that for pricing short ttm options modeling a jump component yields better pricing performance compared to a pure stochastic volatility model (see Eraker (2004)).
Further, this subset contains options with very low prices, and we know that the microstructure noise is more severe here as tick sizes are relatively larger. The model works better for the remaining columns, which all have an R-squared exceeding 92%.
From an economic standpoint, the imposed structure fits the data well. However, the rejection of equality to one for \(\sum_{l=0}^{2} \beta_{l,1}\) and \(\sum_{l=0}^{2} \beta_{l,2}\) means that actual option price changes differ from what our model predicts. This implies that the deltas and vegas contain errors, that option price changes have transitory components, or that the true data generating process for option prices contains additional factors (see, e.g., Christoffersen et al. (2009) or Bardgett et al. (2019)). As a consequence, there is some bias in the delta and vega order flows used in the VAR model. This issue could be tackled by using a more advanced option pricing model, but this is beyond the scope of this paper. However, we see no economic channel through which any misspecification of the two-factor structure would alter the findings in the VAR. In fact, any misspecification would most likely bias the results against finding price impacts.
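The logic of the two-factor test can be sketched on simulated data: if option price changes are generated by delta and vega exposures times the factor changes, a pooled regression with two lags should return coefficient sums close to one and an R-squared close to one. Everything below is illustrative and assumed (the simulated deltas, vegas, noise level and the 80/20 split between the contemporaneous and first lag); it is not the paper's data or estimation code.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, LAGS = 500, 40, 2                      # hours, options, lags in the regression

ds = rng.normal(0.0, 3.9, T)                 # hourly change in the underlying
dv = rng.normal(0.0, 0.58, T)                # hourly change in volatility
delta = rng.uniform(-0.9, 0.9, N)            # option deltas (held fixed for simplicity)
vega = rng.uniform(0.5, 3.0, N)              # option vegas (held fixed for simplicity)

f1 = ds[:, None] * delta[None, :]            # delta exposure times price change, shape (T, N)
f2 = dv[:, None] * vega[None, :]             # vega exposure times volatility change

def lagged(f: np.ndarray, l: int) -> np.ndarray:
    """Shift a (T, N) panel down by l periods, filling the first l rows with zeros."""
    out = np.zeros_like(f)
    out[l:] = f[: f.shape[0] - l]
    return out

# Simulated option price changes: 80% of the adjustment is immediate, 20% arrives one hour later.
noise = rng.normal(0.0, 0.05, (T, N))
d_o = 0.8 * (f1 + f2) + 0.2 * (lagged(f1, 1) + lagged(f2, 1)) + noise

# Pooled OLS with an intercept and lags 0..2 of both factor terms.
y = d_o[LAGS:].ravel()
cols = [np.ones(y.size)]
cols += [lagged(f1, l)[LAGS:].ravel() for l in range(LAGS + 1)]
cols += [lagged(f2, l)[LAGS:].ravel() for l in range(LAGS + 1)]
X = np.column_stack(cols)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

sum_beta_1 = beta[1:4].sum()                 # sum over lags of the delta-term coefficients
sum_beta_2 = beta[4:7].sum()                 # sum over lags of the vega-term coefficients
resid = y - X @ beta
r_squared = 1.0 - resid.var() / y.var()

print(f"sum beta_l1 = {sum_beta_1:.3f}, sum beta_l2 = {sum_beta_2:.3f}, R2 = {r_squared:.4f}")
```

In this setup the lagged coefficients are nonzero even though the summed coefficients equal one, which mirrors the motivation given in the text for including lags.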

Higher frequencies
We have shown that at the one-hour frequency, delta and vega order flows predict changes in the underlying and volatility. A limitation of the identification in the VAR model is the restriction that one endogenous variable can affect another contemporaneously, but that the latter cannot affect the former. This assumption seems tenuous at the hourly frequency, and in this section we investigate whether we can extend the analysis to the half-hour and one-minute frequency.
Estimating the VAR model for higher frequencies such as one-minute intervals poses a nontrivial challenge, since a much larger number of lags needs to be added to the VAR equations to cover a time horizon comparable to that of the one-hour frequency. A large number of lags implies a large number of parameters, which in turn makes estimation procedures unstable. To deal with this issue, we follow the procedure proposed in Hasbrouck (2019).

Options have large tick sizes and bid-ask spreads, which prevents prices from adjusting smoothly to changes in the underlying asset or the volatility. Only at sufficiently low frequencies are the changes in the two factors large enough (relative to bid-ask spreads) to induce option price changes.
We conduct two analyses to reveal the effect of microstructure noise on the estimates. First, Table 10 shows that the bid-ask spreads and tick sizes of options are large compared to the standard deviation of option price changes at high frequency. The tick size ranges from $0.05 to $0.10 (depending on the option price level). 27 The bid-ask spreads, however, calculated as the median across options categorized into eight subsets by option price level, range from $0.35 for options priced under a dollar to $1.56 for options priced between $20 and $40. This means the spreads are typically between 7 and 15 ticks. The spread values are similar to the standard deviations of price changes at the hourly level, which range from $0.14 to $1.45 for the same subsets of options. At the one-minute frequency, however, the standard deviations of price changes range between $0.05 and $0.17, which is about 7 to 10 times smaller than the bid-ask spread. This implies that at the one-minute level, the changes in the two price factors are so small that the implied option price changes often fall within the bid and ask quotes, so that quotes do not update frequently.
The direct consequence of the microstructure noise is that the two-factor structure we impose has weak explanatory power at the highest frequencies. We repeat the analysis of Subsection 4.4 at the one-minute frequency, and find in Table 11 that the two factors explain no more than 47.9% of the variation in the full sample. Moreover, the coefficients on the terms ∆d s and νd v (summed over the lags) lie much further away from 1. These issues are worse for the subsets of the options with low moneyness (columns (2), (5) and (8)), where the R-squared is only 42.1%, 28.8%, and 19.1%, respectively. The two-factor structure is in essence a data reduction technique that summarizes the information in many option prices and order flows by a handful of components.
This structure appears to break down at the highest frequencies.
The poor performance of the two-factor model at the highest frequency also implies weaker results of the VAR model. Measurement error caused by microstructure noise directly biases coefficients towards zero. Further, as quoted option prices do not update frequently, they will not quickly reflect order flow information. In turn, the volatility process, which is itself extracted from these option prices, also adjusts more slowly. An additional issue is that the microstructure noise makes the leverage effect more problematic: the noise makes it more difficult to distinguish between the impact of order flows on the underlying and the volatility. Given that vega flow is ordered last in the VAR, any contemporaneous correlation between vega flow and SPY flow (or vega flow and delta flow) will be attributed to the latter, leaving less explanatory power for vega flow. This effect can be seen by comparing the minute-frequency analysis in Figure 5 with the hourly analysis in Figure 1: at the minute frequency, x spy gains predictive power at the expense of x ν and x ∆ . Overall, we find that the hourly frequency we use is the best compromise between a high-frequency analysis and avoiding the adverse effects of market microstructure noise in option data.
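The statement that measurement error biases coefficients towards zero is the textbook errors-in-variables (attenuation) result. The simulation below is a generic illustration of that mechanism under assumed, made-up noise levels; it does not use the paper's data, and it treats the noise as classical measurement error in a regressor.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x_true = rng.standard_normal(n)                  # regressor measured without error
y = 1.0 * x_true + 0.5 * rng.standard_normal(n)  # true slope is 1

def ols_slope(x: np.ndarray, y: np.ndarray) -> float:
    """Slope of a univariate OLS regression of y on x (with intercept)."""
    return float(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

slopes = {}
for noise_sd in (0.0, 1.0, 2.0):
    # Observed regressor = true regressor + independent measurement noise.
    x_obs = x_true + noise_sd * rng.standard_normal(n)
    slopes[noise_sd] = ols_slope(x_obs, y)
    # Classical attenuation factor: var(x) / (var(x) + var(noise)).
    expected = 1.0 / (1.0 + noise_sd**2)
    print(f"noise sd {noise_sd:.1f}: slope {slopes[noise_sd]:.3f} (theory {expected:.3f})")
```

The larger the noise relative to the true variation, as is the case when one-minute price changes are small compared to spreads and ticks, the stronger the shrinkage of the estimated coefficient.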

Robustness analyses
We performed a number of robustness analyses, which we summarize in this section. Detailed results are in the Online Appendix.
Ordering of the equations. An important but somewhat arbitrary choice is the ordering of the equations in the structural VAR. This holds particularly at the hourly frequency, where essentially all series are determined simultaneously. The ordering determines the direction of the contemporaneous correlation between two variables: only one is allowed to affect the other, but not the reverse. Without this structure, the system is not identified. The ordering in turn determines the structural innovations, the regression coefficients, and the impulse response functions. While five equations allow for 120 possible orderings, we note that for the price impact results, the order of d u and d v does not matter since they both appear after the order imbalance variables.
There are two alternative orderings that deserve further attention. First, we estimate a version of our model where we put x spy before x ∆ and x ν (see Table 2, Panel B, in the Internet Appendix).
In this case, the information contemporaneously captured in x spy and x ∆ is attributed to the latter. Accordingly, the coefficient of x ∆ on d u is estimated more precisely (a higher t-stat). The long-run effects, however, are similar to the main specification, both in terms of economic magnitudes and significance levels. Second, in Panel C we consider an alternative ordering x spy , x ∆ , x ν , that switches delta and vega flow compared to Panel B. This change appears to have a negligible impact on the estimated coefficients.
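The role of the ordering can be illustrated with a Cholesky factorization of a residual covariance matrix, a common way to implement a recursive identification scheme. The sketch below uses a made-up covariance matrix and hypothetical variable labels; it only shows on-impact responses, not the paper's estimated long-run IRFs, and the mapping to the paper's own ordering convention is an assumption.

```python
import math
import numpy as np

names = ["x_nu", "x_delta", "x_spy", "d_v", "d_u"]   # hypothetical labels
n_orderings = math.factorial(len(names))             # 5! possible orderings

def impact_matrix(cov: np.ndarray, order: list) -> np.ndarray:
    """Lower-triangular on-impact response matrix for a given variable ordering."""
    idx = [names.index(v) for v in order]
    return np.linalg.cholesky(cov[np.ix_(idx, idx)])

# Made-up positive definite residual covariance, for illustration only.
rng = np.random.default_rng(3)
a = rng.standard_normal((5, 5))
cov = a @ a.T + np.eye(5)

order_a = ["x_nu", "x_delta", "x_spy", "d_v", "d_u"]
order_b = ["x_nu", "x_delta", "x_spy", "d_u", "d_v"]   # swap the two price series
order_c = ["x_spy", "x_delta", "x_nu", "d_v", "d_u"]   # put ETF flow first

p_a = impact_matrix(cov, order_a)
p_b = impact_matrix(cov, order_b)
p_c = impact_matrix(cov, order_c)

print("number of orderings:", n_orderings)
# Responses of d_v and d_u to the three order-flow shocks are unchanged by swapping d_v and d_u.
print("d_v row, ordering A:", np.round(p_a[3, :3], 3))
print("d_v row, ordering B:", np.round(p_b[4, :3], 3))
# Moving x_spy to the front changes the size of its structural shock.
print("x_spy shock size, A vs C:", round(float(p_a[2, 2]), 3), round(float(p_c[0, 0]), 3))
```

In this sketch, swapping the two price series leaves the on-impact responses to the order-flow shocks untouched, in line with the remark in the text, whereas moving a flow variable forward assigns it the contemporaneous variation it shares with the variables it now precedes.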
The forward price. In the setup of our main analysis, we use the ETF SPY price to proxy for the value of the underlying asset. An alternative is to extract the forward price from the option data, which is noisier but not confounded by dividend yields paid before option expiration. 28 We re-estimate the VAR using the forward price as the underlying instead and find nearly identical results (see Figure A3, in the Internet Appendix). This is reassuring, yet we prefer the main specification as the ETF price typically leads in price discovery (Hasbrouck, 2003). At the one-hour frequency, however, this channel is of no concern.
Price changes versus returns. We estimate the VAR with the return on the underlying and volatility, instead of price differences. After appropriately scaling the coefficients, the results are virtually identical to those in the main specification (see Figure A4, in the Internet Appendix).
We chose to report the version with price differences, as it corresponds more naturally to the analysis in Subsection 4.4.

Limitations: The analysis is based on data of SPX options and the SPY ETF. In reality, traders can trade other investment vehicles to obtain exposure to the S&P 500 (through the S&P 500 future, for example) or its volatility (through options on the VIX or options on the SPY, for example). We do not have these data and thus are unable to investigate the extent to which investors use such sources to trade on their information. Nonetheless, the main contribution of the paper is the model, which is flexible and allows for easy integration of other order flow sources. One approach is to calculate the delta and vega order flows in these additional assets and sum them with the current variables. An alternative is to incorporate separate equations for these sources, which allows for tests of differences of coefficients to determine which order flows have higher price impacts.

28 We back out the underlying price process from option prices as follows. For each t on a one-minute grid, we collect every put/call pair with identical strike price and identical maturity date and use market mid quotes to solve put-call parity for the unobservable futures prices. At a given time t and for a fixed option maturity T , we average the implied futures prices over all available strikes for which a put and a call price is available to obtain the option implied futures price F t,T . We use linearly-interpolated rates from the OptionMetrics zerocurve file as a proxy for the risk-free rate of interest. We then average the changes over all short-term futures in our sample to calculate d u t .
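The forward-price extraction described in Footnote 28 rests on put-call parity for European options, C − P = e^{−r(T−t)}(F − K). A minimal sketch of that step follows; the function names and the example quotes are hypothetical, and details such as quote filters or the interpolation of rates are not covered.

```python
import math

def implied_forward(call_mid: float, put_mid: float, strike: float,
                    rate: float, tau: float) -> float:
    """Solve put-call parity C - P = exp(-r * tau) * (F - K) for the forward F."""
    return strike + math.exp(rate * tau) * (call_mid - put_mid)

def average_implied_forward(quotes, rate: float, tau: float) -> float:
    """Average the implied forward over all (strike, call_mid, put_mid) pairs of one maturity."""
    forwards = [implied_forward(c, p, k, rate, tau) for k, c, p in quotes]
    return sum(forwards) / len(forwards)

# Hypothetical quotes constructed to be consistent with a forward of 1,950.
rate, tau, true_forward = 0.01, 0.25, 1950.0
quotes = []
for strike, put_mid in [(1900.0, 10.0), (1950.0, 25.0), (2000.0, 60.0)]:
    call_mid = put_mid + math.exp(-rate * tau) * (true_forward - strike)
    quotes.append((strike, call_mid, put_mid))

forward = average_implied_forward(quotes, rate, tau)
print(f"option-implied forward: {forward:.2f}")
```

Averaging across strikes reduces the influence of noise in any single put/call pair, which is presumably why the footnote averages over all strikes with both quotes available.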

Conclusion
We offer a novel framework to estimate the price impact of the aggregate option market. To our knowledge, we are the first to propose a single model that captures trading in potentially hundreds of options, with differing strike prices, expiration dates, and types (put or call). We impose economic structure on the data using theoretical option pricing models, which takes into account the high cross-option correlations in returns, order flows, and liquidity. We are also the first to disentangle the leverage effect when studying informed trading in the underlying and the volatility.
Our main result is that SPX option price impacts on the underlying and the volatility are small, especially compared to effective spreads. For a typical at-the-money call trade, the price impact is less than 4 bps, compared to an effective spread of 169 bps. Further, if we analyze portfolio trades, rather than individual option trades, price impacts can easily shrink tenfold depending on the extent that the delta and vega order flow exposures cancel out.
This raises an important question: Why are option spreads so large compared to the very small price impact? Traditional explanations suggest that spreads depend on asymmetric information, market maker inventory effects, fixed order processing costs, and market maker rents. But the first two frictions also cause a price impact, which according to our results seems small. We believe it is unlikely that the latter two frictions are large for the highly competitive and liquid SPY ETF market. As such, the large observed spreads become even more puzzling.

Appendix Heston Model Details
Model definition. The Heston (1993) model has become the most important benchmark in the option pricing literature. Its main theoretical advantage stems from the mathematical tractability of its characteristic function (see Duffie et al. (2000)). Under the model assumptions, the S&P 500 index S and its volatility v are described by the following stochastic differential equations under the risk-neutral measure Q: where W 1 and W 2 are standard Brownian motions under Q. Structural parameters are given by κ Q > 0, θ Q > 0, σ > 0, and ρ ∈ [−1, 1]. The risk-free rate is denoted r and q is the continuous dividend yield.
Due to the affine model structure, European call and put prices can be calculated by standard Fourier inversion methods (see below) and we denote \(P(t, S_t, v_t, K, T, \omega)\) as the time-t price of a put or call option (ω = 1 for a call, ω = −1 for a put) with strike K and maturity T > t. We collect all structural parameters and the risk-free rate in the vector Θ but drop the dependence of option prices on the parameter vector for notational simplicity. It follows from Ito's lemma that: Due to the analytical tractability of the option pricing function P , the first partial derivatives with respect to S and v can also be calculated analytically and we denote these as: To relate changes in option prices to observable variables, we exploit the theoretical relation between the spot volatility and the VIX index. It is straightforward to show that under our model assumptions, the squared VIX index is a linear function of spot variance, i.e., \(\mathrm{VIX}_t^2 = a + b \times (\text{spot variance})\), where a and b are functions of the structural model parameters Θ and the maturity of the options used to construct the VIX index (denoted τ , i.e., 30 days for the standard index published by CBOE). Using these theoretical results, given a parameter set Θ, option delta and vega can also be calculated by Fourier inversion.
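The linear relation between the squared VIX and spot variance can be made concrete for the textbook Heston model. The sketch below assumes that the squared VIX equals the risk-neutral expected average variance over the horizon τ, which gives b = (1 − e^{−κτ})/(κτ) and a = θ(1 − b); the paper's own expressions for a and b are not shown in this text, so this parameterization, the function names and the example values are assumptions.

```python
import math

def vix_sq_coefficients(kappa: float, theta: float, tau: float = 30.0 / 365.0):
    """Intercept a and slope b in VIX^2 = a + b * spot variance (textbook Heston, assumed)."""
    b = (1.0 - math.exp(-kappa * tau)) / (kappa * tau)
    a = theta * (1.0 - b)
    return a, b

def model_vix_sq(spot_var: float, kappa: float, theta: float,
                 tau: float = 30.0 / 365.0) -> float:
    """Model-implied squared VIX (in decimal variance units) for a given spot variance."""
    a, b = vix_sq_coefficients(kappa, theta, tau)
    return a + b * spot_var

def spot_var_from_vix(vix_pct: float, kappa: float, theta: float,
                      tau: float = 30.0 / 365.0) -> float:
    """Invert the linear relation to back out spot variance from a VIX quote in percent."""
    a, b = vix_sq_coefficients(kappa, theta, tau)
    return ((vix_pct / 100.0) ** 2 - a) / b

# Hypothetical risk-neutral parameters: mean reversion speed and long-run variance.
kappa_q, theta_q = 3.0, 0.02
spot_var = spot_var_from_vix(11.5, kappa_q, theta_q)
print(f"implied spot volatility: {math.sqrt(spot_var) * 100:.2f}%")
```

Because the relation is linear, inverting it requires no filtering of the latent variance, which is the practical benefit the calibration discussion in this appendix relies on.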
Option pricing formulae. Duffie et al. (2000) show that the generalized characteristic function for affine jump-diffusion processes is exponentially affine in the state variables \(\log S_t\) and \(v_t^2\). For the Heston model, the logarithm of the characteristic function under the risk-neutral measure Q is given by: where A and B are complex-valued functions, \(i = \sqrt{-1}\) and \(E^{Q}[\,\cdot \mid \mathcal{F}_t]\) denotes the risk-neutral \(\mathcal{F}_t\)-conditional expectation. For expositional clarity, the dependence of all functions on the parameter set of the model Θ is suppressed. Following Bates (2006), the price of a European call option is given by: where \(F_{t,T} = e^{(r-q)(T-t)} S_t\) and \(\Re\) denotes the real part of a complex number. Partial derivatives of the call price are given by: The theoretical value of the VIX index with maturity \(\tau_v = 30/365\) can be recovered from the characteristic function. One can show that for the Heston model:

Empirical estimation. In order to compute the delta and vega for all option transactions in our dataset, we need to calibrate the Heston model to market data. There is no standard methodology in the literature regarding the estimation of model parameters of option pricing models. We adopt a simple calibration procedure and estimate model parameters for every trading day in our sample using intradaily option quotes. To this end, we minimize the root mean squared error of all available OTM options with maturities from 7 to 180 days as follows: where \(N_t\) is the number of available option quotes on day t, the \(t_i\) are set to one-minute intervals from 9:45 to 15:45 (on day t) and \(P^{ma}(t_i, K_i, T_i, \omega_i)\) denotes the market mid-quote of option i. We use the shortest option maturity to invert the theoretical VIX formula above to obtain \(v_t\). To limit the number of option contracts in our calibration, we restrict the calibration to the most liquid contracts, OTM contracts, and contracts with short to medium maturity.
For the calculation of deltas and vegas, we use calibrated parameters from the previous trading day.
Our calibration procedure has two distinct features. First, our methodology circumvents the problem of filtering the latent variance process, as we calibrate the model to both option prices and the VIX index simultaneously. Alternative approaches such as those in Broadie et al. (2007) or Christoffersen et al. (2010) rely on filtering techniques or the calibration of v t , which leads to additional complexity in the calibration algorithm. Second, our recalibration allows us to be robust to changing market dynamics. While the Heston model may be rejected because of the imposed structure (see for instance Broadie et al. (2007) or Christoffersen et al. (2010)), our recalibration ensures that empirical results are not particularly sensitive to possible model misspecification. To calculate deltas and vegas for option transactions, we use parameters estimated using option data on the previous day, so our procedure is out-of-sample. While parameters in the calibration may change over time, our assumption is that the Heston model provides a reasonable way of separating price from volatility risk using model calibrations from the previous trading day. Following the empirical set-up in Bakshi et al. (2000), we provide strong empirical evidence in Table 9 that this approach explains more than 98% of the variation in option prices in our sample. Hence, Equation (A.3) in combination with our Heston model implementation provides a highly accurate description of option market prices, and offers the main benefit of a reduction of variables to only price and volatility risk.

We define the effective quoted half-spread as the bid-ask spread prevailing just before a trade. The effective half-spread is defined as (p t − m t )d t , which is then decomposed into a realized spread (p t − m t+τ )d t and an adverse selection component (m t+τ − m t )d t based on a midpoint price τ minutes later.
We also report the Muravyev-Pearson corrected effective spread (MP), which replaces the midpoint by the predicted value of a regression of the midpoint on Black-Scholes price minus midpoint, delta times lagged underlying price differences, and lagged price changes. All variables are reported in dollars and in percentages as a fraction of the midpoint.

Table 3 Summary statistics SPX option trades on S&P 500

The table shows the mean and standard deviation (in parentheses) of option trade variables. The first column reports results for the full sample, and the remaining columns those for nine subsets of the data double sorted by moneyness and time to maturity. Each subset can be identified by the rows showing the average Moneyness and Time to Maturity. Most variables are self-explanatory. The vegas are calculated with respect to the underlying volatility. The trade direction equals 1 if the trade is originated with a market buy order, and -1 if it is a market sell order. The call indicator equals 1 if the trade is in a call option, and zero if it is in a put option.

Order flow is defined as buyer-originated minus seller-originated trading volume and is expressed in dollars. The first row shows statistics on the net order flow in the SPY ETF (in \$100s of millions). For the next two rows, we decompose each SPX option trade into exposure to the underlying S&P 500 ($x_\Delta$), based on the option's delta, and exposure to the S&P 500 volatility ($x_\nu$), based on the option's vega. The order flow exposures are then aggregated over all trades across options with different strike prices and expiration dates, and across puts and calls. We further report statistics on the differences in the SPY ETF price ($d_u$) and the S&P 500 volatility process ($d_v$) extracted from option prices. The volatility and the option deltas and vegas are estimated with a Heston stochastic volatility model using the cross-section of option prices.

(4). Long-run is defined as four periods (trading hours). The first column shows the standard deviation of the structural residual of each equation, which represents the size of the shocks in the IRF. The next columns show the long-run impact of a one standard deviation impulse to each of the variables. ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels, respectively.
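To make the delta and vega decomposition of order flow in the summary statistics above concrete, the following Python sketch aggregates signed option trades into $x_\Delta$ and $x_\nu$. The names `OptionTrade` and `aggregate_exposures` are hypothetical, and the dollar scaling (contracts times multiplier times delta times the underlying price for $x_\Delta$, contracts times multiplier times vega for $x_\nu$) is an assumption made for illustration rather than the exact definition used in the paper.

```python
from dataclasses import dataclass


@dataclass
class OptionTrade:
    direction: int             # +1 buyer-initiated, -1 seller-initiated
    contracts: float           # number of contracts traded
    delta: float               # model delta (assumed input, e.g. from a Heston fit)
    vega: float                # model vega (assumed input)
    multiplier: float = 100.0  # contract multiplier


def aggregate_exposures(trades, underlying_price):
    """Sum signed delta and vega exposures over all trades in one interval.

    Assumed scaling (illustrative only):
      x_delta = sum(direction * contracts * multiplier * delta * underlying_price)
      x_vega  = sum(direction * contracts * multiplier * vega)
    """
    x_delta = sum(
        t.direction * t.contracts * t.multiplier * t.delta * underlying_price
        for t in trades
    )
    x_vega = sum(t.direction * t.contracts * t.multiplier * t.vega for t in trades)
    return x_delta, x_vega


# A long straddle: buy a call and a put with offsetting deltas and equal vegas.
straddle = [
    OptionTrade(direction=1, contracts=10, delta=0.5, vega=200.0),
    OptionTrade(direction=1, contracts=10, delta=-0.5, vega=200.0),
]
x_delta, x_vega = aggregate_exposures(straddle, underlying_price=2000.0)
```

In this example the delta exposures of the two legs cancel while the vega exposures add up, which is the kind of cross-option netting that pooling order flow across contracts is meant to capture.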
(4). The system considers five endogenous variables, reported in columns (1) to (5) respectively: the net dollar option order flow exposure to the volatility component ($x_\nu$) and to the underlying component ($x_\Delta$), the net SPY ETF order flow ($x_{spy}$), the difference of the volatility process VIX ($d_v$), and the difference of the price of the underlying measured by the SPY ETF ($d_u$). The ordering of the columns corresponds to the ordering of the equations in the VAR model, and the coefficients set to missing identify the structural shocks. The letter L in the independent variable names represents the lag operator. Inference is based on Newey-West standard errors with two lags, which are reported in parentheses. ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels, respectively.
This table provides the expected price impact cost of the four most commonly used option portfolio strategies based on the VAR results in Table 5. Specifically, we consider a straddle, a strangle, a put spread, and a call spread. For each strategy, we measure the long-run cumulative price impact of the individual option trades and of the portfolio trades, where the price impacts are netted or amplified. To do so, we transform the option trades into delta and vega flows (columns (6) and (7)), which affect the underlying and the volatility process (columns (8) and (9)), and in turn the prices of the options (columns (10) and (11)). While most variables are self-explanatory, the variable Portfolio PI measures the price impact of the portfolio trade on the individual options, which incorporates the cross-option price impact of all option trades in the portfolio. As a benchmark, Trade PI measures the price impact of the single option trade, ignoring cross-option price impact. We calculate these results for the option characteristics of ITM, ATM, and OTM call and put options on June 27, 2014 (the middle of the sample period), and take the latest trade before 12:00 pm. The options mature in 22 trading days (one month). The quantity of each option trade is 1,000 contracts, each with a multiplier of 100. At the time, the value of the underlying was \$1,954.2 and the volatility was 0.088 (or 8.8%).
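The chain described in this caption (option trades, then delta and vega flows, then moves in the underlying and volatility, then option price changes) can be sketched in a few lines of Python. The function `price_impact`, the dictionary `impact`, and all numbers are hypothetical placeholders; in the paper the responses come from the long-run IRFs of the VAR, which this sketch does not reproduce.

```python
def price_impact(delta, vega, x_delta, x_vega, impact):
    """Option price change implied by the flows (x_delta, x_vega).

    `impact` holds hypothetical long-run responses:
      impact["u"] = (response of d_u to x_delta, response of d_u to x_vega)
      impact["v"] = (response of d_v to x_delta, response of d_v to x_vega)
    """
    d_u = impact["u"][0] * x_delta + impact["u"][1] * x_vega
    d_v = impact["v"][0] * x_delta + impact["v"][1] * x_vega
    return delta * d_u + vega * d_v


def straddle_example(impact):
    # Per leg: (delta, vega, delta flow, vega flow); illustrative numbers only.
    legs = {"call": (0.5, 2.0, 10.0, 4.0), "put": (-0.5, 2.0, -10.0, 4.0)}
    port_x_delta = sum(leg[2] for leg in legs.values())
    port_x_vega = sum(leg[3] for leg in legs.values())
    result = {}
    for name, (delta, vega, x_delta, x_vega) in legs.items():
        # Trade PI: impact of this leg's own flows only.
        trade_pi = price_impact(delta, vega, x_delta, x_vega, impact)
        # Portfolio PI: impact of the pooled flows of all legs on this option.
        portfolio_pi = price_impact(delta, vega, port_x_delta, port_x_vega, impact)
        result[name] = (trade_pi, portfolio_pi)
    return result


impact = {"u": (0.01, 0.0), "v": (0.0, 0.05)}
result = straddle_example(impact)
```

With these placeholder inputs the delta flows of the straddle net out and the vega flows add up, so the Portfolio PI of each leg differs from its Trade PI.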
Table 8 The common component in the underlying and volatility process
We decompose $d_u$ and $d_v$ into two components, the variation explained by each other and a residual, identified by running two regressions: the first of $d_u$ on $d_v$ and the second of $d_v$ on $d_u$. We denote the predicted value of the first equation by $d_{c,u}$, as it represents the common variation explained by $d_v$, and denote the residual $\varepsilon_{1,t}$ by $d_{r,u}$. The predicted value from the second equation is called $d_{c,v}$ and the residual $d_{r,v}$. The OLS regressions below correspond to the five-equation VAR model of Table 6, but are based on the common variation (columns (1) and (2)) or the residual variation (columns (4) and (5)). Columns (3) and (6) are similar to columns (2) and (5), respectively, but do not include the contemporaneous $d_v$ term.
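The split into a common and a residual component can be illustrated with a short Python sketch. It assumes a univariate OLS regression with an intercept, which is our assumption about the specification; the helper names `ols_fit` and `split_common_residual` and the sample numbers are hypothetical.

```python
def ols_fit(y, x):
    """Univariate OLS of y on x with an intercept; returns (alpha, beta)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def split_common_residual(y, x):
    """Return (common, residual): fitted values of y on x, and what is left."""
    alpha, beta = ols_fit(y, x)
    common = [alpha + beta * xi for xi in x]
    residual = [yi - ci for yi, ci in zip(y, common)]
    return common, residual


# Made-up hourly changes in the underlying (d_u) and in volatility (d_v).
d_u = [0.3, -0.5, 0.2, -0.1, 0.4]
d_v = [-0.2, 0.4, -0.1, 0.1, -0.3]

d_cu, d_ru = split_common_residual(d_u, d_v)  # common and residual parts of d_u
d_cv, d_rv = split_common_residual(d_v, d_u)  # common and residual parts of d_v
```

By construction each residual series is orthogonal to the regressor it was purged of, so the residual-based regressions isolate variation that is not shared between the underlying and its volatility.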
Table 9 Explanatory power of price and volatility of the S&P 500 on option price changes at the hourly frequency
This table shows the extent to which option price changes can be explained by the two factors as proposed by regression (8). The dependent variable is the change in an option's quoted midpoint, which is regressed on the option's delta times the change in the underlying ($\Delta d_s$) and the option's vega times the change in the volatility ($\nu d_v$). The midpoint price and the changes in the underlying and the volatility are sampled at the hourly frequency (the dataset is balanced). We add two lags to be consistent with the previous analyses. The sample uses the full sample of option data of 2014, including options with all expiration dates and strike prices, and both puts and calls. Column (1) shows results for the full sample, and columns (2)-(10) show results for various subsets of options sorted by time-to-maturity (TTM) and moneyness (see Table 4). The first rows show the average TTM and Moneyness of the particular sample. Inference is based on Newey-West standard errors with two lags. The superscripts ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels, respectively.
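As a rough illustration of the two-factor regression in this caption, the sketch below regresses midpoint changes on delta times the change in the underlying and vega times the change in volatility, and reports the R-squared. It is a simplified sketch: lags and Newey-West standard errors are omitted, the function name `two_factor_fit` is hypothetical, and the data are synthetic numbers constructed so that the two factors explain almost all of the variation.

```python
def two_factor_fit(dm, f_price, f_vol):
    """OLS of dm on an intercept, f_price, and f_vol; returns (b1, b2, r2)."""
    n = len(dm)
    mean_y = sum(dm) / n
    mean_1 = sum(f_price) / n
    mean_2 = sum(f_vol) / n
    # Demeaning the data absorbs the intercept.
    y = [v - mean_y for v in dm]
    x1 = [v - mean_1 for v in f_price]
    x2 = [v - mean_2 for v in f_vol]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    # Solve the 2x2 normal equations directly.
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    rss = sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(x1, x2, y))
    tss = sum(c * c for c in y)
    return b1, b2, 1.0 - rss / tss


# Synthetic data for one option with fixed delta and vega (illustrative only).
delta, vega = 0.5, 150.0
d_s = [1.0, -2.0, 0.5, 1.5, -1.0, 0.0]      # changes in the underlying
d_v = [0.01, 0.02, -0.01, -0.02, 0.0, 0.0]  # changes in volatility
noise = [0.01, -0.01, 0.02, -0.02, 0.0, 0.0]

f_price = [delta * s for s in d_s]
f_vol = [vega * v for v in d_v]
dm = [p + q + e for p, q, e in zip(f_price, f_vol, noise)]

b1, b2, r2 = two_factor_fit(dm, f_price, f_vol)
```

Because the synthetic midpoint changes are built from the two factors plus small noise, both slope estimates come out close to one and the R-squared is close to one; real data would of course be noisier.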

This table is identical to Table 9, but uses data sampled at the one-minute frequency. We regress option price changes on the option's delta times the change in the underlying ($\Delta d_s$) and the option's vega times the change in the volatility ($\nu d_v$). We add ten lags of these variables. Because of the sheer size of the data, we use only the first two months of 2014 rather than the whole year (results are unaffected). Column (1) shows results for the full sample, and columns (2)-(10) show results for various subsets of options sorted by time-to-maturity (TTM) and moneyness. The first rows show the average TTM and Moneyness of the particular sample. Inference is based on Newey-West standard errors with two lags. The superscripts ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels, respectively.

Figure 1 Impulse response functions of the VAR: the impact of order flows on the underlying and volatility returns
In this figure, we plot the IRFs of order flow shocks to the difference of the underlying ($d_u$) and of the volatility ($d_v$). The $d_u$ is based on the SPY ETF quoted midpoint (in cents) and the $d_v$ is based on the VIX extracted from the cross-section of option prices (in percentage points). The net order flow (buyer-initiated minus seller-initiated) in each option is decomposed into dollar order flow exposure to the underlying ($x_\Delta$), based on its delta, and to the volatility ($x_\nu$), based on its vega. The order flows are aggregated across options with all strike prices and expiration dates, and across puts and calls. The order flow in the SPY ETF is also added in a separate equation. The IRFs are based on the VAR system of Equation (4) and correspond to the results in Table 6.
The predicted values capture the variation in $d_u$ ($d_v$) explained by $d_v$ ($d_u$), and thus the variation that reflects the leverage effect. This specification corresponds to columns (1) and (2) of Table (5).

The residuals capture the variation in $d_u$ ($d_v$) not explained by $d_v$ ($d_u$), and thus filter out the leverage effect. This specification corresponds to columns (4) and (5) of Table (5).

This figure is identical to Figure 1, but shows results for data sampled at the one-minute frequency. For comparison with the main specification (at the hourly level), we add ten lags in the VAR system and iterate the IRFs for 20 steps (minutes).

This table shows various quoted and trade-based spreads of all SPX options in 2014. We show statistics for the full sample and for subsamples of in-the-money ($|\Delta| \geq 0.65$), at-the-money ($0.65 > |\Delta| \geq 0.35$), and out-of-the-money ($0.35 > |\Delta|$) options, with $|\Delta|$ denoting the absolute value of the option delta. Panel A shows equally weighted observations and Panel B shows dollar volume weighted observations (price times quantity). We define the effective quoted half-spread as the bid-ask spread prevailing just before a trade. The effective half-spread is defined as $(p_t - m_t)d_t$, which is then decomposed into a realized spread $(p_t - m_{t+\tau})d_t$ and an adverse selection component $(m_{t+\tau} - m_t)d_t$, based on the midpoint price $\tau$ minutes later. We also report the Muravyev-Pearson corrected effective spread (MP), which replaces the quoted midpoint by the predicted value of a regression of the midpoint on the Black-Scholes price minus the midpoint, delta times lagged underlying price differences, and lagged price changes. All variables are reported in dollars and in percentages as a fraction of the midpoint.

This table shows the cumulative IRFs for the main specification of the VAR model, and for two alternative orderings of the equations. The superscripts ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels, respectively.
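The spread decomposition defined in the spread table caption above can be checked with a small Python sketch. The function name `spread_components` and the example prices are hypothetical; the three formulas follow the definitions $(p_t - m_t)d_t$, $(p_t - m_{t+\tau})d_t$, and $(m_{t+\tau} - m_t)d_t$, with $d_t$ taken to be the trade direction (+1 for a buy, -1 for a sell), which is our reading of the notation.

```python
def spread_components(price, mid_before, mid_after, direction):
    """Return (effective, realized, adverse_selection) half-spreads for one trade.

    price      : trade price p_t
    mid_before : quoted midpoint m_t at the time of the trade
    mid_after  : quoted midpoint m_{t+tau}, tau minutes later
    direction  : +1 for a buyer-initiated trade, -1 for a seller-initiated trade
    """
    effective = (price - mid_before) * direction
    realized = (price - mid_after) * direction
    adverse_selection = (mid_after - mid_before) * direction
    return effective, realized, adverse_selection


# A buy at 10.10 when the midpoint is 10.00; the midpoint later moves to 10.04.
effective, realized, adverse = spread_components(
    price=10.10, mid_before=10.00, mid_after=10.04, direction=1
)
```

The realized spread and the adverse selection component always sum to the effective half-spread, since the later midpoint cancels out of the sum.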
This table is identical to Table 5, but uses data sampled at the half-hour frequency. It shows the extent to which option price changes can be explained by the two underlying factors as proposed in Equation (6). The dependent variable is the change in an option price, which is regressed on the option's delta times the change in the underlying ($\Delta d_s$) and the option's vega times the change in the volatility ($\nu d_v$). We add four lags to be consistent with the previous analyses. Column (1) shows results for the full sample, and columns (2)-(10) show results for various subsets of options sorted by time-to-maturity (TTM) and moneyness. The first rows show the average TTM and Moneyness of the particular sample. Inference is based on Newey-West standard errors with two lags. The superscripts ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels, respectively.