Modelling Limit Order Book Volume Covariance Structures

Limit order volume data are analysed here using key multivariate techniques: principal component, factor and discriminant analysis. The focus lies on understanding the covariance structure of the quantities posted for potential sale or purchase at the market. Applying these methods to data on 20 blue-chip companies traded on the NASDAQ stock market in June 2016, one observes that two principal components account for approximately 85–95% of the order book variation. The most important factor related to order book variation has furthermore been the demand side (variability). The order book volume variation, moreover, successfully classifies stock price movements. Potential applications include improving order execution strategies, designing trading algorithms and understanding price formation.


Introduction
The limit order book (LOB) trading mechanism has become the dominant way to trade assets on financial markets. Since the limit order book represents the liquidity supply of an asset on a market, it essentially reflects both the demand for and the supply of the asset above the equilibrium price-volume point. Its variation affects the liquidity and price dynamics of an asset; the goal of this study is therefore to conduct a comprehensive multivariate analysis of limit order book (variation) data.
Here we model the covariance structures of the order book data of several assets by employing key multivariate methods. Theodore W. Anderson synthesized various subareas of the subject and has influenced the direction of recent and current research in theoretical multivariate analysis [1]. Principal component, factor and discriminant analysis remain quite popular dimension-reduction and classification techniques that are applied in many research fields.
Multivariate techniques have, for example, recently been used in the financial econometrics of limit order book markets. Principal component analysis is performed in studies on commonalities in liquidity (measures), see, for example, [2, 3], or in analysing price impact data [4]. The dynamics of liquidity supply curves are captured by the so-called dynamic semiparametric factor model in [5], whereas [6] characterize traders' behaviour using discriminant analysis.
Our focus lies on understanding the variability of the quantities posted for potential sale or purchase at the market. The volume (variation) at every order book level is analysed as a random variable, and thus we do not suppress the order book information through, for example, liquidity measures or reward functions. In this chapter, we consider the (full) structure of the covariance matrices. Potential applications thus include improving order execution strategies, understanding price formation and liquidity commonalities, and designing trading algorithms.
This study is organised as follows: after the limit order book data have been described in Section 2, the statistical methods are presented in Section 3. Empirical results are provided in Section 4, and Section 5 concludes.

Limit order book data
The limit order book of an asset lists the volume of pending buy or sell orders at given prices for the asset under consideration, and here we analyse its variance-covariance structure. At a fixed time point, the order book essentially represents a snapshot of the asset's demand and supply curves above the market equilibrium quantity level. The volume to be potentially bought forms the asset's demand (bid) side, whereas the volume to be potentially sold depicts the asset's supply (ask) side. To be more precise, the order book bid and ask curves represent liquidity supply, thus quantities above the equilibrium volume level, as orders below the equilibrium (would) have been traded at the market.

NASDAQ market data and descriptive statistics
At the NASDAQ stock market, one of the world's largest securities exchanges, orders are posted nearly instantaneously and limit orders are executed in the order received. To visualize a limit order book, consider the data of Intel Corp. (INTC) on 30 June 2016, obtained from the data provider LOBSTER (lobsterdata.com). The numbers of shares to be potentially bought or sold at different prices at 10:00 and 11:00 are depicted in Figure 1. For example, at 10:00, at the prices 32.14 (fifth best bid price) and 32.18 (best bid price), there are 16,834 and 2927 stocks demanded, respectively. At the same time, the numbers of offered shares at the prices 32.19 (best ask price) and 32.23 (fifth best ask price) similarly equal 1700 and 15,355, respectively. At 11:00, one furthermore observes that the order book has shifted in the direction of higher prices. We attribute this movement to the (observed) increased demand pressure. At the NASDAQ order-book-driven securities exchange, several event types influence the bid and ask curves, namely submissions of new limit orders, cancellations, deletions and executions (lobsterdata.com). Our data set thus allows us to reconstruct all order book activities of a particular company over the course of a trading day. For a description of trading that is common to most limit order book markets, see, for example, [7].
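To fix ideas, the INTC snapshot quoted above can be held in a small data structure. This is an illustrative sketch only: the variable names are ours, and the intermediate price levels (not reported in the text) are omitted.

```python
# Best and fifth-best levels of the INTC order book at 10:00 on 30 June 2016,
# as quoted in the text (intermediate levels are not reported there).
bids = {32.18: 2927, 32.14: 16834}   # price -> shares demanded (bid side)
asks = {32.19: 1700, 32.23: 15355}   # price -> shares offered (ask side)

best_bid, best_ask = max(bids), min(asks)        # inside quotes
spread = round(best_ask - best_bid, 2)           # bid-ask spread: 0.01
mid_quote = round((best_bid + best_ask) / 2, 3)  # mid-quote: 32.185
```

The mid-quote computed here is the price whose changes are used later for the discriminant analysis grouping.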
The order book volume at a given price level is here represented as a component of a p-dimensional random variable.
Denote by p = (p_1, …, p_p)^⊤, p_1 < … < p_p, the price vector and by v = (v_1, …, v_p)^⊤ the associated volume vector. The limit order book of an asset is given by the pairs (p_j, v_j), j = 1, …, p. (1) The expected volume vector is denoted by E(v) = μ and the object of our interest, the limit order book volume variance-covariance matrix, by Var(v) = Σ = E{(v − μ)(v − μ)^⊤}. (2) Here Σ is a symmetric p × p matrix whose main diagonal elements depict the variances of the pending volume at the fixed price levels p_1, …, p_p.
Limit order book data of the 20 largest stocks traded on the NASDAQ stock market have been collected for the purpose of our analysis. In modelling the high-dimensional covariance structures of this object, we set p = 10. The volume at the demand side is thus represented by the variables v_1, …, v_5, and the variables v_6, …, v_10 form the supply regime. Since the "Brexit" referendum results had a significant influence on stock market movements, we correspondingly focus on the order book activities on 27 June 2016 (S&P 500 at its lowest level after the vote) and 30 June 2016 (upward movement of the S&P 500 series).
The number of daily order book changes varies considerably across the investigated stocks, namely between 59,628 and 1,805,688, see Table 1. Immediately after the referendum results, many order book changes were present compared to the trading activities on 30 June 2016. For almost all stocks, the number of changes then decreased quite substantially.
Interestingly, the majority of the companies had on 30 June 2016 more stocks (on average) listed at the given price levels of the order book than on 27 June 2016, see Figure 2. Denoting the observed n × p volume data matrix by X, the expected volume vector is estimated by μ̂ = n⁻¹ X^⊤ 1_n, (3) with an n × 1 vector of ones denoted by 1_n. The average posted quantities moreover exhibit a symmetric pattern when comparing the estimated volume at the bid and ask sides.
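As a minimal numerical check of the mean estimator μ̂ = n⁻¹ X^⊤ 1_n, consider the following sketch on synthetic volume data (the Poisson intensity and dimensions are illustrative choices of ours, not from the study):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 10                                         # n snapshots, p price levels
X = rng.poisson(lam=5000.0, size=(n, p)).astype(float)  # synthetic n x p volume matrix

ones = np.ones((n, 1))
mu_hat = (X.T @ ones / n).ravel()   # mu_hat = n^{-1} X^T 1_n

# the matrix form coincides with the column-wise sample mean
assert np.allclose(mu_hat, X.mean(axis=0))
```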

Covariance structure estimation
The results above indicate that the order book change count as well as the estimated average volume vector changed (substantially) on 30 June compared to the market situation on 27 June 2016. Having estimated the mean vector, we are ready to focus on the (potential) changes in the variance-covariance matrices, that is, the covariance structures of the order book data. The covariance matrix of the order book volume is estimated by Σ̂ = n⁻¹ X^⊤ H X, (4) where H = I_n − n⁻¹ 1_n 1_n^⊤, with identity matrix I_n, denotes the centring matrix, and 1_n represents an n × 1 vector of ones [8]. The empirical results are displayed in Figures 3 and 4 for the mega-cap and large-cap stocks, respectively. Since the analysed order book volume vector is a 10-dimensional object, p = 10, the axes of every graphical display represent the indices of the random variable(s) under consideration. In total, 100 estimated covariance values are displayed in each graph, that is, all values of the 10 × 10 matrix Σ̂. For example, the upper left square of every graph denotes the estimated covariance between v_1 and v_1 (which equals the estimated variance of v_1); the lower left square represents the estimated covariance between v_1 and v_10, etc. The MATLAB function 'pcolor' has been used for generating Figures 3 and 4.
The matrix values are used to define the vertex colours by scaling the values to map to the full range of the 'colormap'; see the MATLAB documentation for more details. Note that a darker (blue) colour shows a larger value of the estimated covariance between the random variables and vice versa. Our empirical results indicate several interesting findings. One observes relatively stronger variation in the individual volume variables than in the covariance levels across all stocks. We aim at identifying the linear combination that is responsible for the largest proportion of the data variation. There are furthermore relatively larger covariance levels between the bid and ask sides on 30 June 2016 in comparison with the levels on 27 June 2016, indicating a stronger impact of one market side on order book variation immediately after the referendum results. Our analysis aims particularly at selecting the most important factor associated with this variation.
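The covariance estimator in Eq. (4) can be sketched as follows; the data are synthetic, and the only claim made is agreement with the built-in biased (1/n) sample covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 10
X = rng.poisson(lam=5000.0, size=(n, p)).astype(float)  # synthetic volume matrix

H = np.eye(n) - np.ones((n, n)) / n   # centring matrix H = I_n - n^{-1} 1_n 1_n^T
Sigma_hat = X.T @ H @ X / n           # Eq. (4)

# agrees with numpy's biased (1/n) sample covariance
assert np.allclose(Sigma_hat, np.cov(X, rowvar=False, bias=True))
```

Since H is symmetric and idempotent, X^⊤HX = (HX)^⊤(HX), i.e. the estimator simply centres the columns of X before forming cross-products.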

Modelling framework
Recall that we model the limit order book volume as a p-dimensional random vector v and denote its expected value by μ, a p × 1 vector, and its covariance matrix by Var(v) = Σ, a p × p matrix. After observing n realizations of v, that is, after obtaining the n × p order book volume matrix X, the parameters μ and Σ are estimated by expressions (3) and (4), respectively. Among the multivariate techniques that deal with dimension reduction of high-dimensional random vectors, in volume covariance structure modelling we focus on principal component, factor and discriminant analysis. Multivariate techniques deal with simultaneous relationships among variables and differ from univariate and bivariate analyses in that they direct attention away from the analysis of the mean and variance of a single variable, or from the pairwise relationship between two variables, to the analysis of the covariances and correlations among three or more variables [9].

Principal components analysis
Principal component analysis focuses on standardized principal components of a high-dimensional random variable. It was first introduced by Karl Pearson for nonstochastic variables and by Harold Hotelling for random vectors [10]. The low-dimensional representation enables us to study the correlation between the principal components and the original data; here our goal is to find the standardized linear combination of the order book volume vector that is associated with the largest order book variation. The technique is based on a very useful theorem [11], the spectral decomposition theorem. General results about eigenvalues and eigenvectors for square matrices, and those for symmetric matrices, are provided in [12].
The standardized linear combination of a p-dimensional variable v = (v_1, …, v_p)^⊤ that maximizes the order book variation uses the first eigenvector, associated with the first (largest) eigenvalue, of the spectral decomposition Σ = Γ Λ Γ^⊤, with Λ = diag(λ_1, …, λ_p), λ_1 ≥ … ≥ λ_p, being the p × p diagonal matrix of eigenvalues and Γ the p × p matrix of associated eigenvectors. The second largest variance proportion is explained by the linear combination using the second eigenvector, etc. The principal components are given by Y = Γ^⊤(v − μ). In modelling order book data, we estimate the principal components by Ŷ = H X Γ̂, (5) with the estimated matrix of eigenvectors Γ̂ from the spectral decomposition Σ̂ = Γ̂ Λ̂ Γ̂^⊤, and the estimated p × p diagonal matrix of eigenvalues Λ̂. For illustrative purposes, it often suffices to consider only the first two principal components, that is, the first two columns of the n × p matrix Ŷ.
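The estimation in Eq. (5) can be sketched on synthetic data with one dominant common component; all parameters here are illustrative choices, not values from the study:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 2000, 10
common = rng.normal(size=(n, 1))   # one strong common driver of all levels
X = 5000 + 800 * common @ np.ones((1, p)) + 100 * rng.normal(size=(n, p))

S = np.cov(X, rowvar=False, bias=True)   # Sigma_hat as in Eq. (4)
lam, G = np.linalg.eigh(S)               # spectral decomposition S = G diag(lam) G^T
order = np.argsort(lam)[::-1]            # sort eigenvalues in decreasing order
lam, G = lam[order], G[:, order]

Y = (X - X.mean(axis=0)) @ G             # principal components, Eq. (5)
explained = lam[:2].sum() / lam.sum()    # proportion explained by first two PCs
```

With this construction a couple of components already capture most of the variation, mimicking the high explained proportions reported for the order book data; the columns of Y are empirically uncorrelated with variances λ_1, …, λ_p.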

Factor analysis
In factor analysis the random vector is modelled as a linear combination of a few common factors. The concept of latent factors seems to have been suggested by Francis Galton; the formulation and early development of factor analysis have their genesis in psychology and are generally attributed to Charles Edward Spearman [10]. Factor analysis aims to discover independent variables that describe the variation of a high-dimensional random variable with high explanatory power [13]. Formally, we consider the k-factor model v = Q f + u + μ, where f and u denote the k- and p-dimensional common and specific factors, respectively [8].
The p × k matrix of factor loadings is denoted by Q. It is furthermore assumed that E(f) = 0, Var(f) = I_k, E(u) = 0 and Cov(f, u) = 0, so that Σ = Q Q^⊤ + Ω with the diagonal covariance matrix Ω = Var(u). The factor loadings represent the combinations which reflect the common variance part, and the remaining variation is quantified through the covariance matrix of the specific factors. In practice, we are consequently interested in estimating the matrix of common factor loadings Q and the covariance matrix of the specific factors Ω. Here we utilise the maximum likelihood method: assuming that the volume is multivariate normally distributed [8], the estimates are obtained by maximising the log-likelihood function ℓ(Q, Ω) = −(n/2) log|2π(Q Q^⊤ + Ω)| − (n/2) tr{(Q Q^⊤ + Ω)⁻¹ Σ̂}, where n denotes the sample size and Σ̂ the estimated covariance matrix, see Eq. (4).
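The maximum-likelihood fit requires iterative optimisation; as a rough sketch, the following uses the simpler principal-component method of factor extraction on synthetic one-factor data. This is a stand-in for the ML estimates, not the study's procedure, and all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 2000, 10, 1
f = rng.normal(size=(n, k))                       # latent common factor
Q_true = np.linspace(1.0, 2.0, p).reshape(p, k)   # true loadings
X = f @ Q_true.T + 0.3 * rng.normal(size=(n, p))  # v = Q f + u (zero-mean case)

S = np.cov(X, rowvar=False, bias=True)
lam, G = np.linalg.eigh(S)                        # eigenvalues in increasing order
Q_hat = G[:, [-1]] * np.sqrt(lam[-1])             # loading: scaled leading eigenvector
Q_hat *= np.sign(Q_hat.sum())                     # fix the sign indeterminacy
Omega_hat = np.diag(np.diag(S - Q_hat @ Q_hat.T)) # specific (diagonal) variances
```

The loadings are only identified up to sign, hence the explicit sign fix; the decomposition S ≈ Q̂Q̂^⊤ + Ω̂ then mirrors Σ = QQ^⊤ + Ω above.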

Discriminant analysis
In discriminant analysis, multivariate data observations are classified into two or more known groups. A modern treatment of discriminant analysis, including a brief history, is provided in [10]. In the analysis of group differences, [13], for example, state two questions: (i) does there exist a significant difference between the groups (variation) and (ii) which variables are responsible for this difference? In practice, a discriminant rule is used to classify existing and new observations, and the number of correctly classified observations reflects the quality of the approach. Here we are interested in the classification accuracy: to what extent a price change can be expected (or not) at each order book entry based exclusively on the observed volume data.
Fisher's linear discriminant rule is based on a linear combination of the data, say a^⊤x, with a denoting a p × 1 vector, and the idea is to find the a that achieves a good separation [8, 14]. When the method is applied to two groups, one assumes that the data matrix X is split into two groups, say X_1 and X_2. Denote the sample sizes of these matrices by n_1 and n_2, the estimated mean vectors by x̄_1 and x̄_2, the estimated covariance matrices by Σ̂_1 and Σ̂_2, and the centring matrices by H_1 and H_2. The linear combination that maximizes the ratio of the between-group sum of squares to the within-group sum of squares is given by a = W⁻¹(x̄_1 − x̄_2), where the p × p matrix W = X_1^⊤ H_1 X_1 + X_2^⊤ H_2 X_2 denotes the within-group sum of squares; the between-group sum of squares is B = n_1(x̄_1 − x̄)(x̄_1 − x̄)^⊤ + n_2(x̄_2 − x̄)(x̄_2 − x̄)^⊤, with x̄ the overall mean vector.
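A self-contained sketch of the two-group rule on synthetic Gaussian data follows; the group labels, the midpoint threshold and all parameters are our illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, p = 500, 500, 10
X1 = rng.normal(loc=0.0, size=(n1, p))   # group 1: e.g. no mid-quote change
X2 = rng.normal(loc=1.0, size=(n2, p))   # group 2: e.g. mid-quote change

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
H1 = np.eye(n1) - np.ones((n1, n1)) / n1   # centring matrices
H2 = np.eye(n2) - np.ones((n2, n2)) / n2
W = X1.T @ H1 @ X1 + X2.T @ H2 @ X2        # within-group sum of squares
a = np.linalg.solve(W, m1 - m2)            # Fisher's direction a = W^{-1}(m1 - m2)

# allocate an observation to group 1 if its projection lies closer to a^T m1
threshold = a @ (m1 + m2) / 2

def to_group1(Z):
    return Z @ a > threshold   # True where classified into group 1

accuracy = (to_group1(X1).sum() + (~to_group1(X2)).sum()) / (n1 + n2)
```

The proportion of correctly classified observations, `accuracy`, plays the role of the classification rates reported for the order book data.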

Empirical results
An analysis of principal components often reveals relationships that were not previously suspected and thereby allows interpretations that would not ordinarily result [15]. Consider, for example, the proportion of order book variance explained by two principal components in Table 2. Two principal components are sufficient to describe the order book variation, since the explained proportions range between 0.81 and 0.96 (27 June 2016) and between 0.78 and 0.97 (30 June 2016).
The limit order book variation of most companies is clearly better explained on 30 June 2016 than on 27 June 2016. The largest increases in the explained proportion are evident for the smaller stocks, especially for SBUX, CELG, QCOM, COST and PCLN. Looking only at the descriptive results reported in Table 1, one would conclude that the number of changes is apparently similar across all stocks. Now it is evident that the demand and supply curves of the smaller stocks change relatively more strongly during turbulent times (here, during a downward price movement). We attribute this to the relatively lower liquidity of the large-cap stocks compared to the highly liquid mega-cap stocks.

Factor analysis can be considered an extension of principal component analysis: both techniques can be viewed as attempts to approximate the covariance matrix, but the approximation based on the factor analysis model is more elaborate [15]. In the sequel, we choose a k = 1 factor model, since we are interested in selecting the driving factor of order book variation. The results, based on the estimated values of the factor loadings Q̂, are depicted in Tables 3 and 4 for the mega-cap and large-cap companies, respectively.

The empirical findings furthermore suggest that limit order book volume data successfully classify price changes, especially on 30 June 2016, a day with a relatively low number of order book entries.
Here the first group contains entries with a change of the mid-quote price (p_5 + p_6)/2, and the second group contains entries without a change. Our results show that the classification rates improved quite significantly for the extremely large and the smallest investigated stocks. The latter, as discussed above, exhibit a relatively well understood covariance structure on 30 June 2016.

Conclusions
Limit order book data of 20 highly traded stocks on the NASDAQ market in June 2016 have been analysed. We selected two days after the 'Brexit' referendum, namely 27 June (lowest S&P 500 level) and 30 June (recovery day). The variable of interest is the 10-dimensional order book volume vector, that is, the quantities pending at the five best levels of the demand side and at the five best levels of the supply side.
Two principal components account for approximately 85–95% of the order book data variation.
The results of a one-factor model identify the demand (variation) as the most important factor explaining the order book covariance structure. The limit order book volume variation is quite informative in predicting the price evolution (change or no change in the mid-quote) across all stocks and during the analysed trading activities. The mega-cap and the smallest investigated large-cap companies share almost the same classification performance. Finally, multivariate statistical techniques have been successfully employed in covariance modelling of order book data.


Figure 2 .
Figure 2. Estimated average volume of the order book data for selected stocks on 27 June 2016 (solid) and 30 June 2016 (dashed).

Figure 3 .
Figure 3. Estimated covariance structure of the order book data: mega-cap stocks on 27 June 2016 (upper panel for each stock) and 30 June 2016 (lower panel for each stock).

Figure 4 .
Figure 4. Estimated covariance structures of the order book data: large-cap stocks on 27 June 2016 (upper panel for each stock) and 30 June 2016 (lower panel for each stock).

Table 1 .
Number of limit order book observations on 27 and 30 June 2016 and the change (decrease) in % for the largest 20 stocks at NASDAQ.

Table 2 .
Estimated proportion of explained order book volume variance by the first two principal components.

Table 6 .
Estimated proportion of correctly classified price changes based on volume data for investigated large-cap stocks.