Learning Financial Networks with High-frequency Trade Data

Financial networks are typically estimated by applying standard time series analyses to price-based economic variables collected at low frequency (e.g., daily or monthly stock returns or realized volatility). These networks are used for risk monitoring and for studying information flows in financial markets. High-frequency intraday trade data sets may provide additional insights into network linkages by leveraging high-resolution information. However, such data sets pose significant modeling challenges due to their asynchronous nature, nonlinear dynamics, and nonstationarity. To tackle these challenges, we estimate financial networks using random forests. The edges in our network are determined by using microstructure measures of one firm to forecast the sign of the change in a market measure (either realized volatility or returns kurtosis) of another firm. We first investigate the evolution of network connectivity in the period leading up to the U.S. financial crisis of 2007-09. We find that the networks have the highest density in 2007, with high-degree connectivity associated with Lehman Brothers in 2006. A second analysis into the nature of linkages among firms suggests that larger firms tend to offer better predictive power than smaller firms, a finding qualitatively consistent with prior work in the market microstructure literature.


Introduction
From both theoretical and practical perspectives, there is interest in estimating linkages among financial institutions using data. Academicians seek to understand how information flows between firms, regulators aim to identify when and how risk spreads through the financial system, and financiers would like to know whether incorporating other firms' characteristics can improve their own trading algorithms. The matter of how firms interact with one another can be represented mathematically as a network, with nodes corresponding to financial institutions and an edge between two nodes indicating that those firms are connected in some sense.
From a data scientist's perspective, there are two key questions as to how best to measure these connections. First, it is important to understand which statistical methods are well-suited for the task of estimating linkages. Second, one needs to decide what type of data to apply these methods to. Financial institutions generate a variety of data through their activities (e.g., trading volumes and stock prices), yet existing analyses typically consider only price data; for instance, the stock returns used in Billio et al. (2012) and the realized volatility used in Diebold and Yılmaz (2014) can both be calculated from stock prices. However, trades are the truly fundamental object, since they are what give rise to prices. Thus, by using microstructure measures, which are formed from trade information, we address whether trades contain information, beyond what is reflected in prices, that is useful in understanding inter-firm connections. Easley et al. (2019) provide empirical evidence that random forests can predict market measures of futures contracts using those contracts' microstructure variables. Their analysis focuses mainly on intra-firm prediction; that is, they consider whether contract A's microstructure variables can predict contract A's market measures, and similarly for contracts B, C, etc. In our work, on the other hand, we ask whether features of firm A can help predict the market measures of firm B, for all pairs A and B in the system.
Put differently, we measure whether, and to what extent, firm A's predictability increases when we include firm B's features in the random forest, as compared to when only firm A's features are used. Various metrics exist to quantify predictability, including accuracy, precision, recall, and the F1 score. We use the area under the ROC curve (AUC), which reflects the true and false positive rates as the decision threshold is varied. We apply a bootstrap procedure to test by how much (if at all) the AUC increases when we add firm B's features. The increase is then used as a weight for the edge running from firm B to firm A. In this manner, we construct a network whose edges indicate cross-predictability between firms. This technique can be viewed as a high-frequency analogue to the Granger causality methods applied to monthly stock returns in Billio et al. (2012). Under that framework, an edge from firm B to firm A means that firm B's lagged returns help predict firm A's returns, over and above firm A's own lagged returns. Edges are defined similarly here, except that instead of using linear models on monthly stock returns, we apply random forest methods to intraday data. In both cases, we assess whether another firm's information boosts predictive power. We note that random forests may be a particularly effective method for capturing cross-effects between firms since they allow for higher-order interactions between many features, which is especially important given the complexities of modern-day financial markets.
We apply our methodology to high-frequency trade data of U.S. banks, broker-dealers, and insurance companies, with the goal of better understanding cross-effects between these institutions. Our methods can be used to address the same questions that researchers ask in the low-frequency context, including how network connectivity changes over time and what information channels exist between firms. On the first count, we apply our methods to intraday data spanning 1998 to 2010, thereby visualizing the historical evolution of network connectivity over both economically stable and crisis periods. We find that the networks reach maximum density in late 2007, following the collapse of two subprime mortgage funds associated with the investment bank Bear Stearns. Several of the most highly connected nodes in the network, including Lehman Brothers and AIG, have been recognized as key contributors to the U.S. financial crisis. Second, we demonstrate how our methods can be used to detect possible information spillovers between small and large financial firms. This line of analysis is motivated by Chordia et al. (2011), which provides empirical evidence that the returns of large stocks lead (in the Granger causal sense) the returns of small stocks, and that this lead-lag relationship is especially strong when the large stocks have low liquidity. Our results are consistent with this earlier analysis: we find that the microstructure variables of large firms tend to be more important (compared to the microstructure variables of small firms) in predicting market measures for both small and large firms.
The manuscript is organized as follows. In Section 2, we describe our methodology, including an overview of how the data is structured, an explanation of how random forests work, and details of our bootstrap AUC procedure, which is used to assess whether cross-features provide predictive improvement. In Section 3, we define five microstructure variables that are used as features in the random forest, while Section 4 presents two market measures that serve as labels (variables that we predict). In Section 5, we describe the high-frequency data used in our analysis, including which firms we choose to focus on. Section 6 presents the results of two empirical analyses, namely the evolution of network connectivity over time and the presence of information flows between small and large firms. Section 7 concludes.

Methods
In this section, we first describe how our data is structured, namely how we sample from high-frequency trade information to construct microstructure variables and market measures. We then outline our random forest methodology, including an overview of how random forests work and details of our training and testing procedure. Lastly, we introduce two metrics used to interpret the random forest results; the first quantifies the random forest's predictive accuracy, while the second measures the relative importance of the features used in the model. By comparing the accuracy with and without cross-features, we create financial networks whose edges indicate that one firm's variables significantly improve our ability to predict the other firm's market measure. Details are provided below.

Data Structure
Before describing our statistical methods, we briefly explain how our dataset is structured. We begin by obtaining high-frequency trade data from the NYSE Trade and Quote (TAQ) database [NYSE Trade and Quote Database]. TAQ provides information on every trade that occurs on a U.S.-based exchange, including the NYSE, Nasdaq, National Stock Exchange, and others. Among the many variables returned by TAQ are the timestamp, price, and volume of each trade. These three variables are integral to creating our final dataset: timestamps are used to aggregate trades (thereby reducing the total number of observations in our dataset), while price and volume are used to create the microstructure variables and market measures that serve as features and labels in our random forest.

Trade Aggregation
Aggregating trades is common in high-frequency financial data analysis, for several reasons: aggregation limits the effect of noise, reduces the amount of data that we need to process, and allows for the creation of economically meaningful variables [Hautsch (2012)]. Trade aggregation can be based on time (e.g., collecting all trades whose timestamps fall in a 30-minute interval) or on events (e.g., aggregating trades until the price change exceeds a given threshold). In our analysis, we use time-based aggregation, grouping each firm's trades into 30-minute time bars (see footnote 1). Since we consider only trades that occur during regular market hours (9:30 AM to 4:00 PM EST), our time bars correspond to the intervals 9:30 AM to 10:00 AM, 10:00 AM to 10:30 AM, and so on until 3:30 PM to 4:00 PM, with these bars repeated for each day of the sample period. We emphasize that time bars are formed on a firm-by-firm basis; that is, we do not combine trades of stocks x and y into a single bar.
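The assignment of trades to time bars can be sketched in Python as follows. This is an illustrative sketch rather than our production code: it assumes each trade is a (timestamp, price, volume) tuple for a single firm, treats bars as half-open intervals [start, end) so that a trade stamped exactly 4:00:00 PM is excluded, and the names `bar_key` and `aggregate_trades` are hypothetical.

```python
from datetime import datetime, time

# Regular market hours as stated in the text; bars are half-open [start, end).
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)
BAR_MINUTES = 30


def bar_key(ts: datetime):
    """Return (date, k) for the bar containing ts, where k = 0 is 9:30-10:00,
    k = 1 is 10:00-10:30, ..., k = 12 is 3:30-4:00; None outside regular hours."""
    if not (MARKET_OPEN <= ts.time() < MARKET_CLOSE):
        return None
    minutes_since_open = (ts.hour * 60 + ts.minute) - (MARKET_OPEN.hour * 60 + MARKET_OPEN.minute)
    return ts.date(), minutes_since_open // BAR_MINUTES


def aggregate_trades(trades):
    """Group ONE firm's (timestamp, price, volume) trades into 30-minute bars.

    Bars are built firm by firm, so this is called separately for each stock.
    Returns a dict mapping (date, k) to the bar's (price, volume) pairs in time order.
    """
    bars = {}
    for ts, price, volume in sorted(trades, key=lambda trade: trade[0]):
        key = bar_key(ts)
        if key is not None:
            bars.setdefault(key, []).append((price, volume))
    return bars
```

Because every firm uses the same bar boundaries, bar t refers to the same 30-minute interval for all stocks, which is what allows one firm's features at bar t to be paired with another firm's labels.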

Microstructure and Market Variables, Lookback Windows, and Forecast Horizons
Once a firm's trades have been gathered into time bars, we construct a set of microstructure variables and market measures that capture key properties of the firm's trading. In Sections 3 and 4, we provide definitions of these variables and measures. For now, we note that microstructure variables are used as features (predictors) in our random forest, while market measures are used to calculate labels (quantities we predict). All are based on sequences of trade prices and volumes, and all are computed, at each time bar, using a lookback window of size W. For instance, the value of Kyle's lambda (one of the microstructure variables) at time bar t is based on the trade prices and volumes at time bars in {t, t − 1, ..., t − W + 1}.
The microstructure variables at time bar t are then used to predict the sign of the change in a market measure at time bar t + h, where h is a fixed forecast horizon. For example, one of the market measures we consider is realized volatility. We do not predict the value of realized volatility at bar t + h, nor do we predict the change in realized volatility between bars t and t + h. Instead, we predict whether this change is positive (realized volatility increases) or negative (realized volatility decreases). The sign of the change in realized volatility becomes the label for our random forest; thus we are predicting a binary variable that takes the value 1 if the market measure increases and -1 if it decreases.
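A minimal Python sketch of how the lookback window and the forecast horizon fit together. The helper names `rolling` and `make_examples` are hypothetical; in practice `stat` would be one of the microstructure variables or market measures of Sections 3 and 4, with W = h = 50. How exact ties (zero change) are labeled is not stated in the text, so the sketch simply drops them as an assumption.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rolling(values: Sequence[float], W: int,
            stat: Callable[[Sequence[float]], float]) -> List[Optional[float]]:
    """Evaluate `stat` at each bar t on the lookback window of bars
    {t, t-1, ..., t-W+1}; bars without W bars of history get None."""
    return [None if t < W - 1 else stat(values[t - W + 1 : t + 1])
            for t in range(len(values))]


def make_examples(features: Sequence[Optional[tuple]],
                  market: Sequence[Optional[float]],
                  h: int) -> List[Tuple[tuple, int]]:
    """Pair the features at bar t with the label sign(market[t+h] - market[t]),
    coded 1 (increase) or -1 (decrease).

    Bars with undefined features or market values are skipped. Zero changes
    are also skipped (assumption: the text does not specify tie handling).
    """
    examples = []
    for t in range(len(market) - h):
        x, now, later = features[t], market[t], market[t + h]
        if x is None or now is None or later is None:
            continue
        change = later - now
        if change != 0:
            examples.append((x, 1 if change > 0 else -1))
    return examples
```

The same `make_examples` call works with cross-features: the feature tuple at bar t simply concatenates firm y's and firm x's microstructure variables for that bar.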
In our analysis, we set W = 50 and h = 50. Each time bar represents a 30-minute interval; regular market hours span 13 such intervals, but because the first bar of each day is removed from our dataset (see Section 5), 12 bars remain per trading day. Our lookback window size and forecast horizon therefore both correspond to 50/12 ≈ 4.2 trading days, slightly more than four.

Random Forest
Random forests are a popular machine learning tool for classification tasks such as predicting the values of a binary variable [Breiman (2001), Friedman et al. (2001)]. In our work, this binary variable represents whether a market measure, such as realized volatility, decreases (-1) or increases (1) over some fixed forecast horizon. Random forests work by aggregating the predictions of many decision trees, so we begin by describing how each tree makes its prediction.
A decision tree takes as input training data in the form $\{(x_i, y_i)\}_{1 \le i \le n}$, where $y_i$ is the label for observation i and $x_i = (x_{i1}, \ldots, x_{ip})$ is the vector of features. The tree repeatedly splits the n training observations into two subsets on the basis of one of the p features. For example, the first split might separate the training set based on whether the second feature is greater than 5. This would yield two subsets, $\{j : x_{j2} \le 5\}$ and $\{j : x_{j2} > 5\}$. The next split could be based on whether the third feature is greater than 10, yielding 4 subsets, and so on. In this example, features 2 and 3 are referred to as split features, while 5 and 10 are split points. Decision trees choose split features and split points by maximizing information gain, which measures how much purer the labels are in the subsets that result from the split than in the set being split. Maximum purity (information gain) is achieved when one subset contains only observations with label 1 and the other subset contains only observations with label -1. As we move further down the tree, generating more and more splits, the feature space becomes increasingly partitioned and there are fewer observations in each node of the tree. Eventually the tree stops growing (according to a particular stopping criterion) and we classify each observation by considering the terminal node (aka leaf) to which that observation belongs. Specifically, each observation is predicted to have the most commonly occurring label (-1 or 1) in its leaf. Decision trees are known to have low bias and high variance [Friedman et al. (2001)]. They are accurate, on average, but individual decision trees are prone to overfitting the training data and sometimes do not perform well when generalized to a test set. Random forests counteract overfitting by aggregating the predictions of many decision trees, thereby stabilizing the overall prediction [Breiman (2001)]. In particular, for each observation i, the random forest computes the fraction of trees that predict -1 vs. 1. We can then make a prediction for observation i based on which class has the higher probability. For example, suppose a random forest consists of 100 trees, 55 of which predict -1 and 45 of which predict 1, yielding class probabilities of 0.55 and 0.45. Then we can set our final prediction to be -1, since it is the majority vote over all trees. Each decision tree in the forest is trained on a bootstrapped sample; that is, we draw K bootstrap samples (each drawn with replacement from our training set) and fit K decision trees, one on each bootstrap sample.

Footnote 1. One alternate sampling method is to collect trades until their cumulative dollar-volume reaches a certain level [Easley et al. (2021)]. So-called dollar-volume bars have appealing theoretical and practical properties; however, they are not synchronized across stocks and thus present challenges for modeling cross-effects. For example, an actively traded stock, $s_A$, fills its dollar-volume bars faster than a less actively traded stock, $s_I$. Thus $s_A$'s first dollar-volume bar may run from 9:30 AM to 9:35 AM, while $s_I$ does not fill its bar until 10:00 AM. Therefore, we cannot use dollar-volume bars if we hope to use $s_I$'s features to make predictions about $s_A$: in effect, we would be using future information about $s_I$ to predict current properties of $s_A$.
Lastly, an important aspect of random forests is that not all features are candidates for every split. Instead, at each split, we choose a random subset of features, compute the largest information gain we can achieve with each of these features (over all split points), and select as our final split feature and split point the ones that offer maximal information gain. This procedure is particularly helpful when there are correlated features, in which case decision trees may consistently select the feature that offers marginally higher information gain, while ignoring its highly correlated but slightly less predictive counterpart. By randomizing the split candidates, we give each of these features an opportunity to be selected.
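To make the split search and the forest vote concrete, here is a small, self-contained Python sketch. It is illustrative only: it uses entropy-based information gain as described above (the randomForest package used in our implementation measures node impurity with the Gini index instead), considers splits of the form $x_j \le s$ at observed values s, breaks vote ties in favor of label 1 as an arbitrary choice, and its function names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of labels in {-1, 1}."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    return entropy(parent) - len(left) / n * entropy(left) - len(right) / n * entropy(right)


def best_split(X, y, m, rng):
    """Draw m candidate features at random, then return the (feature, split point, gain)
    with maximal information gain over splits x[j] <= s versus x[j] > s."""
    candidates = rng.sample(range(len(X[0])), m)
    best = (None, None, -1.0)
    for j in candidates:
        for s in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            if not left or not right:
                continue  # not a real split
            gain = information_gain(y, left, right)
            if gain > best[2]:
                best = (j, s, gain)
    return best


def forest_vote(tree_predictions):
    """Class probabilities (fraction of trees per label) and the majority-vote label."""
    counts = Counter(tree_predictions)
    k = len(tree_predictions)
    probs = {-1: counts[-1] / k, 1: counts[1] / k}
    return probs, (-1 if probs[-1] > probs[1] else 1)
```

For the 100-tree example in the text, `forest_vote([-1] * 55 + [1] * 45)` gives class probabilities 0.55 and 0.45 and the label -1.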

Random Forest Parameters
We implement our random forest using the randomForest package in R [Liaw and Wiener (2002)]. In particular, we produce forests with K = 1000 trees, each of which is allowed to grow without limit (i.e., the minimum leaf size is 1). We randomly select m features as candidates at each split, with $m = \lfloor \sqrt{p} \rfloor$. Recall that p denotes the number of features, which varies based on whether we include cross-effects. We use five microstructure variables (see Section 3), so m is either $\lfloor \sqrt{5} \rfloor = 2$ if we use only the firm's own variables for prediction or $\lfloor \sqrt{10} \rfloor = 3$ if we also consider the features of one other firm. Finally, we assign a weight to each training set observation on the basis of its class; observations with label 1 (resp., -1) have weight $1/n_I$ (resp., $1/n_D$), where $n_I$ and $n_D$ denote the number of training observations with label 1 and -1, respectively. When performing the bootstrap, training observations are randomly sampled according to these weights so that the effect of class imbalance is minimized.
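The two parameter choices that depend on p and on the class balance can be sketched in Python as below. This illustrates the weighting scheme described in the text and is not the randomForest package's internal code; the function names are hypothetical.

```python
import math
import random


def mtry(p: int) -> int:
    """Candidate features per split, m = floor(sqrt(p)) (at least 1)."""
    return max(1, math.isqrt(p))


def class_weights(y):
    """Weight 1/n_I for label 1 and 1/n_D for label -1, so each class sums to 1."""
    n_I = sum(1 for label in y if label == 1)
    n_D = sum(1 for label in y if label == -1)
    return [1 / n_I if label == 1 else 1 / n_D for label in y]


def weighted_bootstrap(y, rng: random.Random):
    """Draw len(y) training indices with replacement, with probabilities proportional
    to the class weights; both classes are then drawn about equally often."""
    return rng.choices(range(len(y)), weights=class_weights(y), k=len(y))
```

With 900 observations labeled 1 and 100 labeled -1, a weighted bootstrap sample contains roughly 500 of each class rather than the 9:1 ratio of the raw training set.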

Purged Cross-Validation
Once the random forest is fit on the training set, we evaluate its performance on test data. Depending on the exact analysis we perform (see Section 6 for details), we use one of two approaches. The first procedure is purged cross-validation as proposed in Easley et al. (2021). This involves splitting the sample period into G intervals of equal length. We then iterate over the intervals, taking each interval g in turn as the test set and using all other intervals as training data, with one caveat. Since our microstructure variables (features) and market measures (labels) are formed using lookback windows and a forecast horizon, the train and test sets under this approach are not independent of each other, which would bias our results. To correct for this, we purge five days' worth of data from around each test set g (see Figure 1); at 12 bars per day, this amounts to 60 bars, more than both the lookback window W and the forecast horizon h (50 bars each). This procedure yields G sets of results, one for each interval. In Section 2.3, we discuss how to aggregate these results across test sets. Purged cross-validation allows us to test on the entire dataset as we iterate over intervals; however, it has the disadvantage that sometimes we test our model on data that occurs prior to our training data. (This would not be the approach of, say, a practitioner applying a random forest to recent financial data in order to forecast changes in market measures.) To ensure that the chronology of the train and test sets does not impact our final results, we use an alternative approach for some of our analyses. This consists of splitting the sample period into two intervals, training on the earlier interval (which has some data purged from it) and testing on the later interval.
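A Python sketch of the two train/test schemes, working on bar indices. Two assumptions are not taken from the text: intervals are equal in number of bars rather than in calendar time, and the purge is expressed in bars (e.g., purge = 5 × 12 = 60 for five trading days). The function names are hypothetical.

```python
from typing import Iterator, List, Tuple


def purged_cv_splits(n_bars: int, G: int, purge: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for purged cross-validation.

    The n_bars observations are cut into G contiguous intervals of (nearly)
    equal length. Each interval is the test set once; the `purge` bars just
    before and just after it are dropped from the training set.
    """
    bounds = [round(g * n_bars / G) for g in range(G + 1)]
    for g in range(G):
        start, end = bounds[g], bounds[g + 1]  # test interval is [start, end)
        test = list(range(start, end))
        train = [i for i in range(n_bars) if i < start - purge or i >= end + purge]
        yield train, test


def chronological_split(n_bars: int, split: int, purge: int) -> Tuple[List[int], List[int]]:
    """Train on bars before `split`, minus the last `purge` of them; test on the rest."""
    return list(range(max(0, split - purge))), list(range(split, n_bars))
```

Concatenating the test sets from `purged_cv_splits` recovers every bar exactly once, which is what allows the whole sample to be used for testing; `chronological_split` gives up that coverage in exchange for never testing on data that precedes the training data.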

Evaluating the Random Forest
After applying our random forest to the test sets, we consider two aspects of our model's performance: (i) its predictive ability, i.e., how well the random forest classifies observations in the test set, and (ii) which features are most important in making those predictions. We address each of these points in turn.

AUC for Assessing Prediction Accuracy and Forming Networks
The receiver operating characteristic (ROC) curve offers a visual medium by which we can assess the predictive performance of a binary classifier such as a random forest [Fawcett (2006)]. For each observation in the test set, the random forest provides the probability that the observation's label is -1, from which we can readily compute the probability that the observation's label is 1. We convert these probabilities to actual predictions of -1 or 1 by setting a decision threshold and evaluating whether the observation's probability (e.g., of being -1) meets this threshold. For instance, if we set the decision threshold to 0.5, then observations are classified according to whether the majority of trees in the random forest predict -1 or 1 for the observation in question.
The ROC curve displays the tradeoff between the true positive rate and the false positive rate as we vary the decision threshold between 0 and 1. The true positive rate (also referred to as recall) is defined as

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad (1)$$

where TP (resp., FN) is the number of true positives (resp., false negatives) produced by the classifier at a set threshold. In our analysis, we take labels of 1 to be positives and labels of -1 to be negatives. From equation (1), we can see that the true positive rate is simply the proportion of positives in the test set that are correctly classified as such. Similarly, the false positive rate is given by

$$\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad (2)$$

where FP (resp., TN) is the number of false positives (resp., true negatives) produced by the classifier at a set threshold. The false positive rate is then the proportion of negatives in the test set that are incorrectly classified as positives. Both the TPR and FPR can be computed given the predicted class probabilities and true labels for the test set. Recall that, for purged cross-validation, we use multiple test sets; however, we simply pool the predicted and true values across all intervals, yielding, in effect, a single set of test results.
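The true and false positive rates, and the ROC points obtained by sweeping the threshold, can be computed as in the following Python sketch. It assumes `scores` holds each observation's predicted probability of label 1 (pooled across test sets under purged cross-validation) and that an observation is called positive when its score is at least the threshold; the function names are hypothetical.

```python
def rates_at_threshold(scores, labels, threshold):
    """(TPR, FPR) when observations with score >= threshold are called positive.
    scores: predicted probability of label 1; labels: true labels in {1, -1}."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == -1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == -1)
    return tp / (tp + fn), fp / (fp + tn)


def roc_points(scores, labels):
    """(FPR, TPR) pairs for thresholds at each distinct score, from the lowest
    threshold (everything positive) upward, plus the end point (0, 0)."""
    points = []
    for threshold in sorted(set(scores)):
        tpr, fpr = rates_at_threshold(scores, labels, threshold)
        points.append((fpr, tpr))
    return points + [(0.0, 0.0)]
```

For four test observations with scores (0.9, 0.8, 0.3, 0.2) and labels (1, -1, 1, -1), a threshold of 0.5 gives TPR = FPR = 0.5, and sweeping the threshold traces the points (1, 1), (0.5, 1), (0.5, 0.5), (0, 0.5), (0, 0).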
As we vary the decision threshold from 0 (all observations classified as positive) to 1 (all observations classified as negative), both the TPR and the FPR decrease from 1 to 0. The ROC curve plots the TPR against the FPR at each of these intervening thresholds (see Figure 2). A random classifier (i.e., one which, for each observation, predicts -1 or 1 with equal probability) yields a diagonal ROC curve running from (0, 0) to (1, 1), while a classifier that perfectly separates negatives from positives has an ROC curve running from (0, 0) up to (0, 1) and across to (1, 1). Thus, we can quantify a classifier's performance by computing the area under the ROC curve, referred to as the AUC. In the case of a random classifier, the AUC is 0.5, while a perfect classifier has an AUC of 1. AUC has the advantage that it assesses a classification model's performance over all possible decision thresholds, without requiring us to set a single threshold. We use AUC to detect the presence of cross-effects between firms; that is, to assess whether microstructure variables of firm x are useful in predicting market measures of firm y. We take the view that there are two competing models: Model 1 is a random forest not containing any cross-features (firm y's variables only), while Model 2 is a random forest that does contain cross-features (both firm x's and firm y's variables). If features from firm x have predictive power, then Model 2 should have a higher AUC than Model 1. Thus, to determine whether cross-effects exist, we test the following hypotheses:

$$H_0: \mathrm{AUC}_2 = \mathrm{AUC}_1 \quad \text{versus} \quad H_1: \mathrm{AUC}_2 > \mathrm{AUC}_1, \qquad (3)$$

where $\mathrm{AUC}_\ell$ denotes the AUC of Model $\ell$, with $\ell = 1, 2$. The test in (3) is executed according to the following steps:
(1) We fit Models 1 and 2 on the training data, apply the fitted models to the test data, and store $\{P^1_i\}_i$, the predicted class probabilities from Model 1; $\{P^2_i\}_i$, the predicted class probabilities from Model 2; and the true test set labels, $\{R_i\}_i$. Here i indexes observations in the test set.
(2) Using the predictions and true values, we compute $\mathrm{AUC}^O_1$ and $\mathrm{AUC}^O_2$, the areas under the curve for Models 1 and 2, respectively.
(3) We then draw B bootstrap samples from $\{(P^1_i, P^2_i, R_i)\}_i$. For each bootstrap sample b, with b = 1, ..., B, we calculate new areas under the curve, $\mathrm{AUC}^b_1$ and $\mathrm{AUC}^b_2$, storing the difference $\mathrm{AUC}^b_2 - \mathrm{AUC}^b_1$.
(4) The standard deviation, s, of the bootstrap differences is computed and a test statistic, D, is calculated as
$$D = \frac{\mathrm{AUC}^O_2 - \mathrm{AUC}^O_1}{s}.$$
(5) Finally, a one-sided p-value is computed under the assumption that D follows a normal distribution.
Steps 1-5 are carried out twice for each pair of firms, (x, y), in the system: once to make predictions for firm x and again to make predictions for firm y. This yields a set of $2 \times \binom{N}{2} = N(N-1)$ p-values, where N is the number of firms under consideration. We apply a multiple testing correction to control the false discovery rate [Benjamini and Hochberg (1995)] and form directed networks with edges between pairs of firms that have an adjusted p-value falling below some threshold.

[Figure 2: Example of an ROC curve]
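Steps (1)-(5) and the multiple testing correction can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not our implementation: the AUC is computed through its rank (Mann-Whitney) form, which equals the area under the empirical ROC curve; bootstrap resamples that happen to contain only one class are skipped; B = 2000 is an arbitrary default; and `p1`, `p2` are the predicted probabilities of label 1 from Models 1 and 2. The function names are hypothetical.

```python
import random
from statistics import NormalDist, stdev


def auc(scores, labels):
    """Probability that a random positive (label 1) outscores a random negative
    (label -1), ties counting 1/2; equals the area under the empirical ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_test(p1, p2, r, B=2000, seed=0):
    """One-sided bootstrap test of whether Model 2's AUC exceeds Model 1's.
    Returns (D, p-value)."""
    rng = random.Random(seed)
    n = len(r)
    auc1, auc2 = auc(p1, r), auc(p2, r)                 # step (2)
    diffs = []
    for _ in range(B):                                  # step (3): resample triples jointly
        idx = [rng.randrange(n) for _ in range(n)]
        rb = [r[i] for i in idx]
        if len(set(rb)) < 2:
            continue  # AUC undefined without both classes (assumption: skip)
        diffs.append(auc([p2[i] for i in idx], rb) - auc([p1[i] for i in idx], rb))
    D = (auc2 - auc1) / stdev(diffs)                    # step (4)
    return D, 1 - NormalDist().cdf(D)                   # step (5): one-sided p-value


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure controlling the FDR)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In the network construction, `bootstrap_auc_test` would be run once per ordered pair of firms, the resulting p-values passed jointly through `benjamini_hochberg`, and a directed edge drawn wherever the adjusted p-value falls below the chosen threshold.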

MDA for Feature Importances
Area under the curve measures the random forest's predictive performance; however, we are also interested in knowing to what extent the various features contribute to these predictions. We quantify feature importances using the mean decrease in accuracy (MDA), which compares the random forest's accuracy on the original data to its accuracy on a dataset for which the values of a feature have been randomly permuted [Biau and Scornet (2016)]. Accuracy is defined as the fraction of all test set observations that are classified correctly:

$$A = \frac{TP + TN}{TP + TN + FP + FN}.$$

For each feature f, we compute its MDA as follows:
(1) We begin by fitting a model to the training set and computing its accuracy, $A^O$, on the test set.
(2) Next, we randomly permute the values of f in the test set. We make predictions on this shuffled test set and compute the new accuracy, $A^P$.
(3) The MDA for feature f is the fraction by which the model's test set accuracy decreases after shuffling f:

$$\mathrm{MDA}_f = \frac{A^O - A^P}{A^O}.$$

Features having a high MDA are considered to be more important since they have a large effect on the model's accuracy. In our analysis, we compute the MDAs separately for each test set in the sample period. The MDA values are then averaged over test sets to yield a mean importance for each feature f.
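The permutation step can be sketched in Python as below. The `predict` argument stands in for a fitted random forest (any function mapping a feature tuple to -1 or 1); the toy model in the check is hypothetical and only serves to show that a feature the model ignores receives an MDA of zero. Function names are likewise hypothetical.

```python
import random


def accuracy(predict, X, y):
    """Fraction of observations classified correctly."""
    return sum(1 for row, yi in zip(X, y) if predict(row) == yi) / len(y)


def mda(predict, X_test, y_test, feature: int, rng: random.Random) -> float:
    """Mean decrease in accuracy for one feature: (A_O - A_P) / A_O, where A_P is the
    accuracy after randomly permuting that feature's values across the test set."""
    a_orig = accuracy(predict, X_test, y_test)
    column = [row[feature] for row in X_test]
    rng.shuffle(column)
    X_perm = [row[:feature] + (value,) + row[feature + 1:] for row, value in zip(X_test, column)]
    a_perm = accuracy(predict, X_perm, y_test)
    return (a_orig - a_perm) / a_orig
```

Averaging the per-test-set MDA values, as described above, would amount to calling `mda` once per test interval and taking the mean for each feature.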

Market Microstructure Variables
Our random forest model uses a variety of market microstructure variables as features. Microstructure variables are designed to measure illiquidity, volatility, order imbalance, and other consequences of market frictions. As in Easley et al. (2021), we focus on five such measures that represent the evolution of microstructure models from those that use price data alone (first generation) to those that use both price and volume data (second generation) to those that use more extensive trade information (third generation). Most of these measures were designed before the advent of high-frequency trading, raising the question of how well they capture market frictions in our current, more complex financial era. Thus our model helps to assess the ongoing utility of these traditional market microstructure variables. In what follows, we describe each of the five measures, including their importance and how they are computed.

Roll Measure
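The Roll measure, a first generation microstructure variable, uses sequences of price changes to estimate the effective bid-ask spread, which in turn is a proxy for the transaction cost [Hautsch (2012)]. The Roll measure at bar t, written $R_t$, is a function of the first-order serial covariance of price changes:

$$R_t = 2\sqrt{\left|\operatorname{cov}\left(\Delta \mathbf{p}_t, \Delta \mathbf{p}_{t-1}\right)\right|}.$$

Here $\Delta \mathbf{p}_t = (\Delta p_{t-W}, \Delta p_{t-W+1}, \ldots, \Delta p_{t-1}, \Delta p_t)$, with $\Delta p_t$ denoting the difference between the closing prices at bars t and t − 1, and $\Delta \mathbf{p}_{t-1}$ is the same vector lagged by one bar.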

Roll Impact
Roll impact, a second generation variable, is closely related to the Roll measure. Specifically, Roll impact is defined as the Roll measure, scaled by the amount of dollar-volume traded over the bar:

$$\widetilde{R}_t = \frac{R_t}{\sum_{k \in T(t)} p_k v_k},$$

where T(t) is the set of trades belonging to bar t, and $p_k$ and $v_k$ are the price and volume, respectively, of trade k. Since the numerator, $R_t$, represents transaction cost, Roll impact can be interpreted as the transaction cost per unit of trade.
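A Python sketch of the two Roll-type features, assuming $R_t = 2\sqrt{|\operatorname{cov}(\Delta \mathbf{p}_t, \Delta \mathbf{p}_{t-1})|}$ over the price changes $\Delta p_{t-W}, \ldots, \Delta p_t$ and Roll impact $= R_t / \sum_{k \in T(t)} p_k v_k$. The use of the sample covariance (n − 1 denominator) is an assumption, as are the function names.

```python
import math
from statistics import mean


def sample_cov(a, b):
    """Sample covariance (n - 1 denominator) of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def roll_measure(close, t, W):
    """R_t = 2 * sqrt(|cov(dp_t, dp_{t-1})|), where dp_t = (dp_{t-W}, ..., dp_t), the
    lagged vector is shifted back one bar, and dp_s = close[s] - close[s-1].
    Needs t - W >= 2 so that every lagged price change exists."""
    dp = lambda s: close[s] - close[s - 1]
    current = [dp(s) for s in range(t - W, t + 1)]
    lagged = [dp(s - 1) for s in range(t - W, t + 1)]
    return 2 * math.sqrt(abs(sample_cov(current, lagged)))


def roll_impact(close, t, W, bar_trades):
    """Roll measure divided by the dollar volume sum(p_k * v_k) of bar t's trades."""
    dollar_volume = sum(p * v for p, v in bar_trades)
    return roll_measure(close, t, W) / dollar_volume
```

A pure bid-ask bounce (closing prices alternating between two levels) produces strongly negative serial covariance in price changes, which is exactly the pattern the Roll measure converts into a spread estimate.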

Kyle's Lambda
Kyle's lambda at bar t is given by

$$\lambda^K_t = \frac{p_t - p_{t-W}}{\sum_{i=t-W}^{t} b_i V_i},$$

where $p_t$ is the closing price of bar t, $V_t$ is the total volume traded over bar t, and $b_t = \operatorname{sign}(p_t - p_{t-1})$. Kyle's lambda is the coefficient obtained by regressing price change on order flow, and thus measures the price impact of trading.

Amihud's Lambda
Amihud's lambda, another second generation variable, measures illiquidity by computing the ratio of the price change to the amount traded. Thus Amihud's lambda can be viewed as the "price change per trade size," with less liquid assets having a larger per-unit price impact than their more liquid counterparts [Hautsch (2012)]. In particular, Amihud's lambda at bar t is defined as

$$\lambda^A_t = \frac{1}{W} \sum_{i=t-W}^{t} \frac{|r_i|}{p_i V_i},$$

where $r_t$ is the return over bar t.
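A Python sketch of the two lambda features. It assumes the windowed forms $\lambda^K_t = (p_t - p_{t-W}) / \sum_{i=t-W}^{t} b_i V_i$ and $\lambda^A_t = \frac{1}{W}\sum_{i=t-W}^{t} |r_i| / (p_i V_i)$, and takes $r_i$ to be the close-to-close simple return; all three points are assumptions to check against Easley et al. (2021), and the function names are hypothetical.

```python
def sign(x: float) -> int:
    """Return 1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def kyle_lambda(close, volume, t, W):
    """(p_t - p_{t-W}) / sum_{i=t-W}^{t} b_i V_i with b_i = sign(p_i - p_{i-1}).
    Needs t - W >= 1; raises ZeroDivisionError if the signed volume nets to zero."""
    signed_volume = sum(sign(close[i] - close[i - 1]) * volume[i] for i in range(t - W, t + 1))
    return (close[t] - close[t - W]) / signed_volume


def amihud_lambda(close, volume, t, W):
    """(1/W) * sum_{i=t-W}^{t} |r_i| / (p_i V_i), taking r_i = p_i / p_{i-1} - 1."""
    total = sum(abs(close[i] / close[i - 1] - 1) / (close[i] * volume[i])
                for i in range(t - W, t + 1))
    return total / W
```

Both functions read the same per-bar closing prices and volumes, so they can be evaluated side by side at every bar once the time bars of Section 2.1.1 are in place.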

VPIN
The volume-synchronized probability of informed trading (VPIN) arises from third generation market microstructure models. By comparing the amount of buyer- and seller-initiated trades, VPIN quantifies the extent to which there is information asymmetry in the market. For example, if a group of traders knows that an asset's price is about to rise, we may observe a preponderance of buyer-initiated trades as informed traders rush to secure the asset before its price increases. The VPIN at bar t is given by

$$\mathrm{VPIN}_t = \frac{\sum_{i=t-W}^{t} \left| V^S_i - V^B_i \right|}{\sum_{i=t-W}^{t} V_i},$$

where $V_t$ is the total volume traded over bar t, $V^B_t$ is the estimated total buy volume over bar t, and $V^S_t = V_t - V^B_t$ is the estimated total sell volume over bar t. Importantly, the information provided in the Trade and Quote (TAQ) database does not include whether the trades were buyer- or seller-initiated (we call such trades "unsigned"). Thus, before computing the VPIN, we must first classify trades as buys or sells. A number of methods exist for this purpose (e.g., the Lee-Ready algorithm and the tick rule); here we use bulk volume classification (BVC), which has been demonstrated to outperform other techniques when the trade data is noisy [Easley et al. (2016)].

Bulk Volume Classification
Bulk volume classification is based on the heuristic that, if a trade is buyer-initiated, it will take place at the ask (the lowest price offered by sellers) and therefore will generate an uptick in the price of the asset. Similarly, if the trade is seller-initiated, it will take place at the bid (the highest price offered by buyers) and therefore will produce a downtick in price. This idea suggests that we can determine the amount of buyer- (resp., seller-) initiated trades by considering whether the price of the asset goes up or down. More specifically, let $V_t$ be the total volume traded over bar t, with $\Delta p_t = p_t - p_{t-1}$ denoting the change in the closing price between bars t and t − 1. Then BVC estimates the volume of buyer-initiated trades over bar t to be

$$V^B_t = V_t \, \Phi\!\left(\frac{\Delta p_t}{\sigma_{\Delta p}}\right),$$

where $\sigma_{\Delta p}$ is the empirical standard deviation of the price changes (over all bars) and Φ is the cumulative distribution function of a standard normal random variable.
Notice that the more positive the scaled price change is, the closer $\Phi(\Delta p_t / \sigma_{\Delta p})$ is to 1, so that most of the volume traded over bar t is classified as buyer-initiated. Similarly, the more negative the scaled price change, the more volume is classified as seller-initiated. This result comports with the heuristic we described above: buyer-initiated trades are more likely to produce positive price changes, while seller-initiated trades are more likely to generate negative price changes.
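BVC and the VPIN built on it can be sketched in Python as follows, assuming $V^B_t = V_t \Phi(\Delta p_t / \sigma_{\Delta p})$ and $\mathrm{VPIN}_t = \sum_{i=t-W}^{t} |V^S_i - V^B_i| / \sum_{i=t-W}^{t} V_i$. Using the sample standard deviation for $\sigma_{\Delta p}$ is an assumption, as are the function names.

```python
from statistics import NormalDist, stdev


def bvc_buy_volume(close, volume):
    """V^B_t = V_t * Phi(dp_t / sigma_dp) for t >= 1, where dp_t = p_t - p_{t-1} and
    sigma_dp is the sample standard deviation of all price changes.
    Entry 0 is None because bar 0 has no price change."""
    dp = [close[t] - close[t - 1] for t in range(1, len(close))]
    sigma = stdev(dp)
    phi = NormalDist().cdf
    return [None] + [volume[t] * phi(dp[t - 1] / sigma) for t in range(1, len(close))]


def vpin(buy, volume, t, W):
    """sum_{i=t-W}^{t} |V^S_i - V^B_i| / sum_{i=t-W}^{t} V_i, with V^S_i = V_i - V^B_i.
    Needs t - W >= 1 so every bar in the window has a BVC estimate."""
    window = range(t - W, t + 1)
    imbalance = sum(abs((volume[i] - buy[i]) - buy[i]) for i in window)
    return imbalance / sum(volume[i] for i in window)
```

A bar with no price change is split evenly (Φ(0) = 0.5), and up and down moves of equal size split the volume in mirror-image proportions, which is the symmetry described in the paragraph above.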

Market Measures
We use the above-described microstructure variables as features in our random forest, with the aim of predicting several important market measures. Although there are a number of market measures that interest traders, regulators, and researchers, here we focus on two: the sign of the change in realized volatility and the sign of the change in the kurtosis of returns. We describe each in turn, explaining why they are of interest and how we compute them.

Sign of the Change in Realized Volatility
Realized volatility is measured by the empirical standard deviation of returns; that is, if $r_t$ denotes the return over bar t, then the realized volatility, $\sigma_t$, is given by $\sigma_t = \operatorname{sd}(r_{t-W+1}, r_{t-W+2}, \ldots, r_{t-1}, r_t)$. The sign of the change in realized volatility is defined as

$$y^{\sigma}_t = \operatorname{sign}(\sigma_{t+h} - \sigma_t),$$

which is 1 when the realized volatility increases (over a forecast horizon of h bars) and -1 when the realized volatility decreases. A trader who predicts that volatility will rise may want to adjust their execution algorithm, increasing their trading activity so that orders are completed before prices begin to fluctuate [Easley et al. (2021)].
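A Python sketch of the realized-volatility label. It assumes close-to-close simple returns and the sample standard deviation, and maps an exact tie to None because the text does not say how ties are labeled; the function names are hypothetical.

```python
from statistics import stdev


def simple_returns(close):
    """r_t = p_t / p_{t-1} - 1 for t >= 1; r_0 is undefined (None)."""
    return [None] + [close[t] / close[t - 1] - 1 for t in range(1, len(close))]


def realized_vol(returns, t, W):
    """sigma_t = sd(r_{t-W+1}, ..., r_t), using the sample standard deviation."""
    return stdev(returns[t - W + 1 : t + 1])


def vol_label(returns, t, W, h):
    """sign(sigma_{t+h} - sigma_t): 1 if realized volatility rises, -1 if it falls,
    None on an exact tie (tie handling is an assumption)."""
    diff = realized_vol(returns, t + h, W) - realized_vol(returns, t, W)
    return 1 if diff > 0 else -1 if diff < 0 else None
```

Note that the label at bar t needs returns up to bar t + h, which is why the last h bars of a training interval cannot be labeled without looking into the following interval.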

Sign of the Change in the Kurtosis of Returns
Many standard risk models assume normally distributed returns; thus, traders are interested in forecasting any deviations from normality so that they can adapt their risk management practices accordingly. One such deviation could be an increase or decrease in the kurtosis ("tailedness") of the returns. For example, high forecasted kurtosis could be caused by a drop in liquidity: with fewer orders on the book, trades are executed at more extreme prices, thereby generating more extreme returns [Easley et al. (2021)]. The (excess) kurtosis at time $t$ is given by
$$\kappa_t = \frac{\mu_{t,4}}{\sigma_t^{4}} - 3,$$
where $\mu_{t,4}$ and $\sigma_t$ are, respectively, the empirical fourth central moment and standard deviation of $(r_{t-W+1}, r_{t-W+2}, \ldots, r_{t-1}, r_t)$. The sign of the change in kurtosis is then
$$y^{\kappa}_t = \operatorname{sign}(\kappa_{t+h} - \kappa_t).$$
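The kurtosis label can be sketched in the same spirit. This sketch assumes population (divide-by-W) moments for both the fourth central moment and the variance, which is one common convention among several; a window of identical returns has zero variance and raises ZeroDivisionError. The sign label compares the kurtosis h bars ahead with the current value. All names are hypothetical.

```python
from typing import List, Optional

def excess_kurtosis(window: List[float]) -> float:
    """kappa = mu_4 / sigma^4 - 3 using population (divide-by-n) moments."""
    n = len(window)
    mean = sum(window) / n
    m2 = sum((x - mean) ** 2 for x in window) / n  # variance, sigma^2
    m4 = sum((x - mean) ** 4 for x in window) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0

def rolling_excess_kurtosis(returns: List[float], W: int) -> List[Optional[float]]:
    """kappa_t over (r_{t-W+1}, ..., r_t); None until W returns are available."""
    out: List[Optional[float]] = [None] * len(returns)
    for t in range(W - 1, len(returns)):
        out[t] = excess_kurtosis(returns[t - W + 1 : t + 1])
    return out

def kurtosis_sign(kappa: List[Optional[float]], h: int) -> List[Optional[int]]:
    """+1 / -1 for an increase / decrease from t to t + h; None if undefined or tied."""
    labels: List[Optional[int]] = []
    for now, future in zip(kappa, kappa[h:]):  # pairs (kappa_t, kappa_{t+h})
        if now is None or future is None or now == future:
            labels.append(None)
        else:
            labels.append(1 if future > now else -1)
    return labels
```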

Data Description
We obtain intraday trade data from the NYSE Daily Trade and Quote (TAQ) database, via Wharton Research Data Services (WRDS) [NYSE Trade and Quote Database]. TAQ includes trade and quote information for all stocks that are actively traded on a U.S.-based exchange; however, we focus our attention on firms from the financial sector, specifically banks, primary broker-dealers, and insurance companies. In so doing, we are able to compare our results to the analyses in Karpman et al. (2022), where lower-frequency data (monthly returns) are used to construct financial networks on the same set of firms. As in Karpman et al. (2022), sectoral membership of firms is identified using the Standard Industrial Classification (SIC) code. We analyze data for this set of firms over two time periods: 1998-2010 and 2018 (see Sections 6.1 and 6.2, respectively). Starting with the full set of trades for these firms, we apply the following filters to compile our final dataset: (i) remove any trades whose price or volume is negative, since these records are clearly erroneous; (ii) exclude trades occurring outside of regular market hours (9:30 AM to 4:00 PM ET); (iii) retain only trades of common shares; and (iv) remove trades that are corrected, changed, or marked as erroneous. For each stock, we form time series of each of the microstructure variables and market measures by aggregating trades into 30-minute time bars (see Section 2.1.1). Lastly, since the market opening is run according to a different process, namely an auction, we remove the first bar of each day from our final dataset.
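The cleaning and bar-building steps above might look roughly like the following. The `Trade` record and its `common_share` and `corrected` flags are hypothetical stand-ins for the corresponding TAQ security-type and condition fields, whose actual names and codes are not reproduced here. The sketch also assumes bars anchored at 9:30 AM, treats a trade stamped exactly 4:00 PM as outside regular hours, and keeps each bar's last trade price as its close and its summed volume as its volume.

```python
from dataclasses import dataclass
from datetime import datetime, time

@dataclass
class Trade:
    ts: datetime
    price: float
    volume: float
    common_share: bool  # hypothetical flag: trade is in the firm's common shares
    corrected: bool     # hypothetical flag: corrected, changed, or marked erroneous

OPEN, CLOSE = time(9, 30), time(16, 0)
BAR_MINUTES = 30

def keep(tr: Trade) -> bool:
    """Filters (i)-(iv): nonnegative price/volume, regular hours, common shares, not corrected."""
    return (tr.price >= 0 and tr.volume >= 0
            and OPEN <= tr.ts.time() < CLOSE
            and tr.common_share and not tr.corrected)

def to_bars(trades):
    """Aggregate filtered trades into 30-minute bars keyed by (date, bar index)."""
    open_minute = OPEN.hour * 60 + OPEN.minute
    bars = {}
    for tr in sorted(filter(keep, trades), key=lambda t: t.ts):
        idx = (tr.ts.hour * 60 + tr.ts.minute - open_minute) // BAR_MINUTES
        if idx == 0:
            continue  # drop the first bar of each day (opening auction period)
        bar = bars.setdefault((tr.ts.date(), idx), {"close": None, "volume": 0.0})
        bar["close"] = tr.price  # trades are time-sorted, so the last write is the close
        bar["volume"] += tr.volume
    return dict(sorted(bars.items()))
```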

Results
Having discussed the methods by which we construct high-frequency financial networks, we now demonstrate how such networks can be used to gain insight into the structure of the financial system using historical trade data. We consider two examples. The first examines how inter-firm connections vary over the course of 1998 to 2010, with a special focus on whether connectivity changes in and around financial crises (see Section 6.1). The second explores why edges appear between certain pairs of firms and, in particular, whether the sizes of the firms (measured via market capitalization) play a role (see Section 6.2).

Historical Evolution of High-Frequency Financial Network Connectivity
Connections between financial institutions create channels through which risk can spread; hence, firm interconnectedness is considered to be a major contributor to systemic risk, defined as the risk of widespread failure of the financial system. For example, if a highly connected firm fails (even if due to an idiosyncratic shock), it may trigger a cascade of other firm failures that could cause extensive damage to the wider system. In the years since the 2007-2009 U.S. Financial Crisis, there has been increasing interest in measuring systemic risk and in identifying systemically important financial institutions (SIFIs).
Since systemic risk is tied to firm interconnectedness, much of the recent literature has explored how to use financial data (e.g., balance sheet information, returns, volatilities) to learn networks of firms. For instance, Billio et al. (2012) and Basu et al. (2019) construct networks whose edges correspond to intertemporal correlations (Granger causality) between firms' stock returns. Under the efficient market hypothesis, there should not exist such lead-lag relationships between the price changes of different firms; however, in practice, market frictions such as capital requirements, borrowing constraints, and transaction costs may indeed give rise to such correlations. As argued in Billio et al. (2012), the more such correlations exist (and the larger these correlations are), the greater the chance of risk propagating from one firm to another (i.e., the more systemic risk there is). Billio et al. (2012) shows that the number of Granger causal connections increases during the economically unstable periods of 1998-1999 and 2007-2008. Likewise, Basu et al. (2019) (which refines the methods in Billio et al. (2012)) demonstrates that network connectivity spikes around several recent systemic events, including the 1998 Russian financial crisis and the 2008 collapse of the investment bank Lehman Brothers. Karpman et al. (2022) expands on these methods further by constructing networks via quantile Granger causality, which focuses on firm connections that exist specifically during market downturns. Each of the aforementioned studies uses monthly stock returns for network building.
Thus far we are unaware of any studies that attempt to quantify systemic risk using high-frequency financial networks. The methods proposed in this paper, however, are a natural vehicle for doing so. We have described how microstructure variables, computed from intraday trade data, can be used to predict future changes in market measures such as realized volatility. Since these microstructure measures reflect information-based trading, a firm, y, whose microstructure measures help predict the realized volatility of another firm, x, represents a possible source of risk to firm x. Thus, by assessing whether features from one firm are useful in forecasting changes for another firm, we can construct a network whose edges represent a high-frequency analogue of returns-based Granger causality. In this section, we create such networks for a set of firms and a time period comparable to those considered in Billio et al. (2012), Basu et al. (2019), and Karpman et al. (2022). We begin with the details of our network construction process, and then compare our results to those obtained using bivariate Granger causality applied to monthly stock returns.

Methodology for Constructing 1998-2010 Financial Networks
For each year between 1998 and 2010, we rank all actively traded firms according to their average monthly market capitalization, which is computed using data from the Center for Research in Security Prices (CRSP) database, accessed via WRDS [CRSP Stocks]. Using this ranking, for each year, we identify the top 25 banks, the top 25 primary broker-dealers, and the top 25 insurance companies, yielding a total of 75 firms. Any firms having insufficient data are excluded from our analysis, resulting in some variation in the number of firms considered per year (ranging from 59 firms in 1998 to 75 firms in the later years of the sample period). Next, we divide each year into three overlapping six-month periods: January 1 through June 30, April 1 through September 30, and July 1 through December 31. Thus our analysis involves 39 time windows (13 years, with 3 windows per year). Within each six-month window, we train on the first three months and test on the last three months. For example, we train on data from approximately January 1, 1998 through March 31, 1998 and test on data from approximately April 1, 1998 through June 30, 1998. We use this chronological train/test split, rather than purged cross-validation, so as not to introduce any bias that may result from training on data that occur after the test data.
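The window schedule described above can be generated mechanically. The sketch below uses calendar boundaries, whereas the text says the splits hold only "approximately" (the actual data begin and end on trading days); the function name is hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple

Window = Tuple[date, date, date, date]  # (train_start, train_end, test_start, test_end)

def six_month_windows(first_year: int, last_year: int) -> List[Window]:
    """Three overlapping six-month windows per year (starting Jan 1, Apr 1, Jul 1).

    The first three months of each window are used for training and the last
    three months for testing, so training data always precede test data.
    """
    windows: List[Window] = []
    for year in range(first_year, last_year + 1):
        for start_month in (1, 4, 7):
            train_start = date(year, start_month, 1)
            test_start = date(year, start_month + 3, 1)
            train_end = test_start - timedelta(days=1)
            end_month = start_month + 5  # June, September, or December
            if end_month == 12:
                test_end = date(year, 12, 31)
            else:
                test_end = date(year, end_month + 1, 1) - timedelta(days=1)
            windows.append((train_start, train_end, test_start, test_end))
    return windows
```

Calling `six_month_windows(1998, 2010)` yields the 39 windows (13 years times 3) mentioned in the text.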
In each window, we iterate over each pair of firms, (x, y), twice: once to predict the sign of the change in realized volatility for firm x, and a second time for firm y. In each case, we fit two random forest models: one that includes only features of the firm whose measure we are forecasting, and another that includes cross-features (i.e., features of both x and y). Then, as described in Section 2.3, we use a bootstrap procedure to assess whether the area under the curve (AUC) increases significantly when cross-features are included. This procedure yields a set of p-values, one for each possible edge, x → y, in the network. We apply a false discovery rate correction and retain the edges whose adjusted p-values are less than or equal to 0.05.
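The text does not name the specific false discovery rate correction; the sketch below assumes the Benjamini-Hochberg step-up procedure, a common choice. It takes a hypothetical mapping from directed edges (predictor firm, forecast firm) to bootstrap p-values and keeps the edges whose adjusted p-values are at most 0.05. The bootstrap that produces the p-values is not reproduced here.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (firm whose features are added, firm being forecast)

def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_edges(edge_pvalues: Dict[Edge, float], alpha: float = 0.05) -> Set[Edge]:
    """Keep edges whose BH-adjusted p-value is <= alpha."""
    edges = list(edge_pvalues)
    adj = benjamini_hochberg([edge_pvalues[e] for e in edges])
    return {e for e, q in zip(edges, adj) if q <= alpha}
```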

Estimated 1998-2010 Financial Networks
Figure 3 displays the proportion of realized edges, hereafter referred to as density, in each of our estimated networks from 1998 to 2010. For comparison, we also show the density of networks estimated using bivariate Granger causality on monthly stock returns; however, we caution the reader that the high- and low-frequency networks are computed over different time windows, hence the two time series of network density have different lengths.
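For a directed network on n firms without self-loops there are n(n - 1) possible edges, so one natural way to compute the density plotted in Figure 3 is sketched below. The n(n - 1) denominator is our assumption about how the "proportion of realized edges" is normalized.

```python
from typing import Iterable, Sequence, Tuple

def network_density(nodes: Sequence[str], edges: Iterable[Tuple[str, str]]) -> float:
    """Share of the n(n - 1) possible directed edges (no self-loops) that are realized."""
    n = len(nodes)
    if n < 2:
        return 0.0
    realized = {(a, b) for a, b in edges if a != b}  # de-duplicate, ignore self-loops
    return len(realized) / (n * (n - 1))
```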
Our first observation is that the high-frequency network density increases steadily during 1998, reaching a peak in the last quarter of that year (i.e., when our model is applied to test data from October-December 1998). This increase in connectivity coincides with a period of mounting economic turmoil in Russia, culminating in the Russian government devaluing the ruble, defaulting on domestic debt, and declaring a moratorium on repayment of foreign debt (August 17, 1998) [Chiodo and Owyang (2002)]. As the future of the Russian economy remained unclear, U.S. stocks plunged and the Federal Reserve Bank of New York was forced to organize a bailout of the U.S.-based hedge fund Long-Term Capital Management [Rubin et al. (1999)]. Notice that the low-frequency (monthly returns) networks also display a connectivity increase during the fall of 1998.
We now turn our attention to which firms are central in and around the 2007-2009 U.S. Financial Crisis. Node centrality can be measured using a variety of metrics (e.g., degree, closeness, betweenness). We focus on degree, that is, on how many edges are incident to the node. Firms can be characterized by both their in-degree (number of incoming edges) and out-degree (number of outgoing edges). A firm with a large in-degree is one for which many other firms' microstructure measures are useful in forecasting its realized volatility. On the other hand, a firm with a large out-degree has microstructure measures that are useful for predicting the realized volatility of many other firms. Firms with a large out-degree have the potential to spread risk through the financial system, since aspects of their trading (captured via microstructure measures) propagate to other firms. Likewise, firms with a large in-degree have the potential to absorb this risk.
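In- and out-degrees, together with the standardized degrees used in the figures to compare networks of different sizes, can be computed from an edge list as sketched below. Edges are assumed to point from the firm whose features are added to the firm whose realized volatility is forecast; the standard deviation in the standardization is taken to be the population one, which is an assumption. Firm labels in the test are placeholders, not results.

```python
import statistics
from typing import Dict, Iterable, List, Sequence, Tuple

def degrees(nodes: Sequence[str],
            edges: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (in-degree, out-degree) dictionaries for a directed edge list."""
    in_deg = {v: 0 for v in nodes}
    out_deg = {v: 0 for v in nodes}
    for src, dst in edges:  # src's features help forecast dst's market measure
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg

def standardize(deg: Dict[str, int]) -> Dict[str, float]:
    """(firm degree - mean network degree) / standard deviation of network degree."""
    vals = list(deg.values())
    mu = statistics.mean(vals)
    sd = statistics.pstdev(vals)
    return {v: (d - mu) / sd if sd > 0 else 0.0 for v, d in deg.items()}

def top_k(deg: Dict[str, float], k: int = 10) -> List[str]:
    """Firms with the k largest degrees, highest first."""
    return sorted(deg, key=deg.get, reverse=True)[:k]
```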
In Figure 4, we display the 10 most highly connected firms according to their in-degree and (separately) their out-degree, before, during, and after the U.S. Financial Crisis. Several observations are in order. First, Lehman Brothers (LEH) has a large out-degree in the January-June 2006 and July-December 2006 networks. During the intervening time period (April-September 2006), it has a large in-degree. That our methodology identifies Lehman Brothers as a highly connected firm in the lead-up to the crisis is notable, given that the broker-dealer's involvement in subprime mortgage lending has been recognized as a key contributor to the crisis [Friedman and Posner (2011)]. American International Group (AIG) is also highly connected before the crisis; in fact, it is one of the top firms according to out-degree in six of the nine networks that span 2006-2008, and it is a top firm by in-degree during the January-June 2007 period. Like Lehman Brothers, AIG played a major role in the crisis through its use of collateralized debt obligations (CDOs) and credit default swaps (CDSs), and it was bailed out by the federal government shortly after Lehman Brothers' collapse [Friedman and Posner (2011)].
More generally, we note that the top firms are not always consistent across neighboring time periods. For example, a firm might be highly connected during one time window, but not during the windows immediately preceding or following it. (This is the case with T. Rowe Price (TROW), which has a large in-degree during April-September 2007, but neither a large in-degree nor a large out-degree during either of the other 2007 windows.) A major exception is AIG, as noted above. Our methodology highlights several additional firms that are known to have contributed to the crisis: Bear Stearns (BSC) is a top "in-firm" during April-September 2006 and a top "out-firm" in July-December 2007; the Federal National Mortgage Association (aka Fannie Mae; FNM) is a top out-firm during January-June 2007 and January-June 2008; and the Federal Home Loan Mortgage Corporation (aka Freddie Mac; FRE) is a top out-firm during April-September 2006.
Some of these results are consistent with those observed in monthly-scale bivariate Granger causality networks (see Figure 5). In the monthly-scale networks, as in their high-frequency counterparts, AIG, Fannie Mae, and Freddie Mac all have large out-degree before and/or during the crisis. Interestingly, in the monthly-scale networks AIG remains a large source of risk propagation (i.e., has high out-degree) through 2010, whereas it does not have a large out-degree in any of the high-frequency networks beyond April-September 2008. Another key difference between the low- and high-frequency settings is the role played by Lehman Brothers and Bear Stearns. In the high-frequency networks, Lehman Brothers and, to a lesser extent, Bear Stearns emerge as top out-firms in the lead-up to the financial crisis. In the monthly-scale networks, on the other hand, neither of these two firms has a large out-degree, although Lehman Brothers is consistently a top in-firm (absorber of risk) in 2006 and 2007. This difference raises the possibility that high-frequency networks may be able to identify risk-propagating firms that are not highlighted in low-frequency networks.
To further explore the behavior of systemically important financial institutions, we consider subnetworks of firms that received considerable government assistance during or after the crisis (see Figure 6). The size of node i is proportional to the market capitalization of firm i, while the thickness of edge i → j is proportional to the increase in AUC obtained by using the features of firm i to predict the change in realized volatility of firm j. For each year between 2006 and 2008, we select the 10 firms within our sample that received the largest amount of Troubled Asset Relief Program (TARP) funding, which was provided to companies that were deemed "too big to fail" [Kiel and Nguyen (2013)]. (For 2006 and 2007, we also include Lehman Brothers and Bear Stearns, which did not receive TARP funding but were crucial firms during this period.)
Figure 6 highlights the role played by Lehman Brothers in the lead-up to the crisis. For example, in the April-September 2006 network, Lehman Brothers has incoming edges from all but one of the other firms and is particularly influenced by Bank of America (AUC increase = 0.305), Wells Fargo (AUC increase = 0.264), and JPMorgan Chase (AUC increase = 0.223). In early 2007, AIG emerges as a firm having a large in-degree, with incoming edges from Bank of America (AUC increase = 0.297), Bear Stearns (AUC increase = 0.262), and Goldman Sachs (AUC increase = 0.259). Bank of America also has a large in-degree. By July-December 2007 (months before its collapse in March 2008), Bear Stearns has many incoming edges, the strongest of which is from the company that would come to purchase it, JPMorgan Chase (AUC increase = 0.144). In 2008, several of the firms previously considered are no longer present in our sample, whether because of their collapse (e.g., Lehman Brothers, Bear Stearns) or because they are no longer among the top 75 financial institutions by market capitalization (e.g., Freddie Mac). However, firms like Citigroup, Wells Fargo, AIG, JPMorgan Chase, and Fannie Mae continue to have large in- and/or out-degree.

Figure 4. Highly connected firms in high-frequency networks before, during, and after the U.S. Financial Crisis. Firms are ranked according to their in-degree (top) and out-degree (bottom). Standardized degrees (i.e., (firm degree - mean network degree)/standard deviation of network degree) are plotted so as to make results comparable across different networks. Banks (resp., broker-dealers, insurance companies) are displayed in red (resp., green, blue). Note that we add a small amount of random noise to the (x, y) coordinates of each firm so that firm labels do not overlap with one another. Full company names for all ticker symbols are provided in Table A2.
wide information is first absorbed into the prices of large stocks (which tend to be actively traded) and subsequently into the prices of small stocks (which tend to be less frequently traded) [Brennan et al. (1993)]. More recently, Chordia et al. (2011) performs a Granger causal analysis on value-weighted portfolios of large and small market capitalization stocks. The authors regress daily returns of these portfolios on lagged values of returns, volatilities, and quoted spreads, and find that the returns of the large stock portfolio lead the returns of the small stock portfolio, especially when the large stocks experience low liquidity. While Chordia et al. (2011) includes a variety of financial variables (returns, volatilities, and quoted spreads) in their analysis, we are thus far unaware of any studies that examine whether market microstructure variables yield lead-lag relationships between small and large firms. Our random forest methodology, however, lends itself well to this question. Indeed, we can assess whether the microstructure features of small (resp., large) firms are useful in predicting future increases and decreases in market measures of large (resp., small) firms. Notice that our method, as previously described, is implemented on a firm-by-firm basis; that is, we use microstructure variables of firm y (and possibly of a second firm, x) to forecast market measures of firm y. This procedure is fundamentally different from the analysis in Chordia et al. (2011), which considers financial variables that have been aggregated over firms of similar size. So that we may compare our results to those in Chordia et al. (2011), we perform a similar aggregation.
To begin, we consider all banks, primary broker-dealers, and insurance companies that were active on each trading day of 2018. We rank these firms according to their average monthly market capitalization and take our final set of firms to be those in the first and seventh capitalization deciles. The first decile (54 firms) represents large stocks and the seventh decile (55 firms) represents small stocks. We use the seventh decile because stocks in lower deciles are likely to trade so infrequently as to make missing values a problem in our downstream analysis. Next, for each stock, we form time series of its microstructure variables and market measures, and then compute value-weighted averages for small and large firms separately. For example, the aggregate time series of the Roll measure for small and large firms are given by
$$R^{s}_t = \frac{\sum_{i=1}^{n_s} \mathrm{MCAP}_{s_i} R^{s_i}_t}{\sum_{i=1}^{n_s} \mathrm{MCAP}_{s_i}} \tag{14}$$
and
$$R^{\ell}_t = \frac{\sum_{i=1}^{n_\ell} \mathrm{MCAP}_{\ell_i} R^{\ell_i}_t}{\sum_{i=1}^{n_\ell} \mathrm{MCAP}_{\ell_i}}, \tag{15}$$
where $\{s_i\}_{1 \le i \le n_s}$ and $\{\ell_i\}_{1 \le i \le n_\ell}$ denote the sets of small and large firms, respectively, $\mathrm{MCAP}_j$ is the average monthly market capitalization of firm $j$, and $R^j_t$ is the value of the Roll measure of firm $j$ at time $t$. We form aggregate time series for Amihud's lambda, VPIN, kurtosis, and realized volatility in a manner analogous to (14) and (15). Finally, we calculate the sign of the change in average kurtosis and realized volatility:
$$y^{\kappa,g}_t = \operatorname{sign}\left(\bar{\kappa}^{g}_{t+h} - \bar{\kappa}^{g}_t\right), \qquad y^{\sigma,g}_t = \operatorname{sign}\left(\bar{\sigma}^{g}_{t+h} - \bar{\sigma}^{g}_t\right), \qquad g \in \{s, \ell\},$$
where $\bar{\kappa}^{g}_t$ and $\bar{\sigma}^{g}_t$ are the value-weighted kurtosis and realized volatility of group $g$ and, as before, $h$ is a forecast horizon of 50 time bars. There are 2,881 observations in our final dataset, and we perform 10-fold cross-validation (over the entirety of 2018) to evaluate the random forest classifier's performance.
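The value-weighted aggregation in the Roll-measure example can be sketched as follows; the same function applies to the other variables. It assumes that every firm's series is aligned on the same 30-minute bars with no missing values, which the text achieves by restricting attention to firms active on every trading day. Names are hypothetical.

```python
from typing import Dict, List

def value_weighted_series(measures: Dict[str, List[float]],
                          mcap: Dict[str, float]) -> List[float]:
    """Aggregate per-firm series R^j_t into one series weighted by average monthly MCAP_j."""
    firms = list(measures)
    total = sum(mcap[f] for f in firms)
    n_bars = len(measures[firms[0]])
    if any(len(measures[f]) != n_bars for f in firms):
        raise ValueError("all firm series must be aligned on the same bars")
    return [sum(mcap[f] * measures[f][t] for f in firms) / total
            for t in range(n_bars)]
```

Applied once to the small-firm group and once to the large-firm group, this produces the two aggregate series per variable.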
The first question we address is which features are important for predicting market measure changes in firms of different sizes. We consider four prediction scenarios: (i) kurtosis for large firms, (ii) realized volatility for large firms, (iii) kurtosis for small firms, and (iv) realized volatility for small firms. In each case, we include cross-features in our random forest model; that is, we use microstructure variables for both small and large firms. Furthermore, for each test set, we compute the mean decrease in accuracy (MDA) for each feature and average the results by firm size. Let $M^i_x$ denote the MDA of feature $x$ on test set $i$, where $i = 1, 2, \ldots, 10$. Then, for each $i$, we calculate
$$M^i_{\text{Small}} = \frac{1}{3}\left(M^i_{\text{Roll},s} + M^i_{\lambda,s} + M^i_{\text{VPIN},s}\right), \qquad M^i_{\text{Large}} = \frac{1}{3}\left(M^i_{\text{Roll},\ell} + M^i_{\lambda,\ell} + M^i_{\text{VPIN},\ell}\right),$$
where the subscripts $s$ and $\ell$ refer to the aggregated Roll measure, Amihud's lambda, and VPIN of small and large firms, respectively. Figure 7 displays the distributions of $\{M^i_{\text{Small}}\}_{1 \le i \le 10}$ and of $\{M^i_{\text{Large}}\}_{1 \le i \le 10}$ for each of the four prediction scenarios. We observe that, under all scenarios but one, the microstructure variables of large firms are more important than those of small firms. For example, when forecasting kurtosis for large firms, the median large firm MDA is approximately 0.12 while the median small firm MDA is closer to 0.1. Qualitatively similar results hold when predicting realized volatility for large firms and kurtosis for small firms. On the other hand, this pattern is reversed when we forecast realized volatility for small firms, in which case the small firms' features have higher MDA. Interestingly, though, the two distributions have some overlap in this case, whereas in all other scenarios, the distribution of large firm MDA values lies entirely above the distribution of small firm MDA values.
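To make the MDA quantity concrete, here is a minimal sketch of permutation-based mean decrease in accuracy followed by the per-group averaging. It assumes a fitted classifier exposed as a hypothetical `predict(row)` function and rows stored as lists; the text does not specify its MDA implementation (e.g., the number of permutations), so `n_repeats` is an illustrative choice.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = List[float]

def accuracy(predict: Callable[[Row], int], X: List[Row], y: List[int]) -> float:
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def mean_decrease_accuracy(predict: Callable[[Row], int], X: List[Row], y: List[int],
                           j: int, rng: random.Random, n_repeats: int = 10) -> float:
    """Baseline test accuracy minus accuracy after shuffling feature column j."""
    base = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        col = [row[j] for row in X]
        rng.shuffle(col)  # break the link between feature j and the labels
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        drops.append(base - accuracy(predict, X_perm, y))
    return sum(drops) / n_repeats

def group_mda(mda_by_feature: Dict[str, float], group: Sequence[str]) -> float:
    """Average MDA over one size group's features (e.g. its Roll, lambda, and VPIN)."""
    return sum(mda_by_feature[f] for f in group) / len(group)
```

A feature the classifier ignores has an MDA of exactly zero, while a feature it relies on loses accuracy when shuffled.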
To an extent, these results are consistent with those reported in Chordia et al. (2011) and Lo and MacKinlay (1990), where the weekly and daily returns of large stocks were found to lead those of small stocks (but not the reverse). Our analysis reveals a similar lead-lag pattern in a high-frequency setting: microstructure variables of large firms have predictive power when forecasting the sign of the change in kurtosis of the small firms' returns distribution. Moreover, when forecasting for large firms, the microstructure features of small firms are found to be less important than those of large firms. This conforms with earlier findings that small firm returns do not lead large firm returns.
We now turn our attention to the question of whether adding cross-features improves the random forest's predictive ability. We find that the results here are mixed (see Figure 8). Including cross-features yields a significant increase in AUC when forecasting realized volatility and, to a lesser extent, kurtosis for large firms. There is only a minor predictive improvement, however, when using cross-features to predict realized volatility for small firms, and virtually no change when predicting kurtosis for small firms. As a robustness check, we repeat the analyses presented in this section on a set of information and communications technology (ICT) firms (rather than financial firms, as discussed here). We find that the ICT feature importance results are qualitatively similar to those we present here, though the ROC results are not. Our complete findings are shown in Appendix A.

Conclusion
We estimate financial networks by determining whether cross-effects in intraday trade data exist between each pair of firms in our sample. We detect these cross-effects by assessing whether microstructure measures of one firm improve our ability to forecast the sign of the change in a market measure (either realized volatility or returns kurtosis) of another firm, where predictive performance is measured via the area under the curve (AUC). Because we learn our networks from high-frequency trade data, which tend to be both nonlinear and nonstationary, we use a random forest to forecast market measure changes. Random forests, a popular machine learning tool, provide a great deal of modeling flexibility, as they do not impose a particular functional form on the data. We apply these methods to the trade data of large U.S. financial institutions, demonstrating how our networks can be used to answer the same questions posed by researchers in the low-frequency setting (e.g., how network connectivity evolves over time and which types of firms interact with one another). High-frequency financial networks have the potential to yield novel insights into the workings of the financial system. Future work in this direction includes refining our network estimation procedures (e.g., by changing the microstructure variables used as features, or by considering different market measures for prediction). There are a number of hyperparameters in our random forest model (W, the length of the lookback window; h, the forecast horizon; the length of the time bar; etc.), and we have yet to perform an exhaustive review of how these parameters affect our final results.
Moreover, the networks we construct are based on bivariate analyses, that is, on testing for predictive improvements for firm x when we include the features of one additional firm, y. We could instead undertake a multivariate analysis wherein we include features of all firms in order to predict the change in the market measure of firm x. Such an analysis would give assurance that any cross-predictability detected between firms x and y is indeed due to the measures of firm y and not to measures of a firm that is correlated with y (i.e., indirect associations). Preliminary work in this direction has yielded mixed results; however, it is possible that by adjusting our model beyond the standard random forest, we may be able to make further progress. On that note, it would also be interesting to consider how time series models (e.g., an autoregressive integrated moving average (ARIMA) model with exogenous variables) would fare in predicting changes in market measures.

(representing small technology firms).
Figure A1 displays MDA feature importance results. On average, large firms' features are more important than small firms' features, regardless of whether we are forecasting realized volatility or kurtosis, for large firms or for small firms. However, the difference in feature importances is larger when predicting for large firms (left panel) than for small firms (right panel), which suggests (a) that small firms carry little information about large firms, and (b) that large firms do contain some information about small firms, but the small firm features are still significant. These results are qualitatively similar to what we obtain for financial firms, except that, for the latter, small firms' features are more important than large firms' when predicting small firm realized volatility. In Figure A2, we show that including cross-features in the random forest model yields very little change in the AUC. An exception is when we predict kurtosis for small firms (bottom right plot), in which case we see an appreciable increase in AUC when we add large firms' features.

Figure 1. Schematic of the purged cross-validation procedure. The sample period is divided into 6 intervals of equal length, each interval serving in turn as the test set as we iterate over the sample period. Suppose interval 4 is the current test set. Then five days' worth of data are purged from before and after interval 4, and the remaining data are used as the training set.
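The purged split in Figure 1 can be sketched over day indices as follows. This is a minimal illustration assuming the sample is indexed by trading day and the purge is measured in days; the function name and the equal-length interval rounding are our own choices.

```python
from typing import Iterator, List, Tuple

def purged_cv_splits(n_days: int, n_folds: int = 6,
                     purge_days: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_days, test_days) for purged cross-validation over days 0..n_days-1."""
    bounds = [round(k * n_days / n_folds) for k in range(n_folds + 1)]
    for k in range(n_folds):
        lo, hi = bounds[k], bounds[k + 1]  # test interval is [lo, hi)
        test = list(range(lo, hi))
        # drop purge_days on each side of the test interval from the training set
        train = [d for d in range(n_days)
                 if d < lo - purge_days or d >= hi + purge_days]
        yield train, test
```

With 60 days, the fourth interval (index 3 in zero-based numbering) covers days 30-39, and days 25-44 are excluded from training.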

Figure 2. Receiver operating characteristic (ROC) curves plot the true and false positive rates of a binary classifier as the decision threshold is varied. A random classifier has a diagonal ROC curve, with a corresponding area under the curve (AUC) of 0.5. Higher values of AUC, as illustrated here, indicate better classifier performance.
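The AUC summarized in Figure 2 equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way for labels in {1, -1}; the quadratic pairwise loop is chosen for clarity rather than speed, and the function name is hypothetical.

```python
from typing import Sequence

def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative), ties counting 1/2."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

A classifier whose scores carry no information has an AUC near 0.5, matching the diagonal ROC curve described in the caption.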
Our high-frequency networks then become less dense through the end of 2000, after which connectivity repeatedly increases and decreases (albeit with an overall upward trend) through late 2003. These results are less interpretable than those in the low-frequency setting, where the density consistently decreases from 1999 through late 2002. Both the low- and high-frequency networks have elevated density in 2003. The intraday networks become less dense in 2004, before increasing in density in 2005. On the other hand, the monthly-scale networks remain dense throughout 2003 and 2004 and are not particularly dense in 2005. Intraday networks exhibit a fairly persistent increase in density through 2006 and 2007. In fact, a global maximum density of 36.2% is reached at the end of 2007, following the summer 2007 failure of two subprime mortgage funds associated with the investment bank Bear Stearns. Connectivity then drops sharply in the first half of 2008 before increasing again. These results are somewhat consistent with what is observed in the monthly-scale networks, where density steadily increases until the beginning of 2008, then decreases, and finally spikes following the September 2008 collapse of the investment bank Lehman Brothers. High-frequency networks, like their low-frequency counterparts, display an overall decline in density in the late 2000s.

Figure 3. Density of financial networks over the 1998 to 2010 period, where network density refers to the proportion of realized edges. Top (high-frequency networks): each year is divided into three overlapping windows of six months each, a network is estimated for each window by applying the methodology described in Section 2 to intraday trade data, and the network density is plotted. Bottom (low-frequency networks): the sample period is divided into 36-month rolling windows, a network is estimated for each window by applying bivariate Granger causality to monthly stock returns, and the network density is plotted. Note that there are 39 high-frequency networks and 156 low-frequency networks.

Figure 5. Highly connected firms in low-frequency bivariate Granger causality networks before, during, and after the U.S. Financial Crisis. Firms are ranked according to their in-degree (top) and out-degree (bottom). Standardized degrees (i.e., (firm degree - mean network degree)/standard deviation of network degree) are plotted so as to make results comparable across different networks. Banks (resp., broker-dealers, insurance companies) are displayed in red (resp., green, blue). Note that we add a small amount of random noise to the (x, y) coordinates of each firm so that firm labels do not overlap with one another. Full company names for all ticker symbols are provided in Table A2.
Figure 6. High-frequency realized volatility networks of financial institutions that received significant federal bailout packages (TARP funding). Each year, the 10 institutions within our sample that received the most TARP funding are selected, in addition to Lehman Brothers and Bear Stearns. Nodes are sized according to their market capitalization, with red (resp., green, blue) nodes indicating banks (resp., broker-dealers, insurance companies). Edge thickness is proportional to the increase in AUC obtained by including cross-features. Full company names for all ticker symbols are provided in Table A2.


Figure 7. Distribution of the average mean decrease in accuracy (MDA), grouped by firm size. For example, the leftmost boxplot illustrates the distribution of $\{M^i_{\text{Large}}\}_{1 \le i \le 10}$, where $M^i_{\text{Large}}$ denotes the average of the MDA values for the large firms' Roll measure, Amihud's lambda, and VPIN, evaluated over test set $i$. The left (resp., right) panel displays feature importance results when forecasting the sign of the change in kurtosis and realized volatility for large (resp., small) financial firms.

Figure 8. ROC curves for predicting the sign of the change in realized volatility (top row) and in kurtosis (bottom row) for large firms (left column) and small firms (right column). In each case, the random forest was fit twice, once without cross-features (e.g., using only large firms' features to predict large firms' measures) and again with cross-features. Thus, for each prediction scenario, two ROC curves are displayed: red (resp., black) curves indicate that cross-features were (resp., were not) used. The area under the curve (AUC) is reported in the lower right corner of each plot.

Funding
KK and SB were supported in part by NIH award R01GM135926. SB also acknowledges partial support from NSF awards DMS-1812128 and DMS-2210675 and NIH award R21NS120227.