Statistically validated network of portfolio overlaps and systemic risk

Common asset holding by financial institutions, namely portfolio overlap, is nowadays regarded as an important channel for financial contagion with the potential to trigger fire sales and thus severe losses at the systemic level. In this paper we propose a method to assess the statistical significance of the overlap between pairs of heterogeneously diversified portfolios, which then allows us to build a validated network of financial institutions where links indicate potential contagion channels due to realized portfolio overlaps. The method is implemented on a historical database of institutional holdings ranging from 1999 to the end of 2013, but can in general be applied to any bipartite network where the presence of similar sets of neighbors is of interest. We find that the proportion of validated network links (i.e., of statistically significant overlaps) increased steadily before the 2007-2008 global financial crisis and reached a maximum when the crisis occurred. We argue that the nature of this measure implies that systemic risk from fire sales liquidation was maximal at that time. After a sharp drop in 2008, systemic risk resumed its growth in 2009, with a notable acceleration in 2013, reaching levels not seen since 2007. We finally show that market trends tend to be amplified in the portfolios identified by the algorithm, such that it is possible to have an informative signal about financial institutions that are about to suffer (enjoy) the most significant losses (gains).

The 2007-2008 global financial crisis has drawn the attention of both academics and regulators to the complex interconnections between financial institutions 1 and called for a better understanding of financial markets, especially from the viewpoint of systemic risk, i.e., the possibility that a local event triggers a global instability through a cascading effect [2][3][4][5][6][7] . In this respect, while much effort has been devoted to the study of counter-party and roll-over risks caused by loans between institutions [8][9][10][11][12][13][14][15][16][17] , the ownership structure of financial assets has received relatively less attention, primarily because of a lack of data and of adequate analysis techniques. Yet, while in traditional asset pricing theory asset ownership does not play any role, there is increasing evidence that it is a potential source of non-fundamental risk and, as such, can be used for instance to forecast stock price fluctuations unrelated to fundamentals 18,19 . More worryingly, if the investment portfolios of financial institutions are too similar (as measured by the fraction of common asset holdings, or portfolio overlap), the unexpected occurrence of financial distress at the local level may trigger fire sales, namely asset sales at heavily discounted prices. Fire sales spillovers are believed to be an important channel of financial contagion contributing to systemic risk [20][21][22][23][24][25] : when asset prices are falling, losses by financial institutions with overlapping holdings become self-reinforcing and trigger further simultaneous sell orders, ultimately leading to downward spirals for asset prices. From this point of view, even if optimal portfolio selection helps individual firms diversify risk, it can also make the system as a whole more vulnerable 1,26 . The point is that fire sale risk builds up gradually but reveals itself rapidly, generating a potentially disruptive market behavior.
In this contribution we propose a new statistical method to quantitatively assess the significance of the overlap between a pair of portfolios, with the aim of identifying those overlaps bearing the highest riskiness for fire sales liquidation. Since we apply the method to institutional portfolios, we will use the terms institution and portfolio interchangeably throughout the paper. In practical terms, the problem consists in using asset ownership data by financial institutions to establish links between portfolios having strikingly similar patterns of holdings. Market ownership data at a given time t consists of a set I(t) of institutions, holding positions from a universe of S(t) securities (or financial assets in general). The |I(t)| × |S(t)| ownership matrix W(t) describes portfolio compositions: its generic element W_is(t) denotes the number of shares of security s ∈ S(t) held by institution i ∈ I(t). The matrix W(t) can be mapped into a binary ownership matrix A(t), whose generic element A_is(t) = 1 if W_is(t) > 0 and 0 otherwise, which allows us to define the degree d_i(t) = ∑_s A_is(t) of an institution i as the number of securities it holds. The overlap between portfolios i and j is then o_ij(t) = ∑_s A_is(t) A_js(t). Denoting by π(o|i, j, t) the probability of observing an overlap o between i and j under the null hypothesis of the bipartite configuration model (BiCM), the p-value of the observed overlap reads

p-value(i, j, t) = ∑_{o ≥ o_ij(t)} π(o|i, j, t),    (1)

where the right-hand side of Eq. (1) is the cumulative distribution function of π(⋅|i, j, t), namely the probability of an overlap larger than or equal to the observed one under the null hypothesis. If such a p-value is smaller than a threshold P*(t) corrected for multiple hypothesis testing (see section Methods), we validate the link between i and j and place it in the monopartite validated network of institutions. Otherwise, the link is discarded. In other words, the comparison is deemed statistically significant if the observed overlap would be an unlikely realization of the null hypothesis according to the significance level P*(t).
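To make the validation step concrete, here is a minimal sketch of the overlap p-value under a simplified null model in which the d_j holdings of one portfolio are drawn uniformly at random among the S available securities. This hypergeometric null is a stand-in for illustration only, not the BiCM null actually used in the paper, and all function names are ours.

```python
from math import comb

def overlap_pvalue(o, d_i, d_j, S):
    """P(overlap >= o) when the d_j holdings of portfolio j are drawn
    uniformly at random among S securities, d_i of which belong to
    portfolio i (hypergeometric null, a simplified stand-in for BiCM)."""
    total = comb(S, d_j)
    # sum the hypergeometric pmf over the upper tail o, o+1, ..., min(d_i, d_j)
    tail = sum(comb(d_i, k) * comb(S - d_i, d_j - k)
               for k in range(o, min(d_i, d_j) + 1))
    return tail / total
```

Even under this toy null, the same qualitative behavior emerges: a given overlap (say 50 securities) is far less likely, hence more significant, for two small portfolios than for two well-diversified ones, so `overlap_pvalue(50, 60, 60, 1000)` is orders of magnitude smaller than `overlap_pvalue(50, 500, 500, 1000)`.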
This procedure is repeated for all pairs of institutions, resulting in the validated projection of the original network: a monopartite network whose generic element equals 1 if the link between institutions i and j is validated, and 0 otherwise.
When applied to a historical database of SEC 13-F filings (see section Methods for details and Fig. 1 for the temporal evolution of the main dataset statistics), our method yields statistically validated networks of overlapping portfolios whose properties turn out to be related to the occurrence of the 2007-2008 global financial crisis. In particular, we propose to regard the average number of validated links per institution as a simple measure of systemic risk due to overlapping portfolios. Such a measure gradually built up in the years from 2004 to 2008, and quickly dropped after the crisis. Systemic risk has then been increasing since 2009, and at the end of 2013 reached a value not seen since 2007. Note that because there is only one large crisis in our dataset, we refrain from making strong claims about the systematic coincidence of highly connected validated networks and the occurrence of financial crises. We also find that overlapping securities (i.e., those securities making up the validated overlaps) represent a larger average share of institutional portfolios, a configuration which would exacerbate the effect of fire sales. Additionally, we show that the presence of a validated link between two institutions is a good indicator of portfolio losses for these institutions in times of bearish markets, and of portfolio growth in times of bullish markets: validated links should indeed represent self-reinforcing channels for the propagation of financial distress or euphoria. More generally, we find that market trends tend to be amplified in the portfolios identified by the algorithm.
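The projection step can be sketched as follows, assuming the p-values of all overlapping pairs have already been computed; the Bonferroni-corrected threshold anticipates the Methods section, and variable names are ours.

```python
def validated_projection(pvalues, eps=1e-3):
    """Build the monopartite validated network from a dict mapping each
    overlapping pair (i, j) to its overlap p-value, keeping a link iff
    the p-value passes the Bonferroni-corrected threshold eps / n_pairs."""
    threshold = eps / max(len(pvalues), 1)   # one hypothesis test per pair
    adj = {}
    for (i, j), p in pvalues.items():
        if p < threshold:
            adj.setdefault(i, set()).add(j)
            adj.setdefault(j, set()).add(i)
    return adj
```

For instance, with three tested pairs the threshold is eps/3, so `validated_projection({(0, 1): 1e-9, (0, 2): 1e-8, (1, 2): 0.5})` keeps only the two highly significant links.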
Finally, we apply the validation procedure to the overlapping ownerships of securities, in order to identify contagion channels between the securities themselves, and observe a stable growth of validated securities over the considered time span. This signals an ongoing, deep structural change of the financial market and, more importantly, that there are more and more stocks that can be involved in a potential fire sale. The local maxima within this trend correspond to all periods of financial turmoil covered by the database: the dot-com bubble of 2001, the global financial crisis of 2007-2008 and the European sovereign debt crisis of 2010-2011.

Results and Discussion
In order to properly understand the results of our validation method for overlapping portfolios, it is useful to provide a specific example. Figure 2 shows two similar situations: two pairs of portfolios, both owning 50 securities in common. Only the right pair is validated by our method, whereas the left pair is not. This happens because the portfolios in the right pair are of smaller size (especially the blue one), and the same overlap is therefore less likely to happen by chance. Hence, although the algorithm cannot directly take into account how much each institution is investing (particularly with respect to the total assets managed by the institution), it does so indirectly by taking into account the diversification of different portfolios (i.e., the degree of institutions). Validated pairs of portfolios indeed correspond to overlaps which constitute a considerable fraction of the total portfolio value of the pair. In short, pairs are validated when neither the diversification of the investments nor the degree of the securities is sufficient to explain the observed overlap. As we shall see later, the same mechanism is at play when we project the bipartite network on the securities side. In this case, since the degree of a security is a good approximation of its capitalization and of the dilution of its ownership 42,43 , the method will tend to validate links among securities whose ownership is relatively concentrated. Figure 3 gives an overall picture of what the validated network looks like. In general, we observe the presence of multiple small clusters of institutions, together with a significantly larger cluster composed of many institutions linked by a complex pattern of significant overlaps.

Temporal evolution of the validated network of institutions.
After these preliminary observations, we move to the temporal analysis of the structural properties of the whole validated network of institutions. In Fig. 4 we show the fraction of validated institutions (defined as the number of institutions having at least one validated link over the total number of institutions appearing in the ownership network) as a function of time. We also disaggregate data according to the type of institution, and in this case plot both the number of validated institutions and the original number of institutions (we avoid using their ratio directly for better visualization). One sees that there is no particular pattern and the fraction of validated institutions is almost constant in time. Looking at disaggregated data, a few interesting things emerge. Investment Advisors account for the largest percentage of institutions and, more prominently, of validated institutions, followed by Hedge Funds and Mutual Funds. The most interesting behavior is however that of Hedge Funds in the validated networks: they are relatively under-represented until 2004, but after that their number displays a steep increase. Figure 5 displays the temporal evolution of the average degree in the validated network, which measures how much validated institutions are connected to each other. One clearly sees an overall increasing trend, with a strong acceleration during the years preceding the financial crisis.
In particular, the average degree reaches a maximum a few months before prices started to fall. Furthermore, our results suggest that a similar process has been taking place after 2009, a fact that might question the stability of financial markets nowadays. The right-hand plot of Fig. 5 reports the same quantity for each category of institutions, which also peaks just before the 2008 crash. The notable exception is Hedge Funds, whose average degree is roughly constant in time. In addition, the peak for Investment Advisors, Private Banking funds and Brokers occurs roughly 1-2 quarters before the global peak.
Validated overlaps vs portfolio size and security capitalization. A seemingly major shortcoming of using the binary holding matrix A(t) for validation purposes is that it takes into account neither the concentration of ownership of a given security (i.e., which fraction of the outstanding shares a given institution is holding) nor the relative importance of different securities in a portfolio (i.e., which percentage of the total portfolio market value a security represents). These are clearly important pieces of information, since one would expect a mild price impact following the liquidation of an asset by an institution if the latter owns only a small fraction of that security's outstanding shares. Conversely, if the asset represents a considerable fraction of the portfolio market value, a price drop will have a stronger impact on the balance sheet of the institution. However, although validating weighted overlaps would be more relevant than validating binary overlaps for identifying fire sales propagation channels, we cannot use the original weighted matrix W(t) in the validation procedure, as in this case it would be impossible to build an analytical null model, making the validation procedure extremely involved. Thus, we are forced to rely on binary overlaps. However, the dataset at our disposal allows us to check a posteriori the features of the portfolio positions which contributed to the formation of validated links. To this end, using the information about the price p_s(t) and outstanding shares σ_s(t) of the different securities s at time t, we compute the fraction of the total market value of portfolio i represented by security s, namely v_is(t) = W_is(t) p_s(t) / ∑_s' W_is'(t) p_s'(t), and the fraction of outstanding shares of s held by institution i, namely c_is(t) = W_is(t)/σ_s(t). We apply this procedure to each position W_is(t) of the bipartite ownership network in order to characterize the features of the positions belonging to validated overlaps.
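The two a posteriori quantities follow directly from the holdings matrix; a minimal numpy sketch (names are ours, and we assume every institution holds at least one position so that portfolio values are nonzero):

```python
import numpy as np

def portfolio_fractions(W, p, sigma):
    """v[i, s]: fraction of portfolio i's market value held in security s;
    c[i, s]: fraction of s's outstanding shares sigma[s] held by i."""
    dollar = W * p                                  # position market values
    v = dollar / dollar.sum(axis=1, keepdims=True)  # each row sums to 1
    c = W / sigma
    return v, c
```

For example, an institution holding 5 of the 5 outstanding shares of a security has c = 1 for that position regardless of how small the position is in dollar terms, which is exactly the distinction the two fractions are meant to capture.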
Figure 6 shows that, on average, overlapping securities (i.e., securities making up the validated overlaps) represent a larger share of the validated portfolio, namely 6% more than the average share given by the inverse of the degree.
In order to study the concentration of ownership of different securities, we use the following procedure. Each security s belongs by construction to d_s(t)[d_s(t) − 1]/2 pairs of overlapping portfolios, and we can compute the fraction f_s(t) of such pairs that are validated by the algorithm. We then compute for each security the total capitalization (as a proxy for the liquidity of the security) as well as the average ownership fraction per institution ⟨c_s(t)⟩ = ∑_i c_is(t)/d_s(t) as a function of f_s(t). In Fig. 7 we show scatter plots of these quantities together with straight lines obtained from log-linear regressions. As one can see, the probability that any pair of institutions investing in the same asset is validated by the algorithm decreases as a function of the capitalization of the asset, increases as a function of the concentration (i.e., with the average fraction of outstanding shares held by an institution), and decreases as a function of the degree of the security. The relation is stronger for securities with higher degree, because of the larger number of available data points.
(Fig. 6 caption: the average portfolio share is computed separately for overlapping securities, i.e., securities in the portfolio belonging to the overlap with a validated neighbor, and for non-overlapping securities, the complementary set; overlapping positions correspond to larger shares in the portfolio. The plot refers to 2006Q4, yet the same qualitative behavior is observed for other dates.)
We now investigate whether validated links are informative about financial distress. For each date t we consider the set of the n institutions experiencing the highest drop in portfolio value between t and t + dt (which we refer to as "distressed" institutions). We first consider drops in absolute terms (i.e., the total dollar amount), which we believe is of macroeconomic significance, and check the relation with portfolio returns later on. We use here n = 300 (roughly corresponding to 10% of the total number of institutions) and omit the n subscript in the following.
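The fraction f_s(t) can be sketched as follows, given the set of holders of a security and the set of validated links stored as sorted tuples (names are ours):

```python
from itertools import combinations

def validated_pair_fraction(holders, validated_edges):
    """Fraction f_s of the d_s*(d_s - 1)/2 pairs of holders of a
    security that are validated by the algorithm."""
    pairs = list(combinations(sorted(holders), 2))
    if not pairs:
        return 0.0   # securities with a single holder form no pairs
    return sum(1 for pair in pairs if pair in validated_edges) / len(pairs)
```

For a security with three holders and one validated link among them, f_s = 1/3.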
We then compute the fraction l(t) of distressed institutions with respect to the total number of institutions I(t) and compare it with the fraction l̃(t) of distressed institutions in the validated network. The ratio G_I(t) = l̃(t)/l(t) then indicates whether distressed institutions are over-represented in the validated network. Indeed, if G_I(t) = 1 the algorithm is doing nothing better than placing distressed institutions at random in the validated network, whereas if G_I(t) > 1 we effectively gain information by knowing that an institution belongs to the validated network. Similarly, we compare the fraction of links in the validated network which connect institutions that are both distressed with the fraction of such links when all overlapping pairs of institutions (i.e., all pairs of portfolios having at least one security in common) are considered. The ratio between these two quantities, which we denote R_I(t), can then be used to assess the effectiveness of the algorithm in establishing links between pairs of distressed institutions in the validated network. Since all the positions in our dataset are long positions, it makes sense to relate G_I(t) and R_I(t) to an index that encompasses many securities. Figure 8 shows these quantities as a function of the market return r(t) between t and t + dt, as measured by the Russell 2000 index. Indeed, both ratios are correlated with the total loss, and are significantly larger than 1 when r(t) ≪ 0 (R_I in particular reaches values close to 8 in periods of major financial distress). Notably, both ratios are close to 1 when the market loss is close to 0, and decline afterwards. This can be interpreted as follows: in times of market euphoria, overlapping portfolios turn into self-reinforcing bubbles.
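The over-representation ratio G_I(t) can be sketched with plain Python sets (a sketch of the quantity defined above; function and variable names are ours):

```python
def distress_ratio(distressed, validated, institutions):
    """G_I: fraction of distressed institutions inside the validated
    network divided by their fraction in the whole population.
    G_I > 1 means distressed institutions are over-represented."""
    frac_all = len(distressed & institutions) / len(institutions)
    frac_validated = len(distressed & validated) / len(validated)
    return frac_validated / frac_all
```

With 10 institutions of which 3 are distressed, and a validated network of 4 institutions containing 2 of them, G_I = 0.5/0.3 ≈ 1.67: distress is over-represented among validated institutions.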
When we repeat the same procedure for portfolio returns (i.e., we use portfolio returns to label institutions as distressed), we do not obtain meaningful results. This is however due to the fact that abnormal returns are in general observed for small portfolios, for which we have few data points. Given the statistical nature of our method, we cannot hope to correctly identify such situations, for which a different (probably case-by-case) methodology is clearly needed. We can however take a simpler point of view and, for each time t, keep all portfolios whose return is smaller (in absolute terms) than a threshold r_max that we use as a parameter. We then use this subset to compute the average return of validated portfolios and the average return of the portfolios outside the validated network. For a given value of r_max, we then have a scatter plot of these two quantities (one point for each date t), which is well approximated by a straight line (see Fig. 9, left panel, for an example). Note that with the significance threshold P*(t) used, roughly half of the institutions fall in each set (see Fig. 4, left panel). Finally, we linearly regress the average return of validated portfolios against that of non-validated ones, and plot the value of the slope as a function of the threshold r_max. As one can see in the right panel of Fig. 9, the slope is significantly larger than 1 for values of the threshold up to roughly 30% in general, and up to 50% when we consider positive and negative returns separately. In the latter case, for each date we first split institutions with positive/negative returns and compute return averages in the validated and complementary sets. The fact that the slope becomes slightly smaller than 1 for large values of r_max is putatively due to abnormal returns, most likely associated with small portfolios, which tend to be outside the validated network.
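The slope comparison amounts to an ordinary least-squares fit across dates; a minimal sketch using numpy (function name is ours):

```python
import numpy as np

def amplification_slope(avg_return_rest, avg_return_validated):
    """Slope of the least-squares line relating the average return of
    validated portfolios (y) to that of the complementary set (x);
    a slope > 1 means market moves are amplified in validated portfolios."""
    slope, intercept = np.polyfit(avg_return_rest, avg_return_validated, 1)
    return slope
```

For instance, if validated portfolios consistently returned twice the average of the complementary set, the fitted slope would be 2.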
While this drawback is unavoidable given the statistical nature of our method, overall these results show that, as long as abnormal returns are not considered, the returns of validated portfolios are on average greater (in absolute terms) than those of their non-validated counterparts.
Buy and sell networks: the case of Hedge Funds. Before moving to the analysis of the validated network of securities, we illustrate another interesting application of our method. Our dataset allows us to build, for each date t, the buy (or sell) bipartite network, corresponding respectively to the positions acquired (or sold) by each institution between t − dt and t: the generic element of the buy network equals 1 if W_is(t) − W_is(t − dt) > 0 and 0 otherwise, while that of the sell network equals 1 if W_is(t) − W_is(t − dt) < 0 and 0 otherwise. Validation of these bipartite networks then highlights the institutions that have updated their portfolios in a strikingly similar way. As a case study we consider the Hedge Funds (HF) buy/sell networks, meaning that we only consider the positions bought or sold by HF (discarding all other links), and apply the validation procedure to these subnetworks. The focus on this particular subset of funds is motivated by the Great Quant Meltdown of August 2007, during which quantitative HF, in particular those with market neutral strategies, suffered great losses for a few days, before a remarkable (although incomplete) reversal (see, e.g., ref. 44). In addition, we wish to investigate whether HF reacted in a synchronous way at the end of the 2000-2001 dot-com bubble.
(Fig. 8 caption: red points correspond to dt equal to one quarter, blue points to dt equal to two quarters; solid lines correspond to a locally weighted least squares regression (loess) of data points with 0.2 span. Panels are divided into four regions, corresponding to ratios larger/smaller than one (i.e., distressed institutions over-/under-represented in the validated networks) and to r(t) larger/smaller than zero (i.e., market contraction/growth). Fig. 9 caption: when returns greater than roughly 30-40% are excluded, the slope is found to be significantly larger than 1, indicating that portfolios in the validated network tend to have higher returns (in absolute terms) than their non-validated counterparts; the inset shows the overall fraction of returns satisfying |r_i(t)| < r_max as a function of r_max.)
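The buy/sell construction described in this section can be sketched in a few lines of numpy (names are ours):

```python
import numpy as np

def buy_sell_networks(W_now, W_prev):
    """Binary bipartite buy/sell matrices between t - dt and t:
    buy[i, s] = 1 if institution i increased its position in security s,
    sell[i, s] = 1 if it decreased it."""
    delta = W_now - W_prev
    return (delta > 0).astype(int), (delta < 0).astype(int)
```

The validation procedure is then applied to each of the two binary matrices exactly as it is applied to the holdings matrix A(t).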
Scientific Reports | 6:39467 | DOI: 10.1038/srep39467
As for Fig. 3, Fig. 10 shows that the fraction of HF validated in the buy/sell network is roughly constant in time, with however some more interesting local fluctuations (especially in the years around 2008). As for the average number of neighbors in the validated network, one sees that the fluctuations of the sell networks lag those of the buy networks by 3 months: indeed, the cross-correlation is maximal at such a lag, and is quite high (0.8). This is possibly due to the fact that the typical position holding time of HF is smaller than 3 months: what has been bought will have been sold 3 months later. Notably, the right panel of Fig. 10 points to the fact that buy networks are on average denser than sell networks. This is also reflected in the autocorrelation of the average number of neighbors, which decreases faster for sell networks. Since our dataset only contains long positions, we can only conclude that HF are more synchronized when they open long positions, and liquidate them in a less synchronized way.
Using, as a first approximation, the average number of validated neighbors per fund to assess the synchronicity of HF actions, we clearly observe significantly increased synchronization of buying patterns after the top of the dot-com bubble. There may be two reasons for buying at this date: either the strategies of the HF were not aware of the bubble burst and were still trend-following, or they took advantage of the burst to buy stocks at a discount. Notably, synchronized selling lags buying, and was overall less intense. Concerning the period of the global financial crisis, we observe one buy peak at 2007Q3 and one sell peak at 2007Q4. The first peak may indeed be related to the Great Quant Meltdown of August 2007. However, the so-called long-short market neutral funds that were forced to liquidate their positions should appear in the sell network, not the buy one. This would have been observed if that crisis had happened at the end of a trimester. Unfortunately, there is an almost two-month delay between the meltdown and the reporting, which probably hides the event. In any case, the meltdown acted as a synchronization event, as the buy network density is clearly an outlier at the end of September 2007: HF had therefore acquired significantly similar long positions in their portfolios during the same quarter and then, expectedly, liquidated them by the end of the next trimester.
Temporal evolution of the validated network of securities. In this section we finally use our method to detect statistically significant common ownerships of securities, in order to identify contagion channels between the securities themselves. Thus, we apply the validation procedure to the security ownership overlap o_ss'(t) = ∑_i A_is(t) A_is'(t). The presence of a validated link between two securities then reflects the fact that they share a significantly similar set of owners, which again translates into a potential contagion channel through fire sales. Figure 11 shows the temporal evolution of aggregate features of the validated network projection on securities. Contrary to the case of the institutional projection (Figs 4 and 5), here we observe a stable growth of validated securities: there are more and more stocks that can be involved in a potential fire sale (or in the closing down of similar institutions). Moreover, as testified by the growth of the average degree of validated securities, the validated network becomes denser, signaling the proliferation of contagion channels for fire sales. Note the presence of local maxima that correspond to all major financial crises covered by the database: the dot-com bubble of 2001, the global financial crisis of 2007-2008 and the European sovereign debt crisis of 2010-2011. As in the case of institutions, the similarity pattern of security ownerships is maximal at the end of the considered time span.
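Projecting on the securities side uses the same overlap logic as for institutions; for the binary matrix A, all pairwise security overlaps come from a single matrix product (a numpy sketch, with the function name ours):

```python
import numpy as np

def security_overlaps(A):
    """o[s, s'] = number of institutions holding both securities s and s',
    computed from the binary ownership matrix A (institutions x securities).
    The diagonal holds the security degrees d_s."""
    return A.T @ A
```

Each off-diagonal entry then feeds the same p-value computation used for institution pairs, with the roles of the two layers of the bipartite network exchanged.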
The fact that the average degree of the validated network of securities keeps growing boils down to the fact that institutions choose securities, not the other way around. While the number of institutions in our dataset has increased over the years, the number of securities has remained roughly constant. If a new institution selected at random which assets to invest in, the average degree of the securities network would stay constant. This is not the case, if only because of liquidity constraints. Therefore, on average, the portfolio of a new institution is correlated with those of pre-existing institutions.
In order to detect whether the observed patterns concern peculiar classes of securities, we analyze the validated network distinguishing securities according to the Bloomberg Industry Classification Systems (BICS) 45 , which rests on their primary business, as measured first by the source of revenue and second by operating income, assets and market perception. Each security thus belongs to one of the following sectors: Communications, Consumer Discretionary, Consumer Staples, Energy, Financials, Health Care, Industrials, Materials, Technology, Utilities (or other). In particular, we try to detect whether securities of the same category tend to be connected together in the validated network. To this end, we call internal a validated link connecting two securities with the same BICS label, and we compute the internal degree of a security as its degree restricted to internal links. As Fig. 12 shows, the categories of securities that are most internally connected are (notably) Financials and, to a lesser extent, Consumer Discretionary. This does not mean that portfolio overlaps concentrate on these categories, but rather that relatively more contagion channels exist among the securities belonging to them.
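The internal degree follows directly from the validated adjacency and the sector labels; a minimal sketch (names are ours):

```python
def internal_degrees(adj, sector):
    """Degree of each security restricted to internal links, i.e., links
    to neighbors carrying the same (BICS) sector label."""
    return {s: sum(1 for t in neighbors if sector[t] == sector[s])
            for s, neighbors in adj.items()}
```

For example, a Financials security linked to one Financials and one Technology neighbor has total degree 2 but internal degree 1.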

Discussion
In this work, we have proposed an exact method to infer statistically robust links between the portfolios of financial institutions based on similar patterns of investment. The method solves the problem of evaluating the probability that the overlap of two portfolios of very different size and diversification is due to random allocation of assets of very different market capitalization and number of owners. The use of an appropriate null hypothesis provided by the bipartite configuration model 40 considerably improves the statistical significance of the detected features of the validated networks. Note that the method is general, and can be applied to any bipartite network representing a set of entities sharing common properties (e.g., membership, physical attributes, cultural and taste affinities, biological functions, to name a few) and where the presence of (unlikely) similar sets of neighbors is of interest.
The present study then points to the conclusion that, just before financial crises or bubble bursts, the similarity of institutions' holdings increases markedly. Perhaps worryingly for equity markets, the proposed proxy of fire sale risk, having reached a peak in 2008 and subsequently decreased substantially, has been increasing again from 2009 to the end of our dataset (2013), up to levels not seen since 2007. Although our method relies on binary ownership information, we also found that on average overlapping securities correspond to larger shares of validated portfolios, potentially exacerbating fire sales losses. In addition, the proposed validation method can effectively retrieve the institutions which are about to suffer significant losses in times of market turmoil (when validated links are the channels through which liquidation losses propagate), as well as those with the highest growth in times of market euphoria (when overlapping portfolios turn into self-reinforcing bubbles). Finally, we showed that the number of securities that can be involved in a potential fire sale is steadily growing in time, with an even stronger proliferation of contagion channels.
In this work we have only investigated patterns of portfolio overlap, not the probability that they lead to fire sales. This is a more complicated problem, for which other datasets and econometric techniques are needed. However, even if we cannot draw any strong implication from our findings, all the analyses we performed confirm the coherence of our method and suggest that overlapping portfolios do play a role in financial turmoil. Furthermore, the relationship between holdings and future portfolio changes must be better characterized. Indeed, even if two institutions with different strategies converge to a similar portfolio, this does not imply that they will update the latter in the same way and at the same time. However, it is likely that some institutions follow (in fine) equivalent strategies, which implies portfolio overlap and a subsequently increased risk of fire sales, which triggers further leverage adjustment, as pointed out in refs 23, 24. Finally, it will be useful to repeat our analysis on larger datasets so as to encompass other bubbles and crises, and to examine differences in investment patterns across various markets.

Methods
Dataset. We extracted 13-F SEC filings (https://www.sec.gov/) from the Factset Ownership database from 1999Q1 to 2013Q4, covering institutions valued at more than 100 million dollars in qualifying assets, which must report their long positions to the SEC at the end of each trimester. As the 13-F dataset contains only positions greater than 10000 shares or $200000, very small positions are already filtered out. The dataset is composed of a set I(t) of approximately 1500 to 3500 institutions, holding positions from a set S(t) of securities whose size fluctuates around 12500 (see Fig. 1). Note that the portfolios of sub-funds are merged into a single report. In addition to the raw ownership data, our dataset is complemented by meta-data about both institutions and securities.
Significance level under multiple tests. In order to choose an appropriate threshold (the significance level) P*(t) to be used in the validation procedure, we have to account for the multiple hypotheses tested (corresponding to the number n_pairs(t) of possible pairs of institutions having a non-zero overlap). Here we use the rather strict Bonferroni correction 46, meaning that we set the threshold to P*(t) = ε/n_pairs(t). Note that the choice of the significance level still leaves some arbitrariness. While the results presented in the paper are obtained with ε = 10^−3, we have tested our method with various values of ε, and also employed the less strict false discovery rate (FDR) criterion 47, without finding major qualitative differences. In fact, while the final size of the validated network clearly depends on the threshold, the relative temporal changes of the network statistics are much less affected by the particular value used.
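The two corrections can be sketched in a few lines. The p-values below are hypothetical placeholders; the sketch only illustrates how the Bonferroni threshold ε/n and the less strict Benjamini-Hochberg FDR criterion select significant overlaps from the same set of tests:

```python
def bonferroni_threshold(eps, n_pairs):
    """Global significance level P*(t) = eps / n_pairs(t)."""
    return eps / n_pairs

def fdr_rejections(p_values, eps):
    """Benjamini-Hochberg: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= eps * k / n (p-values sorted ascending)."""
    n = len(p_values)
    ranked = sorted(p_values)
    k = 0
    for rank, p in enumerate(ranked, start=1):
        if p <= eps * rank / n:
            k = rank
    return ranked[:k]

# Hypothetical p-values of six tested overlaps at one date.
p_vals = [1e-9, 2e-7, 5e-5, 3e-4, 0.01, 0.2]
thr = bonferroni_threshold(1e-3, len(p_vals))
bonf = [p for p in p_vals if p <= thr]
fdr = fdr_rejections(p_vals, 1e-3)
print(thr, bonf, fdr)
```

As in the paper, the FDR criterion validates at least as many links as Bonferroni, while the relative ordering of significant overlaps is unchanged.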
Resolution problems for the hypergeometric distribution approach. As stated in the Introduction, the approach proposed in ref. 34 to divide the original bipartite network into homogeneous subnetworks of securities has some intrinsic limitations, especially when securities are characterized by a strongly heterogeneous number of investors (as generally happens in stock market data). In this circumstance, in fact, the splitting procedure often produces almost empty subsets, especially for securities held by a large number of investors. In these subsets, overlaps can assume only a few values, bounded by the limited number of securities considered, resulting in a handful of widely spaced possible outcomes for the p-values. The problem then arises with the use of a global threshold corrected for multiple hypothesis testing. In fact, since institutions are compared on the many subnetworks of securities with the same degrees, n_pairs(t) grows with the number of such subnetworks as well as with the number of institution pairs: the validation threshold becomes extremely small for large and heterogeneous systems and vanishes in the infinite-size limit. These issues lead to a serious problem of resolution, since P*(t) is too small to validate even the smallest non-zero p-value in most of the subnetworks. As a result, the validated network becomes almost empty by construction. Overall, while the method proposed in ref. 34 works well for small networks with little degree heterogeneity, the same approach is not feasible in the case of large-scale and highly heterogeneous networks.
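To make the resolution problem concrete, consider the following sketch (toy numbers, ours): in a nearly empty subnetwork the smallest achievable hypergeometric p-value is already sizeable, while the Bonferroni-corrected threshold shrinks with the number of tested pairs, so no overlap in that subnetwork can ever be validated:

```python
from math import comb

def hypergeom_pvalue(x, N, di, dj):
    """P(overlap >= x) when institution i holds di and institution j holds dj
    of the N securities in a subnetwork, under uniformly random holdings."""
    return sum(comb(dj, k) * comb(N - dj, di - k)
               for k in range(x, min(di, dj) + 1)) / comb(N, di)

# A nearly empty subnetwork: N = 3 securities, both institutions hold 2.
# Even the most extreme outcome (full overlap, x = 2) has a large p-value:
p_min = hypergeom_pvalue(2, 3, 2, 2)

# With many subnetworks the number of tested pairs explodes, so the
# Bonferroni threshold P* = eps / n_pairs is tiny (n_pairs is illustrative):
n_pairs = 10**6
p_star = 1e-3 / n_pairs
print(p_min, p_star, p_min <= p_star)
```

Here p_min = 1/3 while P* = 10^−9: the subnetwork can never contribute a validated link, which is exactly the emptiness-by-construction problem described above.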

p-values from the Bipartite Configuration Model. Determining the probability distributions used in Eq. (1) requires solving a technical problem caused by the heterogeneity of both institutions and securities. For example, it is hard a priori to compare a portfolio with very few assets and one with very many assets. However, the bipartite configuration model (BiCM) 40 provides a null network model suitable for this kind of situation. We refer the reader to refs 40, 41 and 48 for more details on the method. In the following we omit the explicit time dependence of the quantities considered, since the same procedure is repeated for each date.
In a nutshell, the BiCM prescribes to build the null model as the ensemble Ω of bipartite networks that are maximally random, under the constraint that their degree sequences of institutions and securities are, on average, equal to those of the original network. This is achieved through maximization of the Shannon entropy of the network subject to these constraints, which are imposed through a set of Lagrange multipliers θ = {θ_i, θ_s} (one for each node of the network). Solving the BiCM amounts exactly to finding these multipliers, which quantify the ability of each node to create links with other nodes. Thus, importantly, nodes with the same degree have by construction identical values of their Lagrange multipliers. Once these multipliers are found, the BiCM prescribes that the expectation value within the ensemble of the network matrix element 〈A_is〉_Ω, i.e., the ensemble probability Q_is of connection between nodes i and s, is given by Q_is = 〈A_is〉_Ω = x_i x_s/(1 + x_i x_s), where x_i = e^(−θ_i) and x_s = e^(−θ_s), and the probability of occurrence Q(A) of a network A in Ω is obtained as the product of these linking probabilities Q_is over all the possible I × S pairs of nodes. In other words, links are treated as independent random variables, by defining a probability measure in which link correlations are discarded. The key feature of the BiCM is that the probabilities {Q_is} can be used both to directly sample the ensemble of bipartite graphs and to compute the quantities of interest analytically. We can thus use the matrix Q to compute the expected overlap between the portfolios of two institutions i and j as 〈o_ij〉_Ω = Σ_s Q_is Q_js, or to compute the probability distribution π(⋅|d_i, d_j) of the overlap under the null hypothesis of random connections in the bipartite network, which, according to the BiCM prescription, only depends on the degrees of institutions i and j. Indeed, π(⋅|d_i, d_j) is the distribution of the sum of S independent Bernoulli trials, each with success probability Q_is Q_js.
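A minimal numerical sketch of this procedure on a toy 3 × 4 network (the simple fixed-point iteration and all variable names are our own choices, not prescribed by refs 40, 41 and 48): writing x_i = e^(−θ_i), the degree constraints are iterated until the multipliers converge, after which the probabilities Q_is and the expected overlaps follow directly:

```python
# Toy bipartite adjacency matrix: 3 institutions x 4 securities.
A = [[1, 1, 1, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1]]
I, S = len(A), len(A[0])
d_i = [sum(row) for row in A]                              # institution degrees
d_s = [sum(A[i][s] for i in range(I)) for s in range(S)]   # security degrees

# Fixed-point iteration for the multipliers x_i = exp(-theta_i), y_s = exp(-theta_s):
# at the fixed point, sum_s Q_is = d_i and sum_i Q_is = d_s on average.
x = [1.0] * I
y = [1.0] * S
for _ in range(5000):
    x = [d_i[i] / sum(y[s] / (1 + x[i] * y[s]) for s in range(S)) for i in range(I)]
    y = [d_s[s] / sum(x[i] / (1 + x[i] * y[s]) for i in range(I)) for s in range(S)]

# Connection probabilities Q_is and expected overlap <o_01> = sum_s Q_0s Q_1s.
Q = [[x[i] * y[s] / (1 + x[i] * y[s]) for s in range(S)] for i in range(I)]
row_sums = [sum(Q[i]) for i in range(I)]   # should reproduce d_i
exp_overlap_01 = sum(Q[0][s] * Q[1][s] for s in range(S))
print(row_sums, exp_overlap_01)
```

The row and column sums of Q reproduce the observed degree sequences, confirming that the constraints are satisfied on average, as the BiCM requires.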
The distribution π(⋅|d_i, d_j) can be approximated analytically using a Normal approximation of the Poisson-Binomial distribution 49, an approach developed in ref. 50 in parallel with our research. Here we discuss instead an exact and optimized numerical technique to compute π(⋅|d_i, d_j). Indeed, the computational complexity of the numerics can be substantially reduced by recalling, again, that Q_is ≡ Q_is′ if d_s ≡ d_s′ ∀ i: connection probabilities only depend on node degrees. This is an important observation, which translates into the following statement: the overlap between any two institutions i and j restricted to the set of securities with a given degree follows a binomial distribution with success probability Q_is Q_js (where s is any one of these securities) and number of trials equal to the cardinality of that set, so that π(⋅|d_i, d_j) is the distribution of the sum of these independent binomial variables, one per security-degree class. Note that since this computation is made on the whole network, i.e., considering all the securities, we have a fairly large spectrum of possible p-values. Thus, even though we still use a threshold depending on the number of hypotheses tested (which now scales just as I²), we have a much higher resolution than in ref. 34, and can obtain non-empty and denser validated networks.
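The exact computation can be sketched as follows (per-class probabilities and degree counts are hypothetical): securities are grouped by degree, each degree class contributes one binomial, the overlap distribution is obtained as the convolution of these binomials, and the p-value of an observed overlap is read off the upper tail:

```python
from collections import Counter
from math import comb

def binomial_pmf(n, q):
    """pmf of a Binomial(n, q) as a list indexed by the number of successes."""
    return [comb(n, k) * q**k * (1 - q)**(n - k) for k in range(n + 1)]

def convolve(p, q):
    """Distribution of the sum of two independent integer-valued variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out

def overlap_distribution(sec_degrees, q_by_degree):
    """pi(.|d_i, d_j): one binomial per security-degree class, convolved;
    within a class the success probability q = Q_is * Q_js is constant."""
    dist = [1.0]
    for d, n in Counter(sec_degrees).items():
        dist = convolve(dist, binomial_pmf(n, q_by_degree[d]))
    return dist

def p_value(observed, dist):
    """Upper-tail probability P(overlap >= observed)."""
    return sum(dist[observed:])

# Hypothetical example: 5 securities of degree 2 and 3 of degree 7, with
# per-class probabilities q = Q_is * Q_js taken as given (e.g. from the BiCM).
dist = overlap_distribution([2] * 5 + [7] * 3, {2: 0.01, 7: 0.30})
print(p_value(3, dist))
```

Grouping by degree class reduces the convolution from S Bernoulli factors to one binomial per distinct security degree, which is what makes the exact computation feasible at the scale of the dataset.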