How News May Affect Markets’ Complex Structure: The Case of Cambridge Analytica

The claim of Cambridge Analytica, a political consulting firm, that it was possible to influence voting behavior by using data mined from the social platform Facebook created a sudden fear among its users of being manipulated; consequently, even the market price of the social platform suffered a shock. We propose a case study analyzing the effect of this data scandal not only on the Facebook stock price, but also on the whole stock market. To this end, we consider 15-minute prices and returns of the NASDAQ-100 components before and after the Cambridge Analytica case. We analyze correlations and Mutual Information among the components, finding that assets become more correlated and that their Mutual Information grows. We also observe that correlation and Mutual Information increase together and seem to follow a master curve. Hence, the market appears more fragile after the Cambridge Analytica event. In fact, as is well known in finance, an increase in the average value of correlations raises the systemic risk (i.e., the whole market can collapse at once) and reduces the possibility of allocating a safe investment portfolio.


Introduction
Social media platforms like Facebook (FB) have become the main communication medium; however, the concentration of users' data in the hands of a few big players like FB and Google has raised concerns about the possibility of getting a monopolistic control of information.
In this scenario, the Cambridge Analytica (CA) scandal, brought to the fore on 17 March 2018, has ignited a strong debate. CA was a British political consulting firm that claimed to offer, during electoral processes, strategic communication services based on data mining, data brokerage, and data analysis techniques. CA's role in political campaigns has been controversial and is still the subject of ongoing criminal investigations; however, the effectiveness of CA's methods for targeting voters is strongly questioned by political scientists.
The collection by CA, since 2014, of personally identifiable information of at least 87 million FB users caused a data scandal since, by CA's own account, those data were used in attempts to influence voting [1]. Even though FB banned CA and restricted external companies' access to its data, the sudden fear that FB data could be used to influence and manipulate people created a shock in the FB stock price.
The impact of event-related news on financial markets has always received privileged attention in academic literature since Eugene Fama conducted his semi-strong tests on the Efficient Market Hypothesis [2][3][4][5][6]. Many of those works focused on the market reaction to common public announcements, such as dividend issues and stock splits. This allowed mitigating the effect of spurious events, but restricted the investigation to a limited number of cases.
The results of event studies making use of intra-day data seem to suggest that the release of new information is quickly reflected in stock returns and in their volatility [7,8]. Moreover, higher volatility seems to persist for several hours following news release [7]. This seems to be true also in the case of intra-day, fixed-income rates and foreign-exchange future rates [9].
Not only news content, but also media coverage might play a relevant role. What seems to emerge is that trading activity and volatility in a company's stock do increase as the company captures the attention of the media [10][11][12]. In this sense, stale news also seems to influence the behavior of investors [13].
It is also worth noting that particularly relevant and resonating events might trigger periods of market turmoil [14]. In those periods, the correlation of all the stocks in the market seems to increase and, thus, achieving diversification might become difficult [15][16][17]. In this respect, Zheng et al. find that the first Principal Component of assets' cross-correlation might be used as an effective measure of systemic risk [18].
Finally, assets' cross-correlation is not the only dependency measure affected in periods of financial distress. Wang and Hui show, for worldwide market indexes, how their Mutual Information, measuring non-linear dependency, reached a peak in the middle of the 2008-2009 financial crisis [19].
The contribution of this work is to present a case study concerning the impact of a data scandal with wide media resonance (CA) not only on the asset directly involved in the scandal (FB), but also on the whole market. In order to do so, we consider a dataset containing the time series of 15-minute intraday prices of the NASDAQ-100 components spanning from 1 March to 12 April 2018. In Section 2, we analyze volatility, cross-correlations, and Mutual Information of the NASDAQ-100 components. We show how the market becomes more interconnected, and hence more fragile, after the CA event. We discuss the limits and the perspectives of our findings in Section 3, also considering possible future developments. Finally, in Section 4, we describe the dataset in detail and recap the methods and models applied in the analysis.

Results
To explore the impact of the CA event on the market, we first analyze the effect of the event on the most involved stock, i.e., Facebook. In Figure 1, we show both price and log-returns of the FB stock in a period centered on the CA event.
It is clear that not only does FB's price drop to a lower level, but its volatility (i.e., the size of the fluctuations of the log-returns) also increases. After the CA scandal, we observe a ∼ 165% increase in FB volatility and a ∼ 15% increase in the average volatility for all the assets considered.
We also consider the 10 stocks showing the highest values of volatility before and after CA. In Table 1, we notice the presence of FB among the 10 stocks showing the highest volatility after CA. There seems to be also an increase in the number of technology-related stocks, from 3 to 6. This may suggest that the shock had an impact not only on FB, but also on technology-related stocks. Volatility could be detrimental for investors since it increases risks and associated costs; a powerful tool to reduce risks in fluctuating markets is the application of portfolio techniques [20] that rely on correlation to reduce the volatility of investments associated with a set of stocks, i.e., the portfolio. However, if the whole market becomes more correlated, the possibility of systemic failures appears [21] as the market becomes more fragile.
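The volatility comparison above can be illustrated with a minimal sketch. It assumes that volatility is measured as the sample standard deviation of the log-returns in each subsample; the text only describes it as the size of the fluctuations, so this choice, the function names, and the toy tickers and values are all hypothetical.

```python
import math

def volatility(returns):
    """Sample standard deviation of a list of log-returns (assumed definition)."""
    n = len(returns)
    mean = sum(returns) / n
    return math.sqrt(sum((r - mean) ** 2 for r in returns) / (n - 1))

def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before

def top_k_by_volatility(returns_by_ticker, k=10):
    """Tickers sorted by decreasing volatility, truncated to the first k."""
    vols = {ticker: volatility(r) for ticker, r in returns_by_ticker.items()}
    return sorted(vols, key=vols.get, reverse=True)[:k]

# Hypothetical toy log-returns, not taken from the dataset.
before = {"AAA": [0.001, -0.001, 0.001, -0.001],
          "BBB": [0.003, -0.003, 0.003, -0.003]}
after = {"AAA": [0.004, -0.004, 0.004, -0.004],
         "BBB": [0.003, -0.003, 0.003, -0.003]}

for ticker in before:
    change = percent_change(volatility(before[ticker]), volatility(after[ticker]))
    print(ticker, round(change, 1))
print(top_k_by_volatility(after, k=1))
```

Ranking the same dictionary before and after the event gives the two columns of a table such as Table 1.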

Correlations
To understand whether the CA event has impacted the whole NASDAQ-100, we analyze correlations among the stocks. The related methodology is presented in Section 4.2.1. In Figure 2, we show the histograms of the stocks' correlations before and after the CA event. We observe that, while the qualitative shape and the standard deviation of the probability distribution function remain the same, the whole market becomes more correlated, since it experiences a ∼ 50% increase in the average value of cross-correlations. To confirm this observation, we perform a moving average analysis of the cross-correlations. In Figure 3, we show that average cross-correlations are roughly stationary within each of the two periods, shifting from $\rho_{x,y} \sim 0.3$ before to $\rho_{x,y} \sim 0.5$ after the CA event.
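A moving-window analysis of this kind can be sketched as follows. The window length, the one-observation step, and the function names are assumptions made for illustration, since the text does not specify them; the sketch averages the Pearson correlation over all distinct pairs of series inside each window.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mean_cross_correlation(series):
    """Average Pearson correlation over all distinct pairs of series."""
    k = len(series)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return sum(pearson(series[i], series[j]) for i, j in pairs) / len(pairs)

def rolling_mean_cross_correlation(series, window):
    """Mean cross-correlation in each window of `window` observations,
    sliding by one observation at a time (hypothetical choice)."""
    length = len(series[0])
    return [
        mean_cross_correlation([s[t:t + window] for s in series])
        for t in range(length - window + 1)
    ]

# Hypothetical toy log-returns for three assets.
toy = [
    [0.01, -0.02, 0.03, -0.01, 0.02, -0.03],
    [0.02, -0.01, 0.02, -0.02, 0.01, -0.02],
    [-0.01, 0.01, 0.00, 0.02, -0.02, 0.01],
]
print(rolling_mean_cross_correlation(toy, window=4))
```

A series that is constant inside a window has zero variance and would make the ratio undefined, so real data may need a guard for that case.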

Correlation Network
To highlight the structure of the stocks' cross-correlations, we represent the correlation matrices as weighted networks. The related methodology is presented in Section 4.2.2. In such networks, nodes represent stocks while edges represent significant correlations. In Figure 4, we show the NASDAQ-100 components network subdivided per industry according to a taxonomy, proposed by the NASDAQ, stemming from the Industry Classification Benchmark (ICB) system (Components' list with classification available here: https://www.nasdaq.com/screening/company-list.aspx (July 30, 2018 5:35 pm)).
In Figure 5, we show the correlation network among the NASDAQ-100 components before and after the CA event. We observe that the graph hints at some structure of cross-industry correlation among specific assets before the CA scandal, whereas after the CA events correlations are denser among all the stocks and no clear cross-industry correlation structure appears. Associated with a graph, there are several structural quantities, like the edge density (measuring the fraction of edges of a graph with respect to all the possible edges) and the clustering coefficient (measuring the local cliquishness [22]). In Figure 6, we show how, similarly to the cross-correlations of Figure 3, edge density and clustering coefficient also rise sharply in correspondence with the CA event.
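The two structural quantities mentioned above can be illustrated on a small graph. The sketch below assumes the unweighted definitions (edge weights are ignored) and takes the clustering coefficient of the graph to be the average of the local coefficients; the adjacency-set representation and the node names are hypothetical.

```python
from itertools import combinations

def edge_density(adj):
    """Fraction of existing edges over the n(n-1)/2 possible ones."""
    n = len(adj)
    n_edges = sum(len(neighbours) for neighbours in adj.values()) // 2
    return n_edges / (n * (n - 1) / 2)

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention for nodes with fewer than two neighbours
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

# Hypothetical toy graph: a triangle A-B-C with a pendant node D attached to C.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(edge_density(toy), average_clustering(toy))
```

In the toy graph, 4 of the 6 possible edges are present, and only the triangle contributes to the clustering.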

Correlation Threshold Sensitivity
We checked the sensitivity of the correlation network to the correlation threshold c by tracking the size of the Giant Component for different values of c before and after the CA scandal. Figure 7 shows that the Giant Component is consistently larger after CA for all the values of c between ∼ 0.35 and ∼ 0.80.
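A threshold scan of this kind can be sketched as follows, assuming that the Giant Component is the largest connected component of the graph whose edges are the pairs with absolute correlation above c. The toy matrix and the grid of thresholds are hypothetical.

```python
def giant_component_size(corr, c):
    """Size of the largest connected component of the graph that links
    i and j whenever |corr[i][j]| > c."""
    n = len(corr)
    seen = set()
    best = 0
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search from an unvisited node.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            i = stack.pop()
            size += 1
            for j in range(n):
                if j != i and j not in seen and abs(corr[i][j]) > c:
                    seen.add(j)
                    stack.append(j)
        best = max(best, size)
    return best

# Hypothetical 4x4 correlation matrix.
toy = [
    [1.0, 0.9, 0.5, 0.1],
    [0.9, 1.0, 0.4, 0.2],
    [0.5, 0.4, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
for c in (0.25, 0.45, 0.60, 0.95):
    print(c, giant_component_size(toy, c))
```

Running the same scan on the correlation matrices of the two subsamples gives the two curves to compare.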

Mutual Information
Mutual Information (MI) is a measure of dependency for nonlinear time series [23]. It has been widely used in bio-informatics to cluster data while also taking into account finite size effect [24]. Generally measured in bits, it is a dimensionless quantity that can be interpreted as the reduction in uncertainty about one random variable given a perfect knowledge of the other. On the one hand, high MI reveals a large reduction in uncertainty. On the other hand, low MI indicates a great uncertainty on a random variable given the knowledge of the other; in particular, zero MI means that two random variables are independent.
Notice that it is possible to have non-zero MI even in the presence of zero correlation: in fact, while MI measures the distance between the joint probability distribution of two random variables and the product of their marginal distributions, correlation only measures the linear relationship between them.
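A standard textbook example, not taken from the dataset, makes the point concrete. Assume X is uniform on {-1, 0, 1} and Y = X^2: the two variables are uncorrelated, yet Y is fully determined by X, so the MI equals the entropy of Y.

```latex
% Illustrative worked example: X uniform on {-1, 0, 1}, Y = X^2.
\begin{align*}
\operatorname{Cov}(X, Y) &= E[X^3] - E[X]\,E[X^2] = 0 - 0 \cdot \tfrac{2}{3} = 0, \\
I(X; Y) &= H(Y) - H(Y \mid X) = H(Y) \\
        &= -\tfrac{1}{3}\log_2 \tfrac{1}{3} - \tfrac{2}{3}\log_2 \tfrac{2}{3}
         = \log_2 3 - \tfrac{2}{3} \approx 0.918 \text{ bits}.
\end{align*}
```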
In Figure 8, we show how the histogram of the MI values varies across the stocks of the NASDAQ-100 before and after the CA event. After the CA event, MI grows on average, i.e., the market becomes more predictable from the knowledge of a limited subset of stocks. The related methodology is presented in Section 4.2.3. It is also interesting to check the relation between MI and correlation, since it may allow us to spot possible methodological inconsistencies. Figure 9 shows the values of MI versus linear correlation before and after the CA event. MI is non-zero for zero correlations and increases for positive correlations; notice that the points of the scatter plot seem to follow a master curve. This is compatible with findings in similar cases [25]. However, we are not able to fully appreciate the characteristic U-shaped curve, given the absence of strongly negative correlations across the time series of the NASDAQ-100 components. In fact, the assets chosen by the NASDAQ are subject to common risk factors which mitigate the effect of possible sources of negative correlation.
Figure 9. Scatter plot coupling Correlation and MI for every pair of stocks before (red pluses) and after (blue crosses) the events of CA. Notice that all the points seem to follow a master curve.

Discussion
This work should be seen in the light of what has been done concerning the impact of news on financial markets. We have seen how studying a limited set of predictable announcements is frequent in the finance literature, while, in our opinion, taking into account a specific event is less common. In this paper, we have presented a case study concerning the impact of the notorious CA scandal not only on the FB stock, which directly suffered from a serious loss of reputation, but also on the whole market. In particular, we observe a sudden fall of the FB stock price and an increase in its volatility after the shock.
We observe that, in correspondence with the above-mentioned scandal, not only does the volatility of all the stocks increase on average, but both cross-correlation among the stocks and Mutual Information among their time series also increase. Hence, the system starts behaving like a whole, leading to an increase of the systemic risk due to possible cascading failures. In this situation, not only is it difficult to select low-risk investment portfolios, but the number of possible portfolios also decreases: in fact, many investors can unknowingly share the same investment strategy and they can all fail together in case of rare, unfavorable events. In such a situation, it is clear that an increase in cross-correlation leads to an underestimate of the risks and hence to a more fragile stock market.
It is worth highlighting that this case study presents at least two limitations. In the first place, the use of 15-minute intraday data might be a source of bias [26]. However, we checked the consistency of our results using also daily data and they seem to confirm our findings, despite the small number of observations within the time span considered. The second limitation is common to all the studies focusing on a single specific event in a quickly adapting environment. Unfortunately, it is not possible to rule out the presence of spurious events. For this reason, we decided to keep the time window as close as possible to the event considered.
Moreover, this case study leaves ample space for further research. First of all, the use of sophisticated econometric models might shed light on the time required by FB and by the market to react to the shock caused by the CA scandal. The way in which the increase in correlation and in MI spreads across different sectors might also deserve a closer look. Finally, a broader investigation might be performed considering the common reaction of different assets to a sufficiently large number of data scandals.

Data
For our analysis, we considered the list of equity securities included in the well-known stock market index NASDAQ-100. Our initial dataset contained 103 stock-price high-frequency time series of the NASDAQ-100 components. We removed three time series, namely BKNG, MELI, and FISV, because of issues related to data collection. The resulting dataset contains 100 time series and 779 observations, ranging from 1 March to 12 April 2018, with a 15-minute frequency. The aforementioned time span has been chosen in order to include CA-scandal early events. Data have been collected from Bloomberg.

Methods
We begin by transforming the stock-price time series into log-return time series. Let $p_i(t)$ be the price of a stock $i$ at time $t$; the log-return $r_i(t)$ of the stock $i$ at time $t$ is defined as follows [27][28][29]:
$$r_i(t) = \log p_i(t) - \log p_i(t-1)$$
where $t-1$ denotes the previous 15-minute observation. From the original sample of log-returns, we extract two subsamples consisting of 311 observations each. The first subsample, starting on 1 March 2018, contains the available observations before the outbreak of the CA scandal, whereas the second subsample, starting on 19 March 2018, fully contains the early effects of the CA scandal.
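The preprocessing step can be sketched as follows, taking the log-return as the difference between the logarithms of consecutive prices. The function names and the `break_index` parameter (the position of the first observation of the second subsample) are hypothetical; in the setting described above, `size` would be 311.

```python
import math

def log_returns(prices):
    """r(t) = log p(t) - log p(t-1) for consecutive observations."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

def split_subsamples(returns, break_index, size):
    """First `size` returns, and `size` returns starting at `break_index`.

    The caller should make sure that break_index >= size, so that the
    two subsamples do not overlap.
    """
    before = returns[:size]
    after = returns[break_index:break_index + size]
    return before, after

# Hypothetical toy prices.
prices = [100.0, 101.0, 100.5, 99.0, 97.0, 98.5, 96.0]
returns = log_returns(prices)
before, after = split_subsamples(returns, break_index=3, size=3)
print(before)
print(after)
```

A series of n prices yields n - 1 log-returns, which is why the subsamples are cut from the return series rather than from the prices.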

Correlations
We proceed by computing the pairwise Pearson correlation for all the time series in our two subsamples. The Pearson correlation, $\rho_{x,y}$, for a pair of time series, $x(t)$ and $y(t)$, is defined as [27,30]:
$$\rho_{x,y} = \frac{\langle x y \rangle - \langle x \rangle \langle y \rangle}{\sqrt{\left(\langle x^2 \rangle - \langle x \rangle^2\right)\left(\langle y^2 \rangle - \langle y \rangle^2\right)}}$$
where $\langle \cdot \rangle$ indicates the average over a fixed time window, i.e., for a given time window $[t, t+N]$ the average of a quantity $x$ is $\langle x \rangle = N^{-1} \sum_{i=1}^{N} x_{t+i}$. We obtain two correlation matrices from our log-return time series, one before and one after the CA event. Non-diagonal elements of each matrix may assume values between −1, maximum negative linear correlation, and 1, maximum positive linear correlation, whereas a value equal to 0 signals the absence of any linear correlation. In our case, non-diagonal elements report the correlation coefficient for every pair of stocks. Diagonal elements report the correlation of each stock with itself, thus their value is always equal to +1.
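A minimal sketch of the correlation matrix follows, writing the Pearson coefficient in terms of window averages of x, y, xy, and the squares. The function names and toy series are hypothetical; the off-diagonal values are what a histogram of cross-correlations would be built from.

```python
import math

def mean(values):
    """Window average <x> of a sequence."""
    return sum(values) / len(values)

def pearson(x, y):
    """(<xy> - <x><y>) / sqrt((<x^2> - <x>^2) (<y^2> - <y>^2))."""
    cov = mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)
    var_x = mean([a * a for a in x]) - mean(x) ** 2
    var_y = mean([b * b for b in y]) - mean(y) ** 2
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(series):
    """Symmetric matrix of pairwise correlations, with 1.0 on the diagonal."""
    k = len(series)
    matrix = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            matrix[i][j] = matrix[j][i] = pearson(series[i], series[j])
    return matrix

def off_diagonal(matrix):
    """Upper-triangular entries, one per distinct pair of series."""
    k = len(matrix)
    return [matrix[i][j] for i in range(k) for j in range(i + 1, k)]

# Hypothetical toy series.
toy = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
values = off_diagonal(correlation_matrix(toy))
print(values, mean(values))
```

For very small returns the moment form can lose precision through cancellation; subtracting the means before multiplying is numerically safer and gives the same coefficient.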

Correlation Network
A further step is the creation of a weighted network. A weighted network is a triplet G = (V, E, w) where V is the set of vertices (or nodes), E ⊆ V × V is the set of edges (or links), and the function w associates to each edge e its weight w(e). Given a correlation threshold c, we represent a correlation matrix C as a weighted network by identifying the NASDAQ-100 stocks as the set of nodes and associating to each element $|C_{ij}| > c$ an edge e = (i, j) with weight $w(e) = |C_{ij}|$. We call such a network, associated with the correlation matrix C with threshold c, the correlation network $G_c(C)$. This slightly differs from [28,29,31], with the intention of also considering large negative correlations.
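The construction can be sketched as follows: every pair whose absolute correlation exceeds c becomes an edge weighted by that absolute value, so strongly negative correlations are kept as well. The dictionary-of-edges representation, the function name, and the toy tickers are hypothetical.

```python
def correlation_network(corr, labels, c):
    """Edges {(label_i, label_j): |corr[i][j]|} for all i < j with |corr[i][j]| > c."""
    edges = {}
    for i in range(len(corr)):
        for j in range(i + 1, len(corr)):
            weight = abs(corr[i][j])
            if weight > c:
                edges[(labels[i], labels[j])] = weight
    return edges

# Hypothetical 3x3 correlation matrix with one strongly negative entry.
labels = ["AAA", "BBB", "CCC"]
toy = [
    [1.0, 0.7, -0.8],
    [0.7, 1.0, 0.1],
    [-0.8, 0.1, 1.0],
]
print(correlation_network(toy, labels, c=0.5))
```

With c = 0.5 the pair (AAA, CCC) is retained with weight 0.8 despite its negative sign, while (BBB, CCC) is dropped.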

Mutual Information
Finally, we also take MI into account. The MI between two discrete random variables, X and Y, can be defined as follows [32,33]:
$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$
where p(x, y), p(x), and p(y) are respectively the joint and marginal probability distributions of X and Y. In order to compute MI pairwise, we discretize our time series. We opted for a number of bins equal to $\sqrt{N}$, i.e., $\sqrt{311} \approx 18$ bins for each of the two subsamples.
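The MI estimate can be sketched with a simple histogram estimator. The sketch assumes equal-width bins and base-2 logarithms (MI in bits), neither of which is stated in the text; the number of bins defaults to the square root of the sample size, rounded to the nearest integer, and the function names are hypothetical.

```python
import math
from collections import Counter

def discretize(values, n_bins):
    """Map each value to an equal-width bin index in 0 .. n_bins - 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)  # constant series: a single bin
    # The maximum value falls on the upper edge, so it is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y, n_bins=None):
    """Histogram estimate of I(X; Y) in bits for two equally long series."""
    n = len(x)
    if n_bins is None:
        n_bins = round(math.sqrt(n))  # e.g. 18 bins for 311 observations
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # p(x,y) / (p(x) p(y)) equals count * n / (count_x * count_y).
    return sum(
        (count / n) * math.log2(count * n / (px[a] * py[b]))
        for (a, b), count in joint.items()
    )

# Hypothetical toy series: identical series versus an unrelated pattern.
x = [float(v) for v in range(16)]
print(mutual_information(x, x))
print(mutual_information([0.0, 0.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0], n_bins=2))
```

Histogram estimators of this kind are biased for finite samples, so values obtained with different bin counts or sample sizes are not directly comparable.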
Author Contributions: All authors contributed equally to the manuscript.

Funding:
A.S., F.Z. and W.Q. acknowledge the support of the CNR-PNR National Project DFM.AD004.027 "Crisis-Lab" and P0000326 project AMOFI (Analysis and Modeling OF social medIa). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding parties.

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.