Binary versus non-binary information in real time series: empirical results and maximum-entropy matrix models

The dynamics of complex systems, from financial markets to the brain, can be monitored in terms of multiple time series of activity of the constituent units, such as stocks or neurons respectively. While the main focus of time series analysis is on the magnitude of temporal increments, a significant piece of information is encoded into the binary projection (i.e. the sign) of such increments. In this paper we provide further evidence of this by showing strong nonlinear relations between binary and non-binary properties of financial time series. These relations are a novel quantification of the fact that extreme price increments occur more often when most stocks move in the same direction. We then introduce an information-theoretic approach to the analysis of the binary signature of single and multiple time series. Through the definition of maximum-entropy ensembles of binary matrices and their mapping to spin models in statistical physics, we quantify the information encoded into the simplest binary properties of real time series and identify the most informative property given a set of measurements. Our formalism is able to accurately replicate, and mathematically characterize, the observed binary/non-binary relations. We also obtain a phase diagram allowing us to identify, based only on the instantaneous aggregate return of a set of multiple time series, a regime where the so-called `market mode' has an optimal interpretation in terms of collective (endogenous) effects, a regime where it is parsimoniously explained by pure noise, and a regime where it can be regarded as a combination of endogenous and exogenous factors. Our approach allows us to connect spin models, simple stochastic processes, and ensembles of time series inferred from partial information.


Introduction
In large systems, the observed dynamics or activity of each unit can be represented by a discrete time series providing a sequence of measurements of the state of that unit. One of the main challenges researchers are faced with is that of extracting meaningful information from the high-dimensional (multiple) time series characterizing all the elements of a complex system [1][2][3][4][5][6][7][8][9]. Traditionally, the main object of time series analysis is the characterization of patterns in the amplitude of the increments of the quantities of interest. Given a signal $s_i(t)$, where $i$ denotes one of the $N$ units of the system and $t$ denotes one of the $T$ observed temporal snapshots, the generic increment or 'return' $r_i(t)$ can be defined as

$$r_i(t) \equiv s_i(t) - s_i(t-1). \qquad (1)$$

Previous analyses, mainly in the field of finance, have documented various forms of statistical dependency between the sign and the absolute value of fluctuations, e.g. sign-volume correlations [10,11] and the leverage effect [12][13][14][15]. Other studies have also documented that the binary projections of various financial [16] and neural [17] time series exhibit non-trivial dynamical features that resemble those of the original data. All these results suggest that binary projections indeed retain a non-trivial piece of information about the original time series, and call for a deeper analysis of the problem. Being binary, the sign of the increments is much more robust to noise than the increments themselves. Moreover, it is scale-invariant (i.e. independent of the chosen unit of increments) and does not depend on whether the original data have been preliminarily rescaled or log-transformed (as usually done, e.g., for financial time series). Binary time series can also be analyzed with the aid of much simpler mathematical models than required by non-binary data (several examples of such models will be provided in this paper).
Finally, as we show later on, in multiple financial time series the total binary increment of a given cross-section measures the instantaneous level of synchronization (i.e. the number of stocks moving in the same direction) of the market, while the total non-binary increment does not carry this piece of information. For all the above reasons, it is important to further investigate whether the full 'weighted' or 'valued' information can, in some circumstances, be somehow mapped to the binary one, thus providing a robust, highly simplified, more easily modeled, and informative representation of the system. Motivated by the above considerations, in this paper we further study, both empirically and theoretically, the relationship between weighted time series and their binary projections. We first provide robust empirical evidence of novel relationships between binary and non-binary properties of real financial time series. To this end, we use the daily closing prices of all stocks of three markets (S&P500, FTSE100 and NIKKEI225) over the period 2001-2011. We show that the average daily increment and average daily coupling of an empirical set of multiple time series are strongly and nonlinearly related to the corresponding average increment of the binary projections of the same time series. These empirical relations quantify in a novel way the strong correlations existing between the increments of individual stocks and the overall level of synchronization among all stocks in the market.
Building on this evidence, we then introduce a formalism to analytically characterize random ensembles of single and multiple time series with desired constraints. Specifically, we follow Jaynes' interpretation and re-derivation of statistical physics as an inference problem from partial macroscopic information to the unobservable microscopic configuration [18,19]. We define statistical ensembles of matrices that maximize Shannonʼs entropy [20], subject to a set of desired constraints. This maximum-entropy approach is widely used in many areas, from neuroscience [21] to social network analysis [22] (and more recently network science in general [23]), where it is known under the name of exponential random graph (ERG) formalism. In the case of interest here, we introduce ensembles of maximum-entropy binary matrices that represent projections of single and multiple binary time series, subject to a set of desired constraints defined as simple empirical measurements. We discuss the main differences between our matrix ensembles and other techniques in time series analysis, including other ensembles of random matrices encountered in random matrix theory [24][25][26][27][28].
Our approach leads to a family of analytically solved null models that allow us to quantify the amount of information encoded in the chosen constraints, i.e. the selected observed properties of the binary projections of real time series. Different choices of the constraints lead to different stochastic processes, a result that allows us to relate known stochastic processes to the corresponding 'target' empirical properties defining the ensemble of time series spanned by the process itself. After applying the approach to the financial time series in our analysis, we compare the informativeness of various measured properties and show that different properties are more relevant for different time series and temporal windows. We also identify distinct regimes in the behaviour of multiple stocks and give the most likely explanation (endogenous, exogenous, or mixed) for the observed level of coordination or 'market mode', given the measured binary return at a given point in time. Finally, and most importantly, we show that our approach is able to reproduce and mathematically characterize the observed nonlinear relationships between binary and non-binary properties of real time series.
The rest of the paper is organized as follows. In section 2 we describe the data and provide empirical evidence of the relationships that motivate our work. In section 3 we introduce our theoretical formalism in its general form. In section 4 we apply the formalism to single time series, while in section 5 we apply it to single cross-sections (temporal snapshots) of multiple time series. Finally, in section 6 we consider our method in its full extent and apply it to entire spans of multiple time series, for different financial markets around the globe. We end with our conclusions in section 7.

Data
We use daily closing prices, for the 10 year period ranging from 24 October 2001 to 18 October 2011, of all stocks from the indices S&P500, FTSE100 and NIKKEI225. For each index, we restrict our sample to the maximal group of stocks that are traded continuously throughout the selected period. This results in 445 stocks for the S&P500, 78 stocks for the FTSE100 and 193 stocks for the NIKKEI225.
We take logarithms of the daily closing prices to obtain time series of log-prices, which represent our original 'signal' $s_i(t)$, where $i$ labels stocks and $t$ labels days in the sample. Correspondingly, we construct time series of log-returns, where each entry represents the increment $r_i(t)$ as defined in equation (1). Finally, we take the sign of each log-return to obtain an additional, binarized set of time series as in equation (2): $x_i(t) \equiv \operatorname{sign}[r_i(t)]$. We will refer to the binarized time series as the binary projection of the original time series. In figure 1 we show a simple example of a weighted time series, along with the corresponding binary projection. The (multiple) time series of $r_i(t)$ and $x_i(t)$ are the main objects of our analysis throughout the paper. Note that, while the use of log-returns rather than simple returns (i.e. price differences) in finance is an important step that allows the removal of overall trend effects over long time spans [5], the binary signature is actually independent of whether the original prices have been logarithmically transformed.
The main reason for choosing the daily frequency is to achieve an optimal level of structural compatibility between the data and the models we introduce later. As we discuss in detail in section 3, our models are binary, i.e. they only allow the two values ±1 depending on whether the increment of the original time series is positive or negative. An increment of 0 is not admitted in the models: consistently, we chose a frequency for which zero increments are extremely rare in the data. In financial markets, this is the case for daily (or lower) frequency. Indeed, a zero return value occurs in less than 0.2% of the cases in our daily data (when this happens, we randomly switch the corresponding binary increment to either +1 or −1 with equal probability). Higher-frequency data feature an increasing percentage of zero returns, a property that calls for an extension of the models considered here.
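As an illustration, the construction described above (log-prices, log-returns, binary signs, and the randomized treatment of the rare zero returns) can be sketched as follows; the synthetic `prices` array is only a stand-in for the actual closing-price data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for daily closing prices of N stocks over T+1 days
# (the real data are S&P500 / FTSE100 / NIKKEI225 price records).
N, T = 5, 250
prices = np.exp(np.cumsum(rng.normal(0.0, 0.01, size=(N, T + 1)), axis=1))

s = np.log(prices)      # log-prices s_i(t)
r = np.diff(s, axis=1)  # log-returns r_i(t) = s_i(t) - s_i(t-1)
x = np.sign(r)          # binary projection x_i(t)

# Zero returns are rare at daily frequency; when they occur, the text
# prescribes assigning +1 or -1 with equal probability.
zeros = (x == 0)
x[zeros] = rng.choice([-1.0, 1.0], size=int(zeros.sum()))
```

The resulting `r` and `x` arrays play the roles of the weighted time series and of their binary projections throughout the paper.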
It should be noted that other types of binary time series, different from the ±1 type considered here, can also be defined. Most notably, 0/1 binary time series can indicate the occurrence of an event in a time period, i.e. whether the event happened (1) or not (0). Financial examples include time series of recession indicators [29,30] or of 'switching points' in stock returns [31]. For such 0/1 binary time series, correlations may not be very informative as measures of dependence between the dichotomous variables. To fill this gap, new methods have been introduced in recent years, such as the auto-persistence function and the auto-persistence graph [29]. In these methods, the dependence structure among the observations is described in terms of conditional probabilities rather than correlations. Although throughout this paper we focus entirely on ±1 binary time series, which naturally descend from the original signed time series of fluctuations, it is interesting to note that our approach can be extended, with slight modifications, to 0/1 time series as well. To this end, one needs to re-express all quantities in terms of the 0/1 binary variable $y \equiv (x+1)/2$, where $x$ is our ±1 binary variable, and adapt our approach accordingly.

Nonlinear binary/non-binary relationships
We now come to the main empirical findings that motivate our paper. For each index and for each day $t$ in the sample, we first calculate the average (over all stocks) weighted return, which we denote as $\{r(t)\}_i$ and define as

$$\{r(t)\}_i \equiv \frac{1}{N}\sum_{i=1}^N r_i(t). \qquad (3)$$

Note that the above expression does not depend on the particular stock $i$, but it does depend on time $t$. Our unconventional choice of the symbol $\{\cdot\}$ to denote an average over stocks is to avoid confusion with temporal averages, which will be denoted by the more usual bar $\overline{(\cdot)}$ later in the paper. Similarly, we calculate the corresponding average binary return

$$\{x(t)\}_i \equiv \frac{1}{N}\sum_{i=1}^N x_i(t) \qquad (4)$$

for all days of various 1 year intervals and for the three indices separately. We find a strong nonlinear dependency between the two quantities (figure 2). Note that the average binary return is bounded between −1 and +1 by construction, while the average weighted return is unbounded on both sides. While there are in principle infinitely many values of $\{r(t)\}_i$ consistent with the same value of $\{x(t)\}_i$, we observe a tight relationship between the two quantities. This relationship can be fitted by a one-parameter curve of the form

$$y = a \cdot \operatorname{artanh}(x) \qquad (5)$$

(the theoretical justification for this functional form will be given in section 6), where $a$ is in general different for different years and different indices.

Figure 2. Nonlinear relationship between the average daily increment (weighted return) and the average daily sign (binary return) over all stocks in the FTSE100 (left), S&P500 (center) and NIKKEI225 (right) in various years (2003, 2007 and 2004, respectively). Each point corresponds to one day in a time interval of 250 trading days (approximately one year). The red line represents the best fit with the function $y = a \cdot \operatorname{artanh}(x)$, whose use is theoretically justified in section 6.
Still, as we show later, for a given year and market the average weighted return of any day t is to a large extent predictable (out of sample) from the average binary return of the same day, once a is known (for instance by fitting the above curve to the data for a past time window). In section 6 we will also show that the nonlinear character of the observed relations is a genuine signature of correlation in the data, as an uncorrelated null model shows a completely linear behaviour. There is another empirical relationship, involving a higher-order quantity. For each index and for each day t in the sample, we calculated what we will call the average 'coupling' over the − N N ( 1) 2 distinct pairs of stocks: , for the same data as in figure 2. Again, we find a strong nonlinear dependency, where for a given value of the average binary return of day t there is a typical value of the average coupling among all stocks in the same day. The relationship can be fitted by a one-parameter curve that diverges at = ± x { } 1 i . As we show in section 6, an uncorrelated null model would yield a different, parabolic curve with no divergences. Again, this means that the empirical trend is due to genuine correlations, whose nature will be clarified later on in the paper.
There are even more examples of dependencies between binary and non-binary properties in the data. However, in one way or another all these relationships, including that shown in figure 3, ultimately derive from equation (5). For this reason, we refrain from showing redundant results and focus on the empirical findings discussed so far.

Figure 3. Nonlinear relationship between the average daily coupling (weighted coupling) and the average daily sign (binary return) over all stocks in the FTSE100 (left), S&P500 (center) and NIKKEI225 (right) in various years (2003, 2007 and 2004, respectively). Each point corresponds to one day in a time interval of 250 trading days (approximately one year). The red line represents the best fit with the function $y = b \cdot (\operatorname{artanh} x)^2$, whose use is theoretically justified in section 6.
The above analysis indicates that the binary signature of financial time series contains relevant information about the original data. While the binary signature is a priori a many-to-one projection involving a significant information loss, we empirically find that there are properties (namely the average return and the average coupling) for which the projection is virtually a one-to-one 'quasi-stationary' transformation (on appropriate time scales, as we show in section 6), allowing the corresponding original, weighted properties to be reconstructed to a great extent. Rather than exploring the practical aspects of this possibility of reconstructing the original signal from its binary projection, in this paper we are interested in understanding the origin of this behaviour and providing a simple data-driven model of it. This will ultimately be achieved in section 6, where we also show that the binary/non-binary relations we have documented are a novel quantification of the fact that extreme price increments occur more often when most stocks move in the same direction. This is an important type of correlation between the magnitude of the log-returns of individual time series and the level of synchronization (common sign) of the increments of all stocks in the market.
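The empirical relation of figure 2 can be reproduced qualitatively on synthetic data. The sketch below assumes a simple common-factor ('market mode') structure for the returns, which is purely illustrative and not the paper's empirical data, and fits the one-parameter curve $y = a \cdot \operatorname{artanh}(x)$ by least squares; since the curve is linear in $a$, the estimate has a closed form:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic multiple time series with a common 'market' factor, so that the
# signs of individual returns are correlated across stocks (an illustrative
# assumption, not the paper's data).
N, T = 100, 250
market = rng.normal(0.0, 0.01, size=T)
r = market[None, :] + rng.normal(0.0, 0.02, size=(N, T))
x = np.sign(r)

avg_r = r.mean(axis=0)  # {r(t)}: average weighted return of day t
avg_x = x.mean(axis=0)  # {x(t)}: average binary return of day t

# Least-squares fit of y = a * artanh(x) (the curve of figure 2); the clip
# guards against the divergence of artanh at +/-1.
u = np.arctanh(np.clip(avg_x, -0.999, 0.999))
a_hat = np.sum(avg_r * u) / np.sum(u ** 2)
```

With a strong enough common factor the scatter of `avg_r` against `avg_x` displays the same S-shaped, artanh-like bending seen in the empirical plots.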

Maximum-entropy matrix (MEM) ensembles
Having established that the binary projections of real time series contain non-trivial information, in the rest of the paper we introduce a theory of binary time series aimed, among other things, at reproducing the observed nonlinear relationships shown in figures 2 and 3. In our approach, we regard a synchronous set of binary time series as a ±1 matrix and introduce an ensemble of such matrices via the maximization of Shannon's entropy, subject to the constraint that some specified properties of the ensemble match their observed values. An analogous approach is widely used, e.g., in network analysis, where it is known under the name ERG [23]. Moreover, we provide an analytical maximum-likelihood method to find the optimal values of the parameters governing the ensembles, which is again similar in spirit to a method recently introduced for networks [32][33][34]. Finally, we describe Akaike's information criterion (AIC) [35], which we will use to rank and compare the performance of different null models when fitted to the same data.
Being entropy-based, our approach automatically allows us to measure the amount of information encoded into the observed properties chosen as constraints, i.e. how much information is gained about the original (set of) time series once those properties are measured. It also allows us to identify, given a set of measured properties, which ones are more informative and which ones can be discarded, as we show on specific financial examples. Our framework turns out to reproduce the observed nonlinear relationships very well, thus providing a simple mathematical explanation and functional form for the plots shown in the previous section. Moreover, we are able to identify, as a function of the binary return only, distinct regimes in the collective behavior of stocks, namely a 'coordinated' regime dominated by market-wide interactions, an 'uncoordinated' regime dominated by stock-specific noise and an 'intermediate' regime where both market-wide and stock-specific information is relevant.
We incidentally note that, despite the available variety of refined and advanced techniques in time series analysis [36], how one can quantify (in the sense of statistical ensembles) how much information is actually encoded into any given, measurable property of a time series is still not fully understood. While most studies, starting from the celebrated work by Kolmogorov about the algorithmic complexity of sequences of symbols [37], have addressed the quantification of the information content of a single time series, much less is known about the information encoded in the measured value of a given time series property (which, necessarily, involves the idea of an entire ensemble of time series consistent with the measured value itself). Our approach can provide an answer to such a question, by associating an absolute level of uncertainty (entropy) to each observable of an empirical (set of) time series. In relative terms, this also allows us to compare the information content of different properties of a time series, thereby indicating which measured property is the most informative about the original time series.
As a final consideration, it is worth mentioning that the MEM ensembles that we introduce are clearly related to (and, depending on the specification, potentially overlapping with) some ensembles that are well studied by random matrix theory [38][39][40][41][42][43]. However, our approach is different since we generate ensembles of matrices whose probability distributions are determined by the kind of partial information (empirically measured constraint) about the real system. In this approach the maximization of Shannonʼs entropy, given some real-world available information, yields the least biased probability distribution (over the space of possible matrices) consistent with the data. This formalism allows us to relate the probabilistic structure of each matrix ensemble with the choice of the original observed property, or constraint. Similarly, since our matrices represent (multiple) time series, we are able to connect the various ensembles to simple stochastic processes induced by the associated matrix probabilities and, again, to the chosen empirical property specifying the ensembles themselves.

Exponential random matrices (ERMs)
We first analytically characterize the properties of families of randomized matrices. More generally, we introduce a matrix ensemble that maximizes Shannon's entropy while enforcing a set of observed constraints (selected time series properties). This procedure is analogous, e.g., to the one leading to the definition of ERGs in network theory [23]. However, we will modify it to accommodate ±1 matrices, as opposed to the 0/1 or non-negative matrices that describe binary and weighted networks, respectively. The resulting ensemble can thus be denoted as the MEM ensemble or, equivalently, the ERM model.
Let us consider the ensemble of all ±1 matrices with dimensions N × T. Each such matrix can represent N synchronous time series, all of duration T (for instance, if applied to a set of multiple financial time series, N refers to the number of stocks and T to the number of time steps). Let $X$ denote a generic matrix in the ensemble, and $x_i(t)$ its entry in position $(i, t)$. Let $X^*$ be the particular real matrix that we observe. In other words, our ensemble is composed of all possible matrices $X$ of the same type as $X^*$, and includes $X^*$ itself. For any data-dependent property $R$, we will consider the value $R(X)$ obtained when $R$ is measured on the particular matrix $X$. To each matrix $X$ in the ensemble we assign an occurrence probability $P(X)$. The expectation value (ensemble average) of a property $R$ can be expressed as

$$\langle R \rangle \equiv \sum_X P(X)\, R(X), \qquad (7)$$

where the sum runs over all matrices in the ensemble. At this point, we introduce a set of constraints denoted by the vector $\vec{C}$. The constraints are meant to ensure that a given set of observed properties $\vec{C}(X^*)$ of the real matrix $X^*$ is reproduced by the ensemble itself. In our method we will enforce 'soft' constraints by requiring that their expectation value $\langle \vec{C} \rangle$ equals the observed one. The resulting ensemble is a canonical one where each matrix $X$ is assigned a probability $P(X)$ that maximizes Shannon's entropy

$$S \equiv -\sum_X P(X) \ln P(X), \qquad (8)$$

subject to the normalization condition

$$\sum_X P(X) = 1 \qquad (9)$$

and to the constraints

$$\langle \vec{C} \rangle = \vec{C}(X^*), \qquad (10)$$

which we enforce in order to reproduce the desired set of observed quantities. The solution to the above constrained maximization problem is standard (see for instance [23] for a recent derivation in the context of networks). We first introduce the Lagrange multipliers $\alpha$ and $\vec{\theta}$, enforcing equations (9) and (10) respectively, and then require that the functional derivative of Shannon's entropy (plus the constraining terms) vanishes:

$$\frac{\partial}{\partial P(X)}\left[ S + \alpha\Big(1 - \sum_X P(X)\Big) + \vec{\theta} \cdot \Big(\vec{C}(X^*) - \langle \vec{C} \rangle\Big) \right] = 0. \qquad (11)$$

This yields

$$P(X|\vec{\theta}) = \frac{e^{-H(X,\vec{\theta})}}{Z(\vec{\theta})}, \qquad (12)$$

where

$$H(X,\vec{\theta}) \equiv \vec{\theta} \cdot \vec{C}(X) \qquad (13)$$

is the 'Hamiltonian' and

$$Z(\vec{\theta}) \equiv \sum_X e^{-H(X,\vec{\theta})} \qquad (14)$$

is the normalizing constant (partition function) for the probability. Consistently, we can rewrite equation (7) more explicitly as a function of $\vec{\theta}$:

$$\langle R \rangle_{\vec{\theta}} = \sum_X R(X)\, P(X|\vec{\theta}), \qquad (15)$$

where $\langle \cdot \rangle_{\vec{\theta}}$ indicates that the ensemble average is evaluated at the particular parameter value $\vec{\theta}$. Equations (12)-(14) define the MEM or ERM model. Specifically, the model yields the probability distribution over a specified ensemble of matrices which maximizes the entropy under a set of generic constraints. The guiding principle is that the probability distribution (over microscopic states) that has maximum entropy, subject to observed (macroscopic) properties, provides the most unbiased representation of our knowledge of the state of the system [19]. In more physical terms, this is analogous to the Gibbs-Boltzmann distribution over the microstates of a large system at a well defined temperature, given thermodynamic (macroscopic) observables such as the total energy.
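For a single short time series (N = 1, small T), the construction of equations (12)-(14) can be verified by brute-force enumeration of the ensemble. The sketch below uses the total increment as the constraint; with the sign conventions adopted in this illustration (an assumption, since conventions may differ), the ensemble average is known analytically, $\langle C \rangle = -T \tanh\theta$:

```python
import itertools
import numpy as np

T, theta = 4, 0.3

# Enumerate all 2^T binary (+/-1) sequences: the full matrix ensemble for N = 1.
ensemble = [np.array(v) for v in itertools.product([-1, 1], repeat=T)]

C = lambda X: X.sum()                      # constraint: total increment
H = lambda X: theta * C(X)                 # Hamiltonian, as in eq. (13)
Z = sum(np.exp(-H(X)) for X in ensemble)   # partition function, as in eq. (14)
P = lambda X: np.exp(-H(X)) / Z            # matrix probability, as in eq. (12)

# The probabilities are normalized, and the ensemble average of the
# constraint matches the analytical value -T * tanh(theta).
norm = sum(P(X) for X in ensemble)
avg_C = sum(P(X) * C(X) for X in ensemble)
```

Here $Z = (2\cosh\theta)^T$ factorizes because the constraint is a sum over independent entries, which is what makes the models of sections 4 and 5 analytically solvable.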

Maximum-likelihood parameter estimation
The above derivation shows that the expectation value of any property of the ensemble depends functionally on the specific enforced constraints $\vec{C}$ through the resulting structure of $P(X|\vec{\theta})$. Of course, it also depends numerically on the measured values $\vec{C}(X^*)$ of the constraints themselves, through the particular parameter value (which we denote by $\vec{\theta}^*$) required in order to enforce that the expected and observed values of $\vec{C}$ match:

$$\langle \vec{C} \rangle_{\vec{\theta}^*} = \vec{C}(X^*). \qquad (16)$$

We now show that the value $\vec{\theta}^*$ that satisfies equation (16) coincides with the value that maximizes the likelihood to generate the empirical data, as in the corresponding maximum likelihood (ML) approach to network ensembles [32,44]. We start by writing the log-likelihood function of an observed matrix $X^*$ generated by the parameters $\vec{\theta}$:

$$\lambda(\vec{\theta}) \equiv \ln P(X^*|\vec{\theta}) = -H(X^*,\vec{\theta}) - \ln Z(\vec{\theta}). \qquad (17)$$

We then look for the particular value $\vec{\theta}^*$ that maximizes $\lambda(\vec{\theta})$, i.e. the solution of

$$\left.\vec{\nabla} \lambda(\vec{\theta})\right|_{\vec{\theta}^*} = \vec{0} \qquad (18)$$

(it is easy to check that the higher-order derivatives confirm that $\vec{\theta}^*$ is a point of maximum). This leads to

$$\vec{C}(X^*) = \langle \vec{C} \rangle_{\vec{\theta}^*}, \qquad (19)$$

the ML condition, which coincides with equation (16). Thus the likelihood of the real matrix $X^*$ is maximized by the specific parameter choice such that the ensemble average of each constraint equals its empirical value measured on $X^*$, automatically ensuring that the desired constraints are met.

Model selection
We finally show how we can use AIC to rank the performance of different models, i.e. different choices of the constraints, in reproducing the same data. The AIC is an information-theoretic measure of the relative goodness of fit of a model, as compared to a set of alternative models all used to explain the same data [35]. It offers a relative measure of the information lost when the given model is used to describe reality. The power of AIC (and of similar criteria [45]) lies in the possibility of ranking a set of models in terms of the trade-off they achieve between accuracy (good fit to the data) and parsimony (low number of free parameters) [45]. In general, for the kth model in a set of selected models, AIC is defined as

$$\mathrm{AIC}_k \equiv 2 n_k - 2\lambda_k^*,$$

where $n_k$ is the number of free parameters of the kth model and $\lambda_k^*$ is the maximized log-likelihood of the data under the same model. The above expression effectively discounts the number $n_k$ of parameters (complexity) from the maximized likelihood $\lambda_k^*$ (accuracy). The model with the lowest value of $\mathrm{AIC}_k$ (let us denote this value by $\mathrm{AIC}_{\min}$) is the 'best' model in the considered set, achieving the optimal trade-off [45].
In the ERM/MEM family of models we have introduced, a model is uniquely specified by the choice of the constraints $\vec{C}$. Given an N × T data matrix $X^*$ and a set of m possible choices of constraints, each of the resulting m models has an AIC value

$$\mathrm{AIC}_k = 2 n_k - 2\lambda_k^*, \quad k = 1, \dots, m. \qquad (22)$$

In order to understand whether models with values of AIC larger than, but close to, $\mathrm{AIC}_{\min}$ are still competitive, it is customary to define the so-called 'AIC weights'

$$w_k \equiv \frac{e^{-\mathrm{AIC}_k / 2}}{\sum_{j=1}^m e^{-\mathrm{AIC}_j / 2}}, \qquad (23)$$

which provide a normalized strength of evidence for each model [45]. The AIC weight $w_k$ represents the probability that the kth model is the best one among the m selected models. For instance, an AIC weight of $w_k = 0.75$ indicates that, given the data, model k has a 75% chance of being the best model among the m candidate ones. If two or more models have comparable AIC weights (e.g. $w_1 = 0.6$, $w_2 = 0.4$, or $w_1 = 0.35$, $w_2 = 0.25$, $w_3 = 0.4$), then there is no evidence that the model with the highest AIC weight (lowest AIC value) clearly outperforms the other ones. All models with comparable weights should be considered as competing alternatives, in principle leading to the problem of multi-model inference [45].
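The computation of AIC values and AIC weights from a set of maximized log-likelihoods is straightforward; the sketch below uses hypothetical numbers for m = 3 competing constraint choices:

```python
import numpy as np

def aic_weights(log_likelihoods, n_params):
    """AIC_k = 2*n_k - 2*lambda_k^*; weights from the normalized AIC differences."""
    aic = 2 * np.asarray(n_params) - 2 * np.asarray(log_likelihoods, dtype=float)
    delta = aic - aic.min()        # subtracting AIC_min improves numerical stability
    w = np.exp(-delta / 2)
    return w / w.sum()

# Hypothetical maximized log-likelihoods and parameter counts for three models.
w = aic_weights(log_likelihoods=[-120.0, -118.5, -119.0], n_params=[0, 1, 2])
```

Subtracting $\mathrm{AIC}_{\min}$ before exponentiating leaves the weights unchanged (the common factor cancels in the ratio) while avoiding underflow for large AIC values.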

Single time series
In this section we consider the first family of specifications of the general approach outlined in section 3. We focus on the simple case of a single time series (N = 1), where the ensemble of N × T matrices reduces to an ensemble of 1 × T matrices, or equivalently of T-dimensional row vectors. Each such vector will still be denoted by $X$. We assume long time series, i.e. $T \gg 1$. This first specification of our abstract formalism is not meant to provide realistic models for the evolution of the binary increments of real financial time series. Rather, it serves several purposes. On one hand, it allows us to introduce our formalism using simpler examples first, establishing the basis for the more general cases (leading to the main results of the paper) that will be introduced later. On the other hand, it emphasizes that different, well known (one-dimensional) stochastic processes arise as particular examples of maximum-entropy ensembles defined by specific constraints that would otherwise remain obscure. Identifying these 'driving constraints' underlying common stochastic processes will help us interpret such processes in the light of the empirical properties being reproduced. Finally, our approach allows us to identify, given the data and a set of simple properties, which of these properties encodes the largest amount of information about the original binary signature.
Let $X$ denote a single time series with entries $x(t)$, $t = 1, \dots, T$, each representing a temporal increment. We will denote the average increment (first moment) as

$$\bar{x} \equiv \frac{1}{T}\sum_{t=1}^T x(t). \qquad (24)$$

Note that, since $x^2(t) = 1$ for all $t$, the second moment is always

$$\overline{x^2} \equiv \frac{1}{T}\sum_{t=1}^T x^2(t) = 1. \qquad (25)$$

We also define the τ-delayed product (with $0 < \tau < T$)

$$\overline{xx}_\tau \equiv \frac{1}{T}\sum_{t=1}^T x(t)\, x(t+\tau), \qquad (26)$$

where we have introduced periodic boundary conditions: $x(t+T) \equiv x(t)$. The above periodicity condition is inessential, since we could have used a definition avoiding its introduction, but it makes some of the following expressions simpler. Periodicity implies that the normalized (between −1 and +1) autocorrelation function (with delay τ) can be defined as

$$\rho(\tau) \equiv \frac{\overline{xx}_\tau - \bar{x}^2}{1 - \bar{x}^2}. \qquad (27)$$

Since a (±1) binary time series can also be regarded as a chain of classical spins pointing either up or down, it is natural to consider simple, analytically solved spin models as the starting point, since these models are defined in terms of a 'physical' Hamiltonian that has precisely the same structure as our 'information-theoretic' Hamiltonian defined in equation (13).
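The quantities just defined (average increment, τ-delayed product with periodic boundary conditions, and normalized autocorrelation) can be computed as follows; the random series is only a placeholder for a real binary projection:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000
x = rng.choice([-1.0, 1.0], size=T)  # placeholder +/-1 binary time series

x_bar = x.mean()                     # first moment (average increment)
second_moment = (x ** 2).mean()      # identically 1 for +/-1 series

def delayed_product(x, tau):
    """tau-delayed product with periodic boundary conditions x(t+T) = x(t)."""
    return np.mean(x * np.roll(x, -tau))

def autocorr(x, tau):
    """Normalized autocorrelation with delay tau."""
    x_bar = x.mean()
    return (delayed_product(x, tau) - x_bar ** 2) / (1.0 - x_bar ** 2)
```

For an uncorrelated series, `autocorr(x, tau)` fluctuates around zero with amplitude of order $1/\sqrt{T}$, which is why long series ($T \gg 1$) are assumed throughout this section.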
In what follows, we introduce various model specifications. For each model, we introduce the constraints that we enforce and the resulting Hamiltonian as described in section 3.1. Different constraints correspond to different spin models and lead to different stochastic processes. This is pictorially illustrated in figure 4. The free parameters conjugated to the constraints will be fitted according to the ML principle described in section 3.2. Different models will be ranked according to the AIC weights introduced in section 3.3.

Uniform random walk
The most trivial model is one where we enforce no constraints: there are no free parameters and the Hamiltonian is

$$H(X) = 0.$$

Physically, the above Hamiltonian describes a gas of T non-interacting 'spins' in a vacuum, i.e. in the absence of an external magnetic field. This model is discussed in the appendix. The probability of occurrence of a time series $X$ is completely uniform over the ensemble of all binary time series of length T. All the T elements of $X$ are mutually independent and identically distributed. This results in a completely uniform random walk with zero expected value for each increment: $\langle x(t) \rangle = 0$.

Figure 4. In each model we enforce different constraints, which imply different spin models and different stochastic processes. Given the same time series, we consider three possible models. (A) We enforce no constraint, which translates into a chain of non-interacting spins without external field (uniform random walk). (B) We enforce the total temporal increment, which translates into a chain of non-interacting spins with external field (biased random walk). (C) We enforce both the total increment and the one-lagged autocorrelation, which translates into a chain of spins with first-neighbour interactions and external field (Markov process).
This trivial model generates a symmetric random walk in which the (ensemble) variance of each increment equals $1 - \langle x(t) \rangle^2 = 1$: since the expected return is zero and the uncertainty is maximal, the variance is also maximal (for a $\pm 1$ binary random variable). Financially, the model assumes that the stock fluctuates randomly, with no memory and no overall 'price drift'. This is the most basic model of price dynamics that has been considered in the financial literature since the pioneering work of Bachelier [1], here adapted to the case of binary time series.
The model can be used as a basic benchmark against which the performance of our other models is checked. This comparison will be carried out in section 4.4. Since here the likelihood is independent of any parameter, the AIC of the model can be calculated using equation (22), where the probability is given by equation (A.3) (see appendix) and the number of parameters is $n_k = 0$.
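Since $P(X) = 2^{-T}$ for every series, the AIC of the uniform model reduces to $2T\ln 2$; a one-line sketch (function name ours):

```python
import numpy as np

def aic_uniform(T):
    """AIC of the zero-parameter uniform model: P(X) = 2**(-T), n_k = 0."""
    log_likelihood = -T * np.log(2.0)
    return -2.0 * log_likelihood  # 2*n_k - 2*lnL with n_k = 0
```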

Biased random walk
We now consider the total increment as the simplest non-trivial (one-dimensional) constraint:
$$\sum_{t=1}^{T} x(t) = T\,M_1(X).$$

This leads to the Hamiltonian
$$H(X, \theta) = \theta \sum_{t=1}^{T} x(t),$$
which coincides with the physical Hamiltonian for a gas of $T$ non-interacting 'spins' in a common external 'magnetic field' $-\theta$. As we show in the appendix, this model generates a biased random walk where the probability of each increment is
$$P\big(x(t) = \pm 1\big) = \frac{e^{\mp\theta}}{e^{\theta} + e^{-\theta}}.$$
The expected return is the hyperbolic tangent $\langle x(t) \rangle = -\tanh\theta$, while the variance is $1 - \tanh^2\theta$. Financially, this model still assumes no memory in the fluctuations of a given stock, but it introduces a 'price drift' in terms of a non-zero expected return. The maximum likelihood condition (16), fixing the value $\theta^*$ of the parameter $\theta$ given a real time series $X^*$, leads to
$$\theta^* = -\,\mathrm{artanh}\big[M_1(X^*)\big].$$
The maximized likelihood for the model is $P(X^*|\theta^*)$, which, using equation (22) with $n_k = 1$, can be used to measure the AIC (see section 3.3) of the model, based on the observed data. This will be done in section 4.4.
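The ML fit of this model is fully analytic; a sketch follows (our own helper, assuming the convention $P(x) \propto e^{-\theta x}$ used here):

```python
import numpy as np

def fit_biased_rw(x):
    """ML fit of the biased random walk: theta* = -artanh(M1(X*)),
    with increment probabilities P(x) = exp(-theta*x) / (2*cosh(theta))."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    theta = -np.arctanh(m)
    loglik = -theta * x.sum() - x.size * np.log(2.0 * np.cosh(theta))
    aic = 2.0 - 2.0 * loglik  # n_k = 1
    return theta, loglik, aic
```

At the ML point the per-step probabilities reduce to the empirical sign frequencies $(1 \pm M_1)/2$, which is a quick sanity check on the fit.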

One-lagged model
Let us now explore a more complex, interacting model. The models considered so far were non-interacting, i.e. each increment in the time series was independent of the previous outcomes. Now we consider a model where, besides the constraint on the total increment specified in equation (33), we enforce an additional constraint on the time-delayed (lagged) product $B_1(X)$, defined in equation (27) with $\tau = 1$. Financially, this amounts to enforcing the average return and the average one-step temporal autocorrelation of the time series. In other words, besides a price drift, we also introduce a short-term memory.
The resulting two-dimensional constraint can be written as
$$\Big(\sum_{t=1}^{T} x(t),\ \sum_{t=1}^{T} x(t)\,x(t+1)\Big),$$
where we consider a periodicity condition as in equation (28). Note that, when $X$ is a real binary time series of length $T$, this condition can always be enforced by adding one last (fictitious) time step $T+1$ and a corresponding increment $x(T+1)$ chosen equal to $x(1)$. For long time series (large $T$), the effects induced by this addition are negligible. The corresponding Hamiltonian reads
$$H(X, I, K) = -I\sum_{t=1}^{T} x(t) \;-\; K\sum_{t=1}^{T} x(t)\,x(t+1).$$
The above Hamiltonian coincides with that of the one-dimensional Ising model with periodic boundary conditions [46], which is a model of interacting spins under the influence of an external 'magnetic' field $I$. The model is analytically solvable (see the appendix for the complete derivation), which allows us to apply it to real time series in our formalism. In our setting, each time step $t$ is seen as a site in an ordered chain of length $T$, and each value $x(t) = \pm 1$ is seen as the value of a spin sitting at that site. 'First-neighbour interactions' along the chain of spins are here interpreted as one-lagged memory effects. As a result of these interactions, the model generates time series according to a Markov process, where the probability of each increment depends on the realization of the previous one (see the appendix). The resulting expected value of the normalized autocorrelation defined in equation (47) is simply $\langle A_\tau \rangle_{I,K} = (\lambda_-/\lambda_+)^\tau$, where $\lambda_\pm$ are the eigenvalues of the transfer matrix of the model. The above expressions allow us to calculate all the relevant expected properties of the time series generated by the model, once the parameters $I$ and $K$ are set to the values $I^*$ and $K^*$ maximizing the likelihood $P(X^*|I, K)$ of the observed time series $X^*$. These values are the solutions of the coupled equations
$$\langle M_1 \rangle_{I^*,K^*} = M_1(X^*), \qquad \langle B_1 \rangle_{I^*,K^*} = B_1(X^*),$$
where $M_1(X^*)$ and $B_1(X^*)$ are the empirical values measured on the real data $X^*$. The maximized likelihood of the model can be calculated as $P(X^*|I^*, K^*)$, where $P(X|I, K)$ is given by equation (A.32) in the appendix. From the maximized likelihood, the AIC can be easily obtained using equation (22) with $n_k = 2$.
Note that the values $I^*$ and $K^*$ are such that the first point of the expected autocorrelation function, $\langle A_1 \rangle_{I^*,K^*}$, is necessarily equal to the observed value $A_1(X^*)$. Based on this first value alone, the model provides the full expected autocorrelation $\langle A_\tau \rangle_{I^*,K^*}$ as follows:
$$\langle A_\tau \rangle_{I^*,K^*} = \big[A_1(X^*)\big]^{\tau}.$$
If $A_1(X^*) > 0$, the expected autocorrelation decays exponentially with $\tau$; if $A_1(X^*) < 0$, $\langle A_\tau \rangle_{I^*,K^*}$ will be an oscillating function (modulated by a decreasing exponential), taking negative values when $\tau$ is odd and positive values when $\tau$ is even.
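The resulting prediction is a one-liner; a sketch (hypothetical helper name):

```python
def predicted_autocorrelation(a1_observed, max_tau):
    """One-lagged (Ising chain) prediction: <A_tau> = (A_1)**tau.
    A positive a1 gives an exponential decay; a negative a1 gives
    damped oscillations alternating in sign with tau."""
    return [a1_observed ** tau for tau in range(1, max_tau + 1)]
```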
In figure 5 we compare the measured autocorrelation, equation (29), with the predicted one, equation (50), for three different S&P500 stocks (USB, Qcom, and MJN) over a period of 800 trading days (approximately 3.5 years). As expected, we see that the first point (one-lagged autocorrelation) is always reproduced exactly. We also confirm that, depending on the sign of the first point, the predicted trend is either exponentially decreasing (e.g. for the USB stock on the left) or oscillating (e.g. for the Qcom and MJN stocks). The dashed lines indicate the noise level, which we conventionally fixed at two standard deviations of the Fisher-transformed autocorrelation (see footnote 1). The behaviour of the USB and Qcom stocks is representative of the vast majority of stocks, with the autocorrelation already within the noise level at the minimum delay ($\tau = 1$). This is in good agreement with what we know about financial time series (no dependencies at daily frequency, the typical time scale for autocorrelation being of the order of minutes). We also found that the first point, the autocorrelation between two successive days, is small but negative for most stocks in our data set. In the rightmost panel (MJN stock) we observe a rarer behaviour, where the one-lagged autocorrelation breaches the noise level and the predicted curve then rapidly oscillates to zero.
As is clear from the figure, our model reproduces the observed autocorrelation well in all these different cases, and gives a single mathematical explanation for both the exponentially decaying (from positive one-lagged autocorrelation) and the oscillating (from negative one-lagged autocorrelation) behaviour. Moreover, a generic feature of the one-dimensional Ising model, i.e. the absence of a phase transition characterized by a diverging length (here, time) scale [46], explains why in real-world time series the memory is always found to be short-ranged.

Comparing the three models on empirical financial time series
As we illustrated in section 3.3 in the general case, once we have more than one model for the same data $X^*$, we can use the AIC weights to rank all models in terms of the achieved trade-off between accuracy (good fit to the data) and parsimony (small number of parameters). The AIC weight $w_k$ of a specific model $k$ represents the probability that the model is the 'best' one among the candidate models. We applied this procedure to the three models discussed so far (uniform random walk, biased random walk, one-lagged model). As an example, in figure 6 we show the values of the AIC weights for three different S&P500 stocks. We can see that the performance of the models fluctuates wildly and differs across stocks. This suggests that the informativeness of the measured properties depends on several factors, not all of which are revealed to us. However, it is clear that in all cases the time horizon $T$ plays a key role in the performance of the models: the outcome depends on how many time steps are included in the analysis. For instance, we see that in some cases (the Citigroup Inc. stock) the small-$T$ regime is oscillatory, while the large-$T$ regime appears to settle on a preference for a definite model. In other cases (United Health Group), the three models alternate over quite long periods of time. Most likely, this very irregular behaviour is due to the strong non-stationarity of financial markets: extending the analysis over longer time horizons does not necessarily improve the statistics, because for large $T$ the underlying price (and return) distributions change in an uncontrolled way.

Footnote 1: The sample correlation $\rho_{x,y}$ of two uncorrelated signals is distributed around zero, but in a non-Gaussian way. However, the quantity $\phi_{x,y} \equiv \mathrm{artanh}(\rho_{x,y})$, known as the Fisher transformation, is approximately normally distributed around zero, with standard deviation $\sigma = 1/\sqrt{n-3}$ (where $n$ is the number of observations). The interval $-2\sigma < \phi_{x,y} < +2\sigma$, representing a 95% confidence interval for $\phi_{x,y}$, can then be mapped back to the interval $-\tanh(2\sigma) < \rho_{x,y} < +\tanh(2\sigma)$ to obtain a 95% confidence interval for $\rho_{x,y}$ around zero.
We stress again that the AIC weight indicates which property, among the constraints defining the various models, best characterizes the stock, given the observed data. In other words, it highlights the measured property that is most informative about the original data. Despite the fact that the models considered so far are extremely simplified (and are by no means intended to be accurate models of financial time series), this approach can always identify, in relative terms, the most useful empirical quantity characterizing an observed time series.
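The Akaike weights used for this ranking follow from the raw AIC values in the standard way; a sketch (function name ours):

```python
import numpy as np

def aic_weights(aic_values):
    """Akaike weights: w_k = exp(-(AIC_k - AIC_min)/2), normalized to sum to 1.
    w_k estimates the probability that model k is the best of the candidates."""
    a = np.asarray(aic_values, dtype=float)
    w = np.exp(-(a - a.min()) / 2.0)
    return w / w.sum()
```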

Single cross-sections of multiple time series
In the previous section we considered models for single time series, where N = 1 and T is large. Here we consider, as a second specification of our general formalism, the somewhat 'opposite' case of single cross-sections of N multiple time series, each representing a daily snapshot of the market dynamics. For clarity, figure 7 portrays a single cross-section of a set of multiple time series. In this case, $T = 1$ and we assume $N \gg 1$. So the matrix $X$ has dimensions $N \times 1$, i.e. it is an $N$-dimensional column vector. The entries of a cross-section $X$ will be denoted by $x_i = \pm 1$, each representing the daily binary increment of a different asset. Using again the symbol $\{\cdot\}$ to denote an average over stocks (as in section 2.2), we now define the average increment (first moment) of $X$ as
$$\{x_i\} \equiv \frac{1}{N}\sum_{i=1}^{N} x_i$$
and the second moment as $\{x_i^2\} = 1$. Therefore the sample variance is $\{x_i^2\} - \{x_i\}^2 = 1 - \{x_i\}^2$. We also define the total 'coupling' between stocks (for a specific cross-section $X$) as
$$\{x_i x_j\} \equiv \frac{2}{N(N-1)}\sum_{i<j} x_i x_j,$$
where now, as in equation (6), $\{\cdot\}$ denotes an average over all distinct pairs of stocks.
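For a $\pm 1$ cross-section, the pairwise coupling can be computed in $O(N)$ time using the identity $(\sum_i x_i)^2 = N + 2\sum_{i<j} x_i x_j$; a sketch (helper name ours):

```python
import numpy as np

def pairwise_coupling(x):
    """Average of x_i * x_j over all distinct pairs of a ±1 cross-section.
    Uses (sum x_i)^2 = N + 2 * sum_{i<j} x_i x_j to avoid the O(N^2) loop."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x.sum()
    return (s * s - n) / (n * (n - 1))
```

Note that for large $N$ this quantity approaches $\{x_i\}^2$, a fact used later when relating the coupling to the average binary return.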
In what follows, we will consider various models for single cross-sections. The main difference with respect to the models of single time series considered in section 4 is that the interaction between time steps for a given stock is now replaced by the interaction between different stocks for a given time step. As is well known, in real financial markets the interactions among stocks (as measured, e.g., via cross-correlations) are much stronger than inter-temporal autocorrelations. This makes the cross-sectional properties significantly different from those of the dynamics of single time series, once inter-stock interactions are enforced in the model. Yet, in simple models without interaction, we recover similar expected properties.
The simplest benchmark is again the uniform random walk, in which no constraint is enforced and all the $2^N$ possible cross-sections are equally probable. In section 5.4, we will compare the performance of this trivial benchmark to that of the other models we are about to introduce. To this end, the AIC value can be calculated from equation (22), choosing $n_k = 0$ and using the (constant) likelihood given by equation (B.3) in the appendix.

Biased random walk
In this model, which is analogous to that defined in section 4.2, the constraint is chosen as the total daily increment of the cross-section $X$:
$$\sum_{i=1}^{N} x_i = N\,\{x_i\},$$
where $\{x_i\}$ is defined by equation (51). The Hamiltonian is then
$$H(X, \theta) = \theta \sum_{i=1}^{N} x_i.$$
Similarly to its counterpart for single time series, this is a model of non-interacting spins under the effect of a common external field, and it leads to a biased random walk (see the appendix). The financial interpretation is however different: in this model, all stocks are assumed to fluctuate (again, in an 'ensemble' sense) under the effect of a common market-wide factor, but are conditionally independent of each other, given the market-wide factor itself. In the econophysics literature, the overall tendency of all stocks to move together is generally referred to as the 'market mode' [2]. When applied to the data, this extremely simple model interprets the observed market mode as the consequence of an external factor (e.g. news), and not of direct interactions among stocks.
The probability of each increment is $P(x_i = \pm 1) = e^{\mp\theta}/(e^{\theta} + e^{-\theta})$, the expected value is $\langle x_i \rangle = -\tanh\theta$, and the variance is $1 - \tanh^2\theta$. The maximum likelihood condition (16), fixing the value $\theta^*$ of the parameter $\theta$ given a real cross-section $X^*$, leads to
$$\theta^* = -\,\mathrm{artanh}\big(\{x_i\}^*\big),$$
where $\{x_i\}^*$ is the measured average increment of the observed cross-section $X^*$. We will apply this model to real financial data in sections 5.4 and 6. The AIC of the model is given by equation (22), where $n_k = 1$ and the maximized likelihood is $P(X^*|\theta^*)$, with $P(X|\theta)$ given by equation (B.10) (see the appendix).

Mean field model
We now consider a more complex model, with interactions among all stocks, which is suitable for financial cross-sections. Besides the constraint on the total increment, we enforce an additional constraint on the average coupling between stocks. The resulting two-dimensional constraint can be written as
$$\Big(\sum_{i=1}^{N} x_i,\ \sum_{i<j} x_i x_j\Big),$$
and the corresponding Hamiltonian reads
$$H(h, J, X) = -h \sum_{i=1}^{N} x_i \;-\; \frac{J}{N} \sum_{i<j} x_i x_j.$$
Like the one-lagged model for single time series (see section 4.3), this model is formally analogous to an Ising model of interacting spins under the influence of an external 'magnetic' field (here denoted by $h$). However, the big difference is that, whereas in the one-lagged model each increment $x(t)$ interacts only with the next temporal increment $x(t+1)$ of the same stock, here each increment $x_i$ interacts with all the other increments $x_j$ of the same cross-section $X$, i.e. with all other stocks in the market. As a model of spin systems, the above model is generally known as the mean-field Ising model [46]. In the appendix we provide the analytical solution of the model, adapted to our setting.
In the financial setting, this model allows us to separate the effects of the external field, i.e. a common factor affecting all stocks in the market, from those of the average interaction among all stocks. This market-wide interaction can also cause all stocks to correlate, but it has the different interpretation of a collective effect, i.e. the tendency of stock increments to 'align' with each other as a result of direct interactions, rather than of a common influence. This is a sort of 'herd effect' at the coarse-grained level of attractive ($J > 0$) inter-stock interactions. So the model can generate the 'market mode' either as the result of a common external influence such as news (in which case all stocks are still conditionally independent given the common factor), or as a collective effect due to mutual interactions (in which case all stocks are conditionally dependent given the common factor).
While the model can in principle simulate synthetic time series under a combination of the above two effects by varying the two parameters $h$ and $J$ independently, a problem arises when it is fitted to the data. The mathematical root of the problem is the well known fact that $H(h, J, X)$ can be rewritten as a linear combination of $M_1(X)$ and $M_1^2(X)$. As we show in the appendix, this implies that, when the maximum likelihood principle is used to fit the model to the data $X^*$, the variance of $M_1(X)$ becomes zero. In other words, the model degenerates to one where $M_1(X)$ is no longer a random variable. This also implies that the two equations fixing the values of the parameters $J^*$ and $h^*$ become identical (see the appendix). Therefore it is no longer possible to uniquely fix the values of both parameters, and the problem becomes under-determined. For this reason, we need to eliminate one parameter and consider the model only in the two extreme cases $h = 0$ and $J = 0$. These two cases can be treated separately.
The case $J = 0$ coincides with the biased random walk model already considered in section 5.2, while the case $h = 0$ defines a purely interacting, zero-field model. In what follows, when using the 'mean-field' model, we will always refer to the latter parameter specification, defined by equation (68). The former specification, equation (67), will instead still be denoted as the 'biased random walk' model.
It should be noted that the case $\{x_i\}^* = 0$ is never practically encountered in reality, since the empirical $\{x_i\}^*$ can be arbitrarily small, but is generally not exactly zero. While this 'protects' the model from the indeterminacy discussed above, it raises another problem of arbitrariness, which can however be solved very effectively using the information-theoretic criteria introduced in section 3.3. The problem is that the mean-field model will always interpret even the tiniest empirical deviations from $\{x_i\}^* = 0$ as the result of direct interactions among stocks, and attach a value $J^* > 0$ to this interpretation. This will also apply to, e.g., most realizations of a purely uniform random walk: even if for such a model the theoretical expected return is known to be zero, most realizations will be such that $\{x_i\}^*$ is small but nonzero. So the only phase of the mean-field model that can be explored is the 'magnetized' phase dominated by collective effects. This implies that even a pure effect of noise will be interpreted as the presence of interactions. However, this problem will be solved in the next section, where we show that an information-theoretic comparison between the mean-field model, the uniform random walk, and the biased random walk is able to discriminate the most parsimonious model, thus allowing us to trust the mean-field model only when $\{x_i\}^*$ is far enough from zero.

Comparing the three models on empirical financial cross-sections
We can now combine the three models together and use the AIC weights (see section 3.3) to determine which model achieves the optimal trade-off between accuracy and parsimony. This will immediately provide us with an indication of whether the observed market mode, as reflected in the empirical aggregate increment $\{x_i\}^*$, should be interpreted, e.g., as the effect of a common exogenous factor, as a collective endogenous effect, or merely as the outcome of chance.
The fact that the likelihoods of the biased random walk and the mean-field model depend only on $\{x_i\}^*$ and $N$, plus the fact that the likelihood of the uniform random walk is constant, allows us to obtain the AIC values for the three models as functions of $\{x_i\}^*$ and $N$ only. In figure 9 we show the calculated AIC weights of the three models as a function of the observed value $\{x_i\}^*$, for $N = 428$ S&P500 stocks. Each point represents a different cross-section, i.e. a different trading day, for a total of 100 randomly sampled days. It is important to note that the empirical value of the average increment only determines which point(s) of the curves are actually visited; the curves themselves are universal.
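These universal curves can be traced numerically. The sketch below (our own code) computes the AIC weights of the uniform versus the biased random walk as a function of the observed $m = \{x_i\}^*$; the mean-field curve is omitted here, since its maximized likelihood requires the appendix expressions:

```python
import numpy as np

def aic_weights_cross_section(m, N):
    """AIC weights (uniform vs biased random walk) for one cross-section with
    observed average increment m = {x_i}*. Under the biased model the ML
    probabilities of the two signs are (1+m)/2 and (1-m)/2."""
    ll_uniform = -N * np.log(2.0)              # constant likelihood 2**(-N)
    p = (1.0 + m) / 2.0
    ll_biased = N * (p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    aic = np.array([-2.0 * ll_uniform,          # n_k = 0
                    2.0 - 2.0 * ll_biased])     # n_k = 1
    w = np.exp(-(aic - aic.min()) / 2.0)
    return w / w.sum()
```

For small $|m|$ the zero-parameter model wins on parsimony, while for large $|m|$ the one-parameter model wins on fit, in qualitative agreement with the regimes discussed next.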
The figure reveals a remarkable fact, namely the presence of three distinct regimes in the behavior of the group of stocks. For $0 \le |\{x_i\}^*| \lesssim 0.2$, we find that the best performing model is the uniform random walk, which displays an AIC weight practically equal to one (indicating that the model is almost surely the best one among the three models considered, see section 3.3). This means that, in this 'noisy' regime, the most parsimonious explanation of the market mode, as reflected in the measured value of $\{x_i\}^*$, is that of a pure outcome of chance.
For $0.2 \lesssim |\{x_i\}^*| \lesssim 0.5$, we find that the uniform random walk is almost surely not the best model, while the biased random walk and mean-field models are competing. We observe an almost equal performance of the two models for $|\{x_i\}^*| \approx 0.2$, and an increasing preference for the mean-field model as $|\{x_i\}^*|$ increases towards 0.5. Despite this preference, we cannot reject the biased random walk model, meaning that in this 'mixed' regime the most likely explanation for the market mode is a combination of exogenous and endogenous effects.
Finally, for $|\{x_i\}^*| \gtrsim 0.5$, the mean-field model achieves practically unit probability of being the best model. In this 'endogenous' regime, the most likely explanation for the market mode is uniquely in terms of a collective effect of direct influence among stocks.
We can summarize the above findings as follows. While the idea that strongly synchronized days (large values of $|\{x_i\}^*|$) are better explained in terms of collective effects might appear intuitive, the possibility to quantitatively identify the value $|\{x_i\}^*| \approx 0.5$ above which this intuition is fully supported by statistical evidence is a non-obvious output of the above approach. The same consideration applies to the identification of the other two regimes, and of a mixed phase where there is not enough statistical evidence in favour of a single interpretation of the market mode. Moreover, the fact that the mean-field model starts being statistically significant only for $|\{x_i\}^*| \gtrsim 0.2$ means that below this value, even though the ML principle would always infer $J^* > 0$, the best model is actually the uniform random walk, which effectively corresponds to $J^* = 0$. This is a highly non-trivial result.

Ensembles of matrices of multiple time series
In this section, as our third and final specification of the abstract formalism introduced in section 3, we extend the previous results to the general case where the observed data are a full $N \times T$ matrix $X^*$ representing a set of multiple binary time series for $N$ stocks, each extending over $T$ time steps. We recall that the entries of a generic such matrix $X$ are denoted by $x_i(t)$, where $i$ labels the stock and $t$ labels the time step. We assume that $N$ and $T$ are both large, i.e. $N \gg 1$ and $T \gg 1$. Before introducing an explicit model, we need to make some important considerations.
We have already stressed that the models introduced in the previous sections are not meant to be realistic models of financial time series. For instance, it is well known that the simple stochastic processes considered in section 4 are far too simple to reproduce some key stylized facts observed in real financial time series, such as volatility clustering [47,48] or bursty behavior [49]. Moreover, being entirely binary, the above examples cannot address other well-established properties characterizing the amplitude of fluctuations, e.g. the 'fat' (power-law) tails of the empirical distributions of price returns.
Nonetheless, there is a simple argument that legitimizes the use of a proper extension of the above modelling approach, especially that introduced in section 5, provided that we adequately calibrate such an extension on the observed set of multiple time series. The argument is basically the realization that we can properly model the binary signature of a time series, using temporal iterations of even the simplistic models we have introduced in section 5, if we assume that some aggregated information measured on the original 'weighted' time series $r_i(t)$ ($1 \le i \le N$) can be used as a proxy of the driving factor defining the model itself. We will show that this simple assumption is actually verified in the data. In particular, we will show that a sequence of temporal iterations of the biased random walk model, which assumes that the binary time series is driven by an 'external' field, can be 'bootstrapped' on the real data by assuming that the field itself is a simple function of the average weighted return $\{r_i(t)\}^*$. This model will reproduce with great accuracy, and mathematically characterize, the empirical nonlinear relation between binary and non-binary quantities that we have illustrated in section 2.2. We will finally test the temporal robustness and predictive power of the model, and conclude with a discussion of the relation between our approach and more traditional 'factor models' in finance.

Temporal dependencies among cross-sections
In order to execute the above plan, we first analyze the correlations between single cross-sections of the market. We need this preliminary analysis in order to determine whether the temporal extension of the models defined in section 5 should incorporate dependencies among different snapshots.
Based on the extensive financial literature, we expect no correlation (at daily frequency) among the returns of different cross-sections. However, most analyses focus on the autocorrelation of individual stocks, based on their weighted returns. So, to check our hypothesis, we perform an explicit analysis of the temporal autocorrelation of the observed time series of the aggregate binary return $\{x_i(t)\}^*$. This analysis is shown in figure 10 for the three indices, using daily data for the year 2006. We confirm that the observed autocorrelation is not statistically significant, since (apart from a few points) it lies within the range of random noise (calculated by imposing a threshold of two standard deviations on the Fisher-transformed autocorrelation). This type of uncorrelated dynamics is observed throughout our dataset. This means that, in line with other analyses of autocorrelation, the memory of the aggregate binary return of real markets, if any, is much shorter than a day.
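The noise band used here follows from the Fisher transform discussed earlier; a sketch (helper name ours):

```python
import numpy as np

def correlation_noise_band(n_obs, n_std=2.0):
    """Noise level for a sample (auto)correlation estimated from n_obs points:
    the Fisher transform artanh(rho) is approximately N(0, 1/(n_obs - 3)),
    so the band is +/- tanh(n_std / sqrt(n_obs - 3))."""
    sigma = 1.0 / np.sqrt(n_obs - 3.0)
    bound = float(np.tanh(n_std * sigma))
    return -bound, bound
```

For one year of daily data ($n \approx 250$), the two-standard-deviation band is roughly $\pm 0.13$, consistent with the dashed lines in the figures.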
Going back to the result illustrated in figure 9, we can then conclude that there is no significant correlation in the trajectories of the daily points populating the curves. In other words, given the knowledge of the position of the market in the AIC curves in a given day, we cannot predict where the market will move the next day, even if of course we know that it will move to another point in the curves themselves.

Reproducing the observed binary/non-binary relationships
The previous result sets the stage for our next step, where we consider an explicit extension of the models considered in section 5 to an ensemble of multiple time series, as introduced in section 3 in the general case. The absence of autocorrelation implies that we can define the Hamiltonian of the full N × T matrix X as a sum of T non-interacting Hamiltonians, each describing a single cross-section of N stocks.
Next, we need to choose the model to extend. We want the final model to establish (among other things) an expected relationship between the binary and the weighted aggregate returns, so that we can test this prediction against the empirical relationships illustrated in section 2.2. This implies that we need to input the measured weighted return $\{r_i\}^*$ as a driving parameter of the binary model. Among the three models, only the biased random walk and the mean-field model have parameters that can be related to $\{r_i\}^*$. In section 5 we treated those models as giving competing interpretations of the market mode in terms of exogenous and endogenous effects, respectively. However, it should be noted that this is no longer possible as soon as the parameters of these models are made dependent on the observed return. For instance, if we assume that the parameter $\theta$ of the biased random walk depends on $\{r_i\}^*$ (which is a property of the data), we can no longer interpret $\theta$ as an external field, since it has been somehow 'endogenized'. Determining whether $\theta$ can be interpreted as endogenous or exogenous is now entirely dependent on whether $\{r_i\}^*$ itself can be interpreted as endogenous or exogenous. This tautology does not prevent us from determining a relationship between $\{r_i\}^*$ and $\{x_i\}^*$ in their full range of variation, because such a relationship is independent of the optimal (endogenous or exogenous) interpretation of both quantities. We also note that the choice of the model to calibrate on $\{r_i\}^*$ is now completely independent of the relative performance of the various models that we have determined in the case of free parameters, including their AIC weights shown in figure 9. Indeed, apart from an initial calibration, the parameters will no longer be fitted using the ML principle, making the AIC analysis no longer appropriate. In other words, ranking the 'free' models and endogenizing their parameters are two completely different problems.
In particular, the low AIC weight of the biased random walk throughout most of figure 9 does not prevent us from using this model in our next analysis. We will indeed 'bootstrap' the biased random walk on the real data, by looking for a relationship between $\{r_i\}^*$ and the parameter $\theta$. We prefer this model over the mean-field one because, while it is natural to think of (a function of) $\{r_i\}^*$ as a proxy of the 'field' $\theta$ affecting the market in the biased random walk model (notably, $\{r_i\}^*$ has a definition similar to that of a market index), it is less natural to think of the same quantity as a proxy of the inter-stock interaction $J$ in the mean-field model (although, as we said before, this would be technically possible).
Combining all the above considerations, we finally generalize the biased random walk model defined by equation (59) to the matrix case as follows:
$$H(X, \vec{\theta}\,) = \sum_{t=1}^{T} \theta(t) \sum_{i=1}^{N} x_i(t),$$
where $\vec{\theta}$ is a $T$-dimensional vector with entries $\theta(t)$. Note that, while the models we introduced in section 4 have time-independent parameters and therefore correspond to time series at statistical equilibrium (for example, a model with constant volatility), we are now considering more general models with time-dependent parameters. Relating $\theta(t)$ to $\{r_i(t)\}^*$ will allow us to incorporate any observed degree of non-stationarity of the data into the model itself.
As a preliminary calibration, we now look for an empirical relation between $\{r_i(t)\}^*$ and $\theta(t)$. To this end, we first treat the latter as a free parameter and look for the optimal value $\theta^*(t)$ maximizing the likelihood of the observed binary time series $X^*$. Since the Hamiltonians for different time steps are non-interacting, it is easy to show that $\theta^*(t)$ is given again by equation (63), i.e. $\theta^*(t) = -\,\mathrm{artanh}\big(\{x_i(t)\}^*\big)$. In figure 11 we compare the resulting value of $\theta^*(t)$ with the corresponding observed weighted return $\{r_i(t)\}^*$, for the three indices separately. Each point in the plot corresponds to a different day, and we considered 250 days (approximately one year) for each index. We find a strong linear relation between the two quantities, which can be fitted by the one-parameter curve
$$\theta^*(t) = -c\,\{r_i(t)\}^*, \qquad c > 0.$$
Since $\{r_i(t)\}^*$ is a property measured on the stock increments themselves, it reflects both external influences and internal dependencies. Therefore $\theta^*(t)$ cannot be (entirely) interpreted as an external field. This confirms our interpretation of the biased random walk as a model agnostic to the (endogenous or exogenous) nature of the driving field in the present setting.
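The calibration of $c$ is a one-parameter regression through the origin in the linearized variables; a sketch (our own helpers, assuming the relation $\{x(t)\} = \tanh(c\,\{r(t)\})$ implied by the fit of figure 11):

```python
import numpy as np

def fit_c(avg_weighted, avg_binary):
    """Least-squares fit of c in {x(t)} = tanh(c * {r(t)}), linearized as
    artanh({x(t)}) = c * {r(t)} (regression through the origin)."""
    r = np.asarray(avg_weighted, dtype=float)
    y = np.arctanh(np.asarray(avg_binary, dtype=float))
    return float(np.dot(r, y) / np.dot(r, r))

def predict_binary_return(avg_weighted, c):
    """Non-parametric prediction of the binary return from the weighted one."""
    return np.tanh(c * np.asarray(avg_weighted, dtype=float))
```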
Combining the two previous relations, we obtain the predicted nonlinear relationship $\{x_i(t)\} = \tanh\big(c\,\{r_i(t)\}^*\big)$, which allows us to predict the aggregate binary return from the aggregate weighted return, or vice versa, without fitting any additional parameter. In figure 12 we show the result of this operation. We confirm that the prediction of our model matches the empirical relationship very well.
We also consider a null model where we randomly shuffle the increments of each of the $N$ time series independently. This results in a set of randomized time series, with elements $r_i'(t)$, in which the return distribution of each stock is preserved, but the returns of all stocks in a given day are uncorrelated. From $r_i'(t)$, we obtain the binary signature $x_i'(t)$ as for the real data. As shown in figure 12, this randomized benchmark overlaps with the empirical trend only in a very narrow, linear regime. We will now try to understand this result.
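This shuffling null model is straightforward to implement; a sketch (our own helper, using a fixed seed for reproducibility):

```python
import numpy as np

def shuffle_each_stock(returns, seed=0):
    """Null model: independently permute each row (stock) of the N x T return
    matrix, preserving every stock's return distribution while destroying
    the equal-time correlations among stocks."""
    rng = np.random.default_rng(seed)
    return np.array([rng.permutation(row) for row in np.asarray(returns)])
```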
The reason why the shuffled data result in a linear trend is the following. For each value of $\{x_i'(t)\}$, the signs of the shuffled increments are uncorrelated with their amplitudes, so the aggregate weighted return is, to leading order, proportional to the aggregate binary return, with a slope controlled by the average amplitude of the increments. This suggests that the value of $c$ strongly depends on the original log-return distribution: we therefore expect the stability of $c$ to be determined by that of the average positive return $r_+^*$. In section 6.3 we will study the stability of $c$ in more detail.
The above simple argument shows that, for shuffled data, we indeed expect a linear relationship between $\{r_i'(t)\}$ and $\{x_i'(t)\}$. However, for finite realizations we observe a much narrower span of values (see figure 12), due to the absence of correlations among stocks, which strongly suppresses the fluctuations of the aggregate quantities. Using equation (75), we also get
$$\{r_i r_j\} \approx \{r_i\}^2 = \frac{\big[\mathrm{artanh}(\{x_i\})\big]^2}{c^2},$$
which theoretically justifies the fitting function we had used in figure 3. Again, rather than fitting that curve on the data, we can use the value of $c$ determined from the (independent) fit shown in figure 11. This results in the non-parametric plot shown in figure 13. We confirm that, for each of the three indices, we can reproduce the observed relationship very well.
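The non-parametric curve can be evaluated as follows (a sketch under the large-$N$ approximation $\{r_i r_j\} \approx \{r_i\}^2$, which is our assumption here; helper name ours):

```python
import numpy as np

def predicted_weighted_coupling(avg_binary, c):
    """Predicted average weighted coupling {r_i r_j} ≈ (artanh({x_i}) / c)**2.
    The artanh makes the curve 'diverge' as |{x_i}| -> 1, i.e. on highly
    synchronized days."""
    return (np.arctanh(np.asarray(avg_binary, dtype=float)) / c) ** 2
```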
As before, we also show the relationship between $\{r'_i(t)\,r'_j(t)\}$ and $\{x'_i(t)\}$ for randomly shuffled data. The linearity of equation (78) now translates into an expected parabolic relationship between the two quantities.

Figure 13. Nonlinear relationship between the average daily coupling (weighted coupling) and the average daily sign (binary return) over all stocks in the FTSE100 (left), S&P500 (center) and NIKKEI225 (right) in various years (2003, 2007, and 2004, respectively). Here each point corresponds to one day in a time interval of 250 trading days (approximately one year). The red curve is our non-parametric prediction based on the fit shown in figure 11, and the green points are the same properties measured on the shuffled data.
Again, real data strongly deviate from the above 'uncorrelated' parabolic expectation, because extreme events make the empirical coupling $\{r_i^*\,r_j^*\}$ virtually 'diverge' when stocks are highly synchronized ($|\{x_i^*\}| \approx 1$).
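The contrast between the correlated and shuffled couplings can be sketched on synthetic data (the same toy common-drive model as above; names and parameter values are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 100, 500
drive = rng.normal(0.0, 0.01, size=T)
r = drive[None, :] + rng.normal(0.0, 0.02, size=(N, T))

def daily_coupling(m):
    """Average product r_i(t) r_j(t) over distinct pairs i != j, per day."""
    s1 = m.sum(axis=0)
    s2 = (m**2).sum(axis=0)
    n = m.shape[0]
    return (s1**2 - s2) / (n * (n - 1))

coup = daily_coupling(r)                    # empirical-style coupling
x_mean = np.sign(r).mean(axis=0)            # average daily sign

r_shuf = np.array([rng.permutation(row) for row in r])
coup_s = daily_coupling(r_shuf)             # coupling of the shuffled benchmark
```

For correlated data the coupling grows roughly with the square of the average sign (days with large `|x_mean|` carry the largest couplings), while the shuffled coupling stays confined to a much narrower range.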

Stability of the parameter c
Once we have mathematically characterized the observed nonlinear relations, an unavoidable question arises: in a given market, how stable are those relations? Since c is the only parameter in the above analysis, the question simply translates into that of the stability of c. We have already noted that c is related to the average positive return $r^{+*}$, which we expect to be relatively stable. In order to study the stability of c in more detail, we now consider several yearly and monthly time windows, and explore the time evolution of the fitted parameter for the three indices.
In figure 14 (upper panels) we plot the values of the parameter c (with error bars) for 11 yearly snapshots (2001-2011). It is clear that there are periods during which the yearly values are relatively stable, and periods when they fluctuate wildly. Thus, in most cases the fitted value of c in a given year does not allow one to make predictions about the value of c in the next year.
However, we can also consider a monthly frequency. In the bottom panels of figure 14 we show the result of our analysis when carried out on the 12 monthly snapshots of the year 2006. We choose this particular year because, in the yearly trends shown above, it represents very different points for the different markets: the end of a stable period for the FTSE100, an exceptional jump for the S&P500, and the middle of an increasing trend for the NIKKEI225. Despite these differences, we find that in all three markets the monthly dynamics is much more stable than the yearly one. In particular, the trends for the FTSE100 and NIKKEI225 are almost constant, and for the S&P500 only two points deviate from an otherwise stable trend (despite the large fluctuation that 2006 represents in the yearly trend for this index). This implies that, in most cases, one might even use the monthly value of c out of sample, in order to predict the future relationship between $\{x_i\}$ and $\{r_i\}$ from a past observation. We should however stress that the aim of our method is to characterize such a relationship, not to predict it. Indeed, we can hardly imagine a situation in which only the binary (or only the non-binary) information is available.
The above results reveal a trade-off between short and long periods of time. For short (e.g. monthly) periods there are fewer points available to calculate c through a fit of the type shown in figure 11. This explains why the monthly trends in figure 14 have larger error bars than the yearly trends in the same figure. By contrast, for longer (e.g. yearly) periods each individual fit is better, but there are larger fluctuations in the temporal evolution of the parameter c, because the data are less stationary. In general, we expect that each market, in a specific period of time, has a different 'optimal' frequency to consider.
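The window-length trade-off can be illustrated with a stylized simulation (everything here is hypothetical: a fixed "true" slope `c_true`, Gaussian aggregate returns, and additive noise on the fitted field):

```python
import numpy as np

rng = np.random.default_rng(3)
c_true = 50.0   # hypothetical true slope in theta*(t) = c * {r_i(t)}

def estimate_c(T):
    """Fit the slope c (through the origin) on a window of T days,
    returning the estimate and its standard error."""
    r_mean = rng.normal(0.0, 0.01, size=T)
    theta = np.arctanh(np.clip(np.tanh(c_true * r_mean)
                               + rng.normal(0.0, 0.05, size=T), -0.999, 0.999))
    denom = np.sum(r_mean**2)
    c_hat = np.sum(theta * r_mean) / denom
    resid = theta - c_hat * r_mean
    se = np.sqrt(np.sum(resid**2) / (T - 1) / denom)
    return c_hat, se

c_month, se_month = estimate_c(21)    # ~one month of trading days
c_year, se_year = estimate_c(250)     # ~one year
```

With the shorter window the estimate carries a visibly larger standard error, mirroring the larger error bars of the monthly trends in figure 14; the longer window gives a tighter individual fit, at the cost (in real data) of reduced stationarity.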

Relation to factor models
We would like to conclude this paper with a discussion of the relationship between some of our findings and the factor models that are popular in the financial literature [3]. As a basic consideration, we stress that factor models can only be applied to the original (non-binary) increments (it is impossible to decompose a binary signal into a non-trivial combination of binary signals), while our models apply only to the binary projections. We should bear this irreducible difference in mind in what follows. However, thanks to the mapping between binary and non-binary increments that we have documented, we can indeed try to relate the two approaches.
First, let us consider the shuffled (uncorrelated) data, where the original log-returns are randomly permuted within each of the N time series. It is well known that the total temporal increment (over T time steps) of any empirical time series of price increments is generally close to zero (due to market efficiency), and that the distribution of log-returns is mostly symmetric around this value. This is especially true if each of the N original time series has been separately standardized, i.e. the ith temporal average has been subtracted from each increment of the ith time series, and the result has been divided by the ith standard deviation. In such a case, the N log-return distributions also become very similar to each other, because their support is the same and their values are comparable. This means that, after the shuffling, the time series are sequences of independent and almost identically distributed variables with zero mean. We denote the corresponding increments as
$$r_i(t) = \epsilon_i(t), \qquad (84)$$
where the $\epsilon_i$ are zero-mean random variables. In a traditional factor analysis, the above scenario takes the form of a 'zero-factor' model. Under this model, the aggregate increment over N stocks is expected to be narrowly distributed around zero, which is consistent with a uniform random walk (see figure 9). Therefore we find that the zero-factor model (for the non-binary returns) and the uniform random walk (for the binary returns) are consistent with each other in the linear regime. In other words, when in our analysis we measure a value of $\{x_i(t)\}$ that is consistent with a uniform random walk, we know that the original log-returns are consistent with a zero-factor model.
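The concentration of the aggregate increment under the zero-factor picture is a direct consequence of averaging i.i.d. zero-mean noise: the cross-sectional mean shrinks like $1/\sqrt{N}$. A minimal sketch (illustrative Gaussian noise):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 250

# Zero-factor model: r_i(t) = eps_i(t), i.i.d. zero-mean noise.
def aggregate_std(N):
    """Standard deviation (across days) of the aggregate increment {r_i(t)}."""
    eps = rng.normal(0.0, 0.02, size=(N, T))
    return eps.mean(axis=0).std()

s10, s1000 = aggregate_std(10), aggregate_std(1000)
```

Increasing N from 10 to 1000 shrinks the spread of the aggregate increment by roughly a factor of ten, so for a large market the zero-factor aggregate is indeed narrowly distributed around zero.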
Next, we consider a one-factor model, where a single dominant underlying factor is assumed to control the dynamics of all the time series. In such a case, each return can be decomposed as
$$r_i(t) = \alpha_i\,\Phi_0(t) + \epsilon_i(t),$$
where $\alpha_i$ is the 'factor loading' coupling the ith time series to the dominant factor $\Phi_0(t)$. When referring to stocks, the factor $\Phi_0(t)$ is attributed to the market mode. It is known that, during crisis times when the markets are highly correlated, a one-factor model can describe the dynamics quite well. Under this model, the aggregate increment is
$$\{r_i(t)\} \approx \bar{\alpha}\,\Phi_0(t),$$
where $\bar{\alpha}$ is the average loading, which is independent of both i and t. This result implies that, when the market is well described by a one-factor model, the average increment $\{r_i(t)\}$ that we measure in our analysis is proportional to the factor $\Phi_0(t)$ itself. We note that the one-factor model is somewhat similar to our biased random walk model, as it assumes a common drive for all the stocks. However, since $\Phi_0(t)$ is fitted on the data, the one-factor model cannot distinguish between an endogenous or exogenous nature of the common drive. This situation is similar to when we use the observed value of $\{r_i(t)\}$ as the driving field of the biased random walk (see section 6.2).
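The proportionality between the aggregate increment and the factor can be checked numerically. A sketch assuming a hypothetical Gaussian factor and uniform loadings (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 200, 250

phi0 = rng.normal(0.0, 0.01, size=T)          # hypothetical market factor
alpha = rng.uniform(0.5, 1.5, size=N)         # factor loadings alpha_i
eps = rng.normal(0.0, 0.02, size=(N, T))
r = alpha[:, None] * phi0[None, :] + eps      # r_i(t) = alpha_i phi0(t) + eps_i(t)

r_mean = r.mean(axis=0)                       # aggregate increment {r_i(t)}
alpha_bar = alpha.mean()                      # average loading

# {r_i(t)} ~ alpha_bar * phi0(t): the aggregate return tracks the factor,
# with a slope close to the average loading.
corr = np.corrcoef(r_mean, phi0)[0, 1]
slope = np.polyfit(phi0, r_mean, 1)[0]
```

For N = 200 the idiosyncratic noise is averaged away almost entirely, so `r_mean` is nearly a rescaled copy of `phi0` with slope `alpha_bar`.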
In financial analysis, the factor model can be used to filter the original time series by removing the one-factor component from them. When the model is a good approximation to the real market, the filtered returns are $r_i(t) \approx \epsilon_i(t)$, leading us back to equation (84) and the related considerations. In such a scenario, there is no correlation among the stocks, and each stock acts as an i.i.d. variable. We therefore expect that, if we remove the market mode from the original time series, then (in periods where the market is indeed dominated by a single factor) we would obtain results similar to the shuffled case, and we would find the system in the uncoordinated phase of figure 9.
However, despite the fact that in certain conditions the one-factor model can reproduce the market behaviour, the model is too simplistic [3]. In reality the dynamics is more complex and can be attributed to many factors, which sometimes overlap with industrial (sub)sectors. Generally, the different factors are identified with the largest, non-random eigenvalues of the empirical cross-correlation matrix, where the market mode corresponds to the highest eigenvalue [3]. The presence of many deviating eigenvalues is an indication that the one-factor model should be rejected. A more realistic, M-factor model is
$$r_i(t) = \sum_{j=0}^{M} \alpha_{ij}\,\Phi_j(t) + \epsilon_i(t),$$
where j = 0 denotes a common market-wide factor as above, while j > 0 denotes sector-specific factors. In such a case, our measured value of $\{r_i(t)\}$ becomes
$$\{r_i(t)\} \approx \sum_{j=0}^{M} \bar{\alpha}_j\,\Phi_j(t),$$
which is a linear combination of the multiple factors controlling the market dynamics.
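The M-factor generalization follows the same logic: the aggregate increment becomes the loading-weighted combination of all factors. A sketch with hypothetical Gaussian factors and loadings:

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, M = 200, 250, 4

phi = rng.normal(0.0, 0.01, size=(M + 1, T))      # factors phi_j(t), j = 0..M
alpha = rng.uniform(0.2, 1.0, size=(N, M + 1))    # loadings alpha_ij
eps = rng.normal(0.0, 0.02, size=(N, T))
r = alpha @ phi + eps                             # r_i(t) = sum_j alpha_ij phi_j(t) + eps_i(t)

# The aggregate return is (up to residual noise) the linear combination
# of the factors weighted by the average loadings.
r_mean = r.mean(axis=0)
combo = alpha.mean(axis=0) @ phi
corr = np.corrcoef(r_mean, combo)[0, 1]
```

As in the one-factor case, averaging over a large N leaves `r_mean` almost identical to the average-loading combination `combo`.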
It should be noted that factor models cannot distinguish between an endogenous and an exogenous origin of the factors $\Phi_j(t)$ themselves, even if we invoke some information-theoretic criterion to rank different specifications of these models. By contrast, our binary models allow us to discriminate among these multiple scenarios, as shown in figure 9 and the related discussion. Moreover, while our approach allows us to relate binary and non-binary increments of real time series and to replicate the observed relationships among them (see figures 12 and 13), factor models cannot lead to a similar result, because they do not allow for a binary description.

Conclusions
We presented a novel method for the analysis of single and multiple binary time series. Our information-theoretic approach allowed us to extract and quantify the amount of information encoded in simple, empirically measured properties. This made it possible to associate an entropy value with a time series, given its measured properties, and to compare the informativeness of different measured properties.
By employing our formalism, we identified distinct regimes in the collective behavior of groups of stocks, corresponding to different levels of coordination that depend only on the average return of the binary time series. In each regime the market exhibits a dominant character: the market mode can be interpreted as an exogenous factor, as pure noise, or as a combination of endogenous and exogenous components. Moreover, each regime is characterized by its own most informative property.
Finally, and most importantly, we were able to replicate the observed nonlinear relations between binary and non-binary aggregate increments of real multiple time series. We have characterized these relations mathematically and accurately, and interpreted them as a consequence of the fact that very large log-returns occur more often when most stocks are synchronized, i.e. when their increments have a common sign. Our findings suggest that binary signatures carry significant information, and even allow one to measure the level of coordination in a way that is inaccessible to standard non-binary analyses.

A.1. Uniform random walk model
The trivial model is obtained when no constraints are enforced. In this case, there is no free parameter and the Hamiltonian has the form
$$H(X) = 0. \qquad (A.1)$$
As a result, the partition function is
$$Z = 2^T,$$
which is nothing but the number of possible binary time series of length T. The probability of occurrence of a time series X is then
$$P(X) = 2^{-T}$$
and is completely uniform over the ensemble of all binary time series of length T. All the T elements of X are mutually independent and identically distributed with probability
$$P\big(x(t) = \pm 1\big) = \tfrac{1}{2}.$$
This results in a completely uniform random walk with zero expected value for each increment,
$$\langle x(t) \rangle = 0,$$
while the (ensemble) variance of each increment equals
$$\mathrm{Var}\big[x(t)\big] = 1 - \langle x(t)\rangle^2 = 1.$$
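The counting argument above can be verified by brute-force enumeration for a short series: the partition function is the number of binary sequences, and the uniform measure gives each increment zero mean and unit variance exactly.

```python
import numpy as np
from itertools import product

T = 10

# Enumerate every binary time series of length T; the partition function
# Z is just their number, 2**T, and each series has probability 2**-T.
series = np.array(list(product([-1, 1], repeat=T)))
Z = series.shape[0]

# Under the uniform measure each increment has zero mean and unit variance.
mean_per_step = series.mean(axis=0)
var_per_step = (series**2).mean(axis=0) - mean_per_step**2
```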

A.2. Biased random walk model
We now consider the total increment as the simplest non-trivial (one-dimensional) constraint:
$$C(X) = \sum_{t=1}^{T} x(t).$$
If we denote the corresponding (scalar) Lagrange multiplier by θ, the Hamiltonian has the form
$$H(X) = -\theta \sum_{t=1}^{T} x(t).$$
The partition function is
$$Z(\theta) = \big[2\cosh\theta\big]^T.$$
The above expression shows that the stochastic process corresponding to this model is a biased random walk, as the two outcomes $x = \pm 1$ have different probabilities, unless θ = 0 (which leads us back to the uniform random walk model considered above).
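The biased random walk can be sampled directly and its parameter recovered from data. A minimal sketch, assuming the standard form $P(x(t)=\pm 1) = e^{\pm\theta}/(2\cosh\theta)$, so that $\langle x(t)\rangle = \tanh\theta$ and the maximum-likelihood estimate is the inverse hyperbolic tangent of the sample mean:

```python
import numpy as np

rng = np.random.default_rng(7)
theta, T = 0.4, 100_000

# Biased random walk: P(x = +1) = e^theta / (2 cosh theta), <x> = tanh(theta)
p_up = np.exp(theta) / (2 * np.cosh(theta))
x = np.where(rng.random(T) < p_up, 1, -1)

# Maximum-likelihood estimate of the bias parameter from the sample mean
theta_hat = np.arctanh(x.mean())
```

For a long enough series the estimate `theta_hat` concentrates tightly around the true bias, which is the calibration used on real data in the main text.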
The expected value of the tth increment x(t) (representing the bias of the random walk) is
$$\langle x(t) \rangle = \tanh\theta \qquad (A.12)$$
and the variance is
$$\mathrm{Var}\big[x(t)\big] = 1 - \tanh^2\theta.$$
The maximum likelihood condition (16), fixing the value θ* of the parameter θ given a real time series X*, reads
$$\langle x(t) \rangle_{\theta^*} = \overline{x^*(t)},$$
where $\overline{x^*(t)}$ is the measured average increment in the observed time series X*. This yields
$$\tanh\theta^* = \overline{x^*(t)},$$
which gives the parameter value
$$\theta^* = \mathrm{artanh}\,\overline{x^*(t)}.$$
We now consider a model where, besides the constraint on the total increment specified in equation (33), we enforce an additional constraint on the time-delayed (lagged) quantity $T \cdot B_1(X)$, where $B_1(X)$ is defined in equation (27) with τ = 1. This amounts to enforcing the average one-step temporal autocorrelation of the time series. The resulting two-dimensional constraint can be written as the column vector
$$\vec{C}(X) = \Big( \sum_{t=1}^{T} x(t),\; \sum_{t=1}^{T} x(t)\,x(t+1) \Big)^{\dagger},$$
where we consider a periodicity condition as in equation (28) with τ = 1, i.e. $x(T+1) \equiv x(1)$. Note that, when X is a real binary time series of length T, this condition can always be enforced by adding one last (fictitious) timestep T + 1 with a corresponding increment x(T + 1) chosen equal to x(1). For long time series, this has a negligible effect. Denoting the two Lagrange multipliers by θ and J, the Hamiltonian has the form
$$H(X) = -\theta \sum_{t=1}^{T} x(t) - J \sum_{t=1}^{T} x(t)\,x(t+1).$$
The above Hamiltonian coincides with that of the one-dimensional Ising model with periodic boundary conditions [46]. Each time step t is seen as a site in an ordered chain of length T, and each value x(t) = ±1 is seen as the value of a spin sitting at that site. The model is analytically solvable, which allows us to apply it to real time series within our formalism. For readers familiar with time series analysis but not necessarily with the Ising model, we briefly recall the standard solution of the model, adapting it from [46].
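As a concrete check of the standard solution, the following sketch builds the 2x2 transfer matrix for Boltzmann weights $e^{\theta x(t) + J x(t)x(t+1)}$, with hypothetical values of θ and J, and recovers the expected magnetization as a numerical derivative of log Z; in the large-T limit this reproduces the textbook result $\langle x \rangle = \sinh\theta / \sqrt{\sinh^2\theta + e^{-4J}}$.

```python
import numpy as np

theta, J, T = 0.2, 0.5, 200   # hypothetical field, coupling, and chain length

def logZ(th):
    """log of the periodic-chain partition function via the transfer matrix
    M(s, s') = exp(J*s*s' + th*(s + s')/2), s, s' in {+1, -1}."""
    M = np.array([[np.exp(J + th), np.exp(-J)],
                  [np.exp(-J), np.exp(J - th)]])
    lam = np.linalg.eigvalsh(M)          # M is symmetric
    return np.log(np.sum(lam**T))        # Z = Tr(M^T) = lam1^T + lam2^T

Z = np.exp(logZ(theta))

# Magnetization per site as d(log Z)/d(theta) / T (central difference)
h = 1e-6
m = (logZ(theta + h) - logZ(theta - h)) / (2 * h) / T
```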
Applying the periodicity condition of equation (28), the model can be solved exactly by the standard transfer-matrix technique. The expected value of the autocorrelation defined in equation (47) can be approximated as the ratio of two expected values, as in equation (B.24). When N is large, a traditional derivation [46] shows that the sum in the numerator of equation (B.24) is dominated by the single term corresponding to the maximum of $C_r$. The same applies to the partition function in the denominator. If $r_0$ denotes the value of r for which $C_r$ is maximum, we then obtain equation (B.25). From the latter, one can infer the existence of a phase transition in the model, separating a regime where the expected 'magnetization' (here the average increment $\langle M_1 \rangle$) is zero from one where it is non-zero [46]. This transition is discussed in section 5.3.
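The zero/non-zero magnetization transition can be illustrated with the textbook mean-field self-consistency condition $m = \tanh(J m)$, used here purely as a sketch (the paper's mean-field normalization may differ): below the critical coupling the only solution is m = 0, above it a spontaneous non-zero magnetization appears.

```python
import numpy as np

def magnetization(J, iters=2000):
    """Solve m = tanh(J * m) by fixed-point iteration from a small positive seed."""
    m = 0.1
    for _ in range(iters):
        m = np.tanh(J * m)
    return m

m_sub = magnetization(0.5)   # below the critical coupling: m -> 0
m_sup = magnetization(1.5)   # above it: spontaneous non-zero magnetization
```

The fixed-point iteration converges to 0 for J < 1 and to a stable non-zero branch for J > 1, which is the qualitative structure behind the coordinated/uncoordinated phases discussed in the main text.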
Before proceeding further, we note a peculiarity of the model, which has implications for the applicability of our maximum likelihood approach. An argument similar to that leading to equation (B.25) implies that the second moment of $M_1(X)$ can be expressed as in equation (B.32). Applying the maximum likelihood principle to equation (B.32) tells us to select J* as the solution of equation (B.30). However, we have seen that this condition leads to equation (B.31), which is actually equivalent to equation (B.29). Therefore, the value of J* can be found by replacing the ensemble average with its observed counterpart (where we have set h = 0), to obtain the maximized likelihood of generating the observed cross-section X* under the mean-field model.