Predict or classify: The deceptive role of time-locking in brain signal classification

Several experimental studies claim to be able to predict the outcome of simple decisions from brain signals measured before subjects are aware of their decision. Often, these studies use multivariate pattern recognition methods with the underlying assumption that the ability to classify the brain signal is equivalent to predicting the decision itself. Here we show instead that it is possible to classify a signal correctly even if it does not contain any predictive information about the decision. We first define a simple stochastic model that mimics the random decision process between two equivalent alternatives, and generate a large number of independent trials that contain no choice-predictive information. The trials are first time-locked to the time point of the final event and then classified using standard machine-learning techniques. The resulting classification accuracy is above chance level long before the time point of time-locking. We then analyze the same trials using information theory. We demonstrate that the high classification accuracy is a consequence of time-locking and that its time behavior is simply related to the large relaxation time of the process. We conclude that when time-locking is a crucial step in the analysis of neural activity patterns, both the emergence and the timing of the classification accuracy are affected by structural properties of the network that generates the signal.

sufficient condition for movement to take place. This is in line with those studies claiming that the RP may be a highly reproducible accident only fortuitously related to movement 18 .
In this study we introduce a model that contains the three WWW components, generates signals that are necessary but not sufficient for the final event "left" or "right" to take place, and is built in such a way that only predictions at chance level, i.e., 50%, are possible. As we shall see, after the trials are time-locked to the time point of the final event, the classification accuracy rises well above chance level long before the end of the trials. We will show that this happens even though the analyzed signal does not contain any choice-predictive information.

Materials and Methods
Model setup. As a metaphor of the decision process, our model describes a random walker in a one-dimensional room, i.e., on a line. Time and space are discrete, and at each time step the walker jumps either to the right or to the left with equal probability (Fig. 1a). On each of the two opposite walls, left and right, there is a light that the walker tries to switch on by pressing a button every time he happens to reach that wall (Fig. 1a). The two buttons, however, are connected to power only at random times and are synchronized neither with each other nor with anything else. The random process that turns the power on and off accounts for an independent veto process that aborts a decision, preventing it from becoming action.

Figure 1. (a) The rules of the model. A random walker moves by discrete, equally sized steps with equal probability to the left or to the right in a room (I). During the walk, the walker eventually comes close to the buttons placed on the right and left walls and presses the corresponding button without interrupting its walk. A switch turns on and off randomly and independently of the position of the walker. Four types of events are possible: either the walker reaches one of the lights (left or right) when the random switch is off and the light stays off (cases II and III), or the switch is on and the light turns on (cases IV and V). (b) The black line shows a short piece of a trajectory in which the "right" (blue) light on the right wall was successfully turned on. The N positions, colored in light blue, preceding the successful event constitute a "right" trial. (c) Averaging over many independent trials for both lights shows that the average position of the walker gets closer to the left and right walls as time grows towards t = 0, when the trajectories are time-locked to a successful light-bulb switch-on.

Scientific Reports | 6:28236 | DOI: 10.1038/srep28236
The power circuit is on at each discrete time step with a fixed probability and stays on for just one time unit. If by chance the walker presses the button while the power is on, the light shines for just one time step. Otherwise, the light stays off. The epoch of length N of the walker's trajectory preceding a light flash corresponds to one trial (Fig. 1b). Here we considered the limiting case in which the time between two consecutive "power on" events is much longer than the time the walker needs to visit the whole room uniformly. This last requirement is equivalent to asking subjects in an experiment to avoid correlations between consecutive button presses.
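The rules above can be illustrated with a short stochastic simulation. This is a minimal sketch, not the authors' code. It makes several assumptions: a walker that tries to step into a wall stays where it is, and the room size n, the epoch length N and the per-step power probability p_on are illustrative values. The function names are hypothetical. The switch is drawn independently of the walker, so roughly half of the collected epochs should end with a left flash and half with a right one.

```python
import random
from collections import deque


def walk_step(x, n, rng):
    """One step on positions 0..n+1, left or right with equal probability.
    Assumption: a step into a wall leaves the walker where it is."""
    return min(max(x + rng.choice((-1, 1)), 0), n + 1)


def simulate_flashes(n, N, p_on, T, seed=0):
    """Run the room metaphor for T steps and collect the N-position epochs
    that end with a flash, keyed by which light flashed."""
    rng = random.Random(seed)
    x = rng.randrange(n + 2)            # start from the uniform stationary law
    recent = deque(maxlen=N)            # the last N positions of the walker
    trials = {"left": [], "right": []}
    for _ in range(T):
        x = walk_step(x, n, rng)
        recent.append(x)
        power_on = rng.random() < p_on  # switch drawn independently of the walker
        at_wall = x in (0, n + 1)       # the walker presses the button at a wall
        if power_on and at_wall and len(recent) == N:
            label = "left" if x == 0 else "right"
            trials[label].append(list(recent))  # last entry is the flash, t = 0
    return trials


trials = simulate_flashes(n=5, N=20, p_on=0.01, T=200_000)
n_left, n_right = len(trials["left"]), len(trials["right"])
print(n_left, n_right, n_left / (n_left + n_right))
```

A small p_on keeps consecutive power-on events far apart compared with the time the walker needs to cross a room of this size. This is in the spirit of the limiting case described above.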
To understand the effect of topology and of the intrinsic time scales, we considered both rooms of different sizes (i.e., different distances between the walls) and a version of the model in which the walker can jump from any position to any other position at each step with equal probability. As we shall see, both variants are crucial for interpreting the result of the classification. An intermediate variant of the model is considered in the Supplementary materials.

Data simulations. In order to implement the metaphor discussed above quantitatively and to simulate the walker in the room (Fig. 1a), we considered a stationary random walk process on a line with n + 2 positions {0, 1, … , n, n + 1}. One "left" trial is then generated as follows. (i) A very long time series of the stationary random walk is first generated, so that every state is visited a large number of times. (ii) Then, one of the many occurrences of the position (or state) 0 is chosen at random with equal probability, and the N preceding steps of the walk are stored.
The next "left" trial is generated by starting again from a new, independent time series at stationarity. The "right" trials are generated in a similar way by randomly choosing one of the many occurrences of the state n + 1 instead. Each trial is extracted from an independent stationary time series that contains no information about which trial will eventually be extracted. In this way, the states 0 and n + 1 are necessary but not sufficient conditions to generate a "left" and a "right" trial, respectively. Once this correspondence is established, the procedure generalizes easily to any kind of network; the simplest generalization is the complete graph, described below. Any other network with a non-homogeneous degree distribution that is symmetric with respect to 0 and n + 1 delivers the same qualitative results (Supplementary materials).
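Steps (i) and (ii) can be sketched as follows. The sketch keeps the assumptions of the previous example: reflecting walls, hypothetical function names, and illustrative values of n, N, M and the series length T. Each trial is cut from its own fresh stationary series, so the series carries no information about which label will be extracted from it.

```python
import random


def stationary_series(n, T, rng):
    """Step (i): a long series of the walk on {0, ..., n+1}. Starting from a
    uniformly drawn state keeps the series stationary from the first step.
    Assumption: a step into a wall leaves the walker in place."""
    x = rng.randrange(n + 2)
    series = [x]
    for _ in range(T - 1):
        x = min(max(x + rng.choice((-1, 1)), 0), n + 1)
        series.append(x)
    return series


def one_trial(n, N, target, T, rng):
    """Step (ii): choose one occurrence of `target` uniformly at random and keep
    the N positions ending there; the last entry is the time-locked t = 0."""
    series = stationary_series(n, T, rng)
    hits = [t for t in range(N - 1, T) if series[t] == target]
    t0 = rng.choice(hits)
    return series[t0 - N + 1 : t0 + 1]


def make_trials(n, N, M, T=5_000, seed=0):
    """M "left" (target 0) and M "right" (target n+1) trials, each cut from
    its own fresh, independent stationary series."""
    rng = random.Random(seed)
    left = [one_trial(n, N, 0, T, rng) for _ in range(M)]
    right = [one_trial(n, N, n + 1, T, rng) for _ in range(M)]
    return left, right


N = 15
left, right = make_trials(n=5, N=N, M=40)
# Average position per time point; index -1 is t = 0, index -2 is t = -1, ...
mean_left = [sum(tr[i] for tr in left) / len(left) for i in range(N)]
mean_right = [sum(tr[i] for tr in right) / len(right) for i in range(N)]
print([round(v, 2) for v in mean_left])
print([round(v, 2) for v in mean_right])
```

The average positions drift toward the corresponding wall as t approaches 0 (compare Fig. 1c), even though no series was generated with any knowledge of its label.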
With this setup, we generated statistically independent trials by simulating a discrete-time random walk on two types of networks, a linear chain and a complete graph (Fig. 2). The linear chain represents the room enclosed by two walls. The complete graph, instead, is a topology that allows the walker to jump with equal probability to any position in a single step. At steady state, the random walk visits all states {0, 1, … , n, n + 1} with uniform probability. The states 0 and n + 1 are the two boundary states, and the lights can flash only when the walker is in one of them. As already mentioned, visiting the boundary is a necessary but not sufficient condition for the light to go on. We interpret this both as the effect of a veto that can independently prevent the light from shining and as a model for brain signals that are necessary but not sufficient for an event to happen.
To mimic experimental conditions, we generated k independent sets (participants) of 2M independent trials, M labeled "left" and M labeled "right". For both the linear chain and the complete graph, we generated the trials by exploiting the time-reversal property of the random walk process (see Time reversibility). The same holds for a graph with a non-homogeneous degree distribution (Supplementary materials).

Figure 2. (a) Time course of classification accuracy for a random walk on a line, time-locked to t = 0 and averaged across k = 500 realizations (i.e., participants), for n = 5 and n = 10. For both values of n, the accuracy decreases from 100% moving backward from t = 0, and the smaller the system, the faster the accuracy decreases. The inset is a scheme of the linear network. To ensure a uniform distribution, the walker can jump with equal probability either left or right from each position except the boundaries. (b) Time course of classification accuracy for a random walk on a complete graph. The average accuracy remains at 50% and is independent of n. The inset shows the scheme of a complete graph, in which the walker can jump from any position (or state) to any other position in just one step.

Support-vector classification. We used time-resolved, cross-validated linear support-vector machines (SVMs) for classification (Supplementary Figure 1). The time point of the final event is set to t = 0, and all previous positions of the walker are at negative times. After the 2M trials of each of the k participants have been time-locked to t = 0, the cross-validation approach first subdivides the trials into independent training and test sets. To avoid classification biases, training and test sets contained equal numbers of "left" and "right" trials. At each time point, the training set was used to train a support vector machine to distinguish between "left" and "right" decisions 2-4. The resulting model was then used to classify the test set.
We used leave-one-pair-out cross-validation: each pair consisting of one "left" and one "right" trial was used in turn to test the model learned on the remaining 2M − 2 trials. Classification performance was quantified in terms of accuracy: each cross-validation iteration produced 0%, 50% or 100%, depending on whether the classifier attributed 0, 1 or 2 correct labels to the pair of test trials. At each time point t, the goodness of classification was given by the average percentage a_t across all iterations (M per participant). Repeating the same analysis at each time point generated a time course of accuracy a_t for each of the k participants. The result is presented as the average time course of accuracy across all simulations (Fig. 2a,b). The classification was performed using the standard Matlab (The MathWorks, Inc., Natick, Massachusetts, United States) library for support vector machines 21. This classifier was chosen purely because we wanted to use the same analysis techniques commonly employed for similar experimental data, especially those used in the experiment with "left" and "right" button presses 2.
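The time-resolved, leave-one-pair-out scheme can be sketched as follows. To keep the example self-contained and dependency-free, it uses a nearest-class-mean linear rule instead of the linear SVM. For the one-dimensional feature used here, the walker's position at a single time point, both rules reduce to a threshold. This is nonetheless a substitution, not the Matlab SVM pipeline described above. Trials are generated backwards from their final state, which assumes the time-reversal property discussed under Time reversibility. The values of n, N and M and the function names are illustrative.

```python
import random


def trial_backward(n, N, final_state, graph, rng):
    """One time-locked trial generated backwards from its final state. For both
    models the reversed chain equals P, so backward steps use the forward rule
    (assumed reflecting rule for the line)."""
    xs = [final_state]
    for _ in range(N - 1):
        if graph == "line":
            x = min(max(xs[-1] + rng.choice((-1, 1)), 0), n + 1)
        else:  # complete graph: any of the n+2 states with equal probability
            x = rng.randrange(n + 2)
        xs.append(x)
    return xs[::-1]  # chronological order; the last entry is t = 0


def lopo_accuracy(left_vals, right_vals):
    """Leave-one-pair-out accuracy at one time point with a nearest-class-mean
    rule (a stand-in for the linear SVM). Each held-out pair scores 0, 0.5 or 1;
    ties are assigned to "left"."""
    M = len(left_vals)
    sum_l, sum_r = sum(left_vals), sum(right_vals)
    score = 0.0
    for xl, xr in zip(left_vals, right_vals):
        mu_l = (sum_l - xl) / (M - 1)  # class means without the held-out pair
        mu_r = (sum_r - xr) / (M - 1)
        correct = (abs(xl - mu_l) <= abs(xl - mu_r)) + (abs(xr - mu_r) < abs(xr - mu_l))
        score += correct / 2
    return score / M


def accuracy_time_course(n, N, M, graph, seed=0):
    """a_t for one participant; list index k holds the accuracy at t = -k."""
    rng = random.Random(seed)
    left = [trial_backward(n, N, 0, graph, rng) for _ in range(M)]
    right = [trial_backward(n, N, n + 1, graph, rng) for _ in range(M)]
    return [lopo_accuracy([tr[-1 - k] for tr in left], [tr[-1 - k] for tr in right])
            for k in range(N)]


acc_line = accuracy_time_course(n=5, N=30, M=200, graph="line")
acc_full = accuracy_time_course(n=5, N=30, M=200, graph="complete")
print([round(a, 2) for a in acc_line[::5]])
print([round(a, 2) for a in acc_full[::5]])
```

With these settings, the line should reach 100% at t = 0 and decay toward chance over many steps. The complete graph should fall to about 50% immediately after t = 0. This qualitatively mirrors Fig. 2a,b for a single simulated participant.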
The random walks. In this work we consider two versions of the random walk; a third version is discussed in the Supplementary materials. In all cases, the state space of the walk is the set of n + 2 states {0, 1, … , n + 1}. Technically, the random walk considered here is a discrete-time Markov chain on this state space. Let $X_t$ be the random variable that gives the state visited by the process after step t = 1, 2, … . The transition probabilities governing the behavior of the process are defined from the conditional probabilities

$$P_{ij} \stackrel{\text{def}}{=} \Pr\{X_{t+1} = j \mid X_t = i\}, \qquad i, j \in \{0, 1, \dots, n+1\}, \tag{1}$$

as the elements of the (n + 2) × (n + 2) transition probability matrix P. We chose this model only because it is the simplest conceivable model conveying our main conclusions.
Random walk on a line. The random walk on a line is the one-dimensional random walk with two boundaries.
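To ensure a uniform stationary distribution on this state space, the transition probability matrix $P_{ij}$ defined in (1) takes the following explicit form

$$P_{ij} \stackrel{\text{def}}{=} \frac{1}{2}\left(\delta_{j,i+1} + \delta_{j,i-1}\right) + \frac{1}{2}\left(\delta_{i,0}\,\delta_{j,0} + \delta_{i,n+1}\,\delta_{j,n+1}\right), \tag{2}$$

where δ is the Kronecker delta and indices outside the state space are ignored. This can be written more explicitly as

$$P = \begin{pmatrix}
\tfrac{1}{2} & \tfrac{1}{2} & 0 & \cdots & 0 & 0\\
\tfrac{1}{2} & 0 & \tfrac{1}{2} & \cdots & 0 & 0\\
\vdots & \ddots & \ddots & \ddots & & \vdots\\
0 & 0 & \cdots & \tfrac{1}{2} & 0 & \tfrac{1}{2}\\
0 & 0 & \cdots & 0 & \tfrac{1}{2} & \tfrac{1}{2}
\end{pmatrix}. \tag{3}$$

For later use, we also define the t-step transition probability matrix $P^{(t)}$ as the t-th power of P,

$$P^{(t)}_{ij} \stackrel{\text{def}}{=} \Pr\{X_t = j \mid X_0 = i\} = \left(P^{t}\right)_{ij}, \tag{4}$$

whose rows converge, as t → ∞, to the stationary probability distribution π. Given the choice of the transition probabilities (2), P is symmetric and therefore doubly stochastic, so the stationary probability is uniform, with $\pi_i = 1/(n + 2)$ for all i = 0, 1, … , n + 1.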
Random walk on a complete graph. The second random walk model considered in the manuscript is the random walk on a complete graph. On this graph, the transition probabilities are given by

$$P_{ij} \stackrel{\text{def}}{=} \frac{1}{n+2} \tag{5}$$

for any choice of i and j in the state space {0, 1, … , n + 1}. The trials are generated exactly as described previously. The transition probabilities of the random walk on a complete graph coincide with its stationary probability distribution. Technically, the transition matrix (5) can be seen as the transition matrix (2) raised to an infinitely large power: every row of Eq. (5) is identical to the stationary distribution π of the process described by Eq. (2). This correspondence implies that if the time resolution of a measurement is long compared to the internal timescale, a process whose connectivity is, for instance, that of a linear chain may appear more connected than it really is. This correspondence does not hold for more complex networks of states with a non-homogeneous degree distribution.
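The sketch below builds both matrices and checks two statements of the text: the uniform π is stationary, and powers of the line matrix approach the complete-graph matrix, which is itself unchanged by further multiplication. For the line it assumes that the walker steps left or right with probability 1/2 and that a blocked step at a wall becomes a stay. This is the choice that makes P symmetric and hence the stationary distribution uniform. The helper names and n = 5 are illustrative.

```python
def line_chain(n):
    """Assumed form of Eq. (2): from every state step left or right with
    probability 1/2; at the walls 0 and n+1 the blocked step becomes a stay."""
    m = n + 2
    P = [[0.0] * m for _ in range(m)]
    for i in range(m):
        P[i][max(i - 1, 0)] += 0.5      # step left, or stay at the left wall
        P[i][min(i + 1, m - 1)] += 0.5  # step right, or stay at the right wall
    return P


def complete_graph(n):
    """Eq. (5): jump to any of the n+2 states with probability 1/(n+2)."""
    m = n + 2
    return [[1.0 / m] * m for _ in range(m)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matpow(P, t):
    """t-step matrix P^(t) as the t-th power of P."""
    m = len(P)
    R = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(t):
        R = matmul(R, P)
    return R


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


n = 5
P_line, P_full = line_chain(n), complete_graph(n)
pi = [1.0 / (n + 2)] * (n + 2)
pi_P = [sum(pi[i] * P_line[i][j] for i in range(n + 2)) for j in range(n + 2)]
print("pi . P - pi:", max(abs(a - b) for a, b in zip(pi_P, pi)))
for t in (1, 10, 50, 200):
    print(t, max_abs_diff(matpow(P_line, t), P_full), max_abs_diff(matpow(P_full, t), P_full))
```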
Relaxation timescales. Consider the set of eigenvectors $v_i$, for i = 0, 1, … , n + 1, of the transition matrix P, which are orthogonal for the symmetric matrices considered here, and their associated eigenvalues $\lambda_i$, with $\lambda_0 = 1$ and all other $\lambda_i$ strictly smaller than one in absolute value. The eigenvalue $\lambda_1$ is a real number and is the closest to $\lambda_0$. Let us define the vector of initial conditions as the vector giving the probability mass function of the stochastic variable $X_0$,

$$P^{(0)}_j \stackrel{\text{def}}{=} \Pr\{X_0 = j\}, \tag{6}$$

and the vector $P^{(t)}$ as the vector giving the probability mass function of the variable $X_t$ under the (implicit) condition that $X_0$ is distributed according to $P^{(0)}$,

$$P^{(t)}_j \stackrel{\text{def}}{=} \Pr\{X_t = j\}. \tag{7}$$

These two vectors are related through

$$P^{(t)} = P^{(0)} \cdot P^{t}, \tag{8}$$

whose long-time behavior

$$\lim_{t \to \infty} P^{(t)} = \pi \tag{9}$$

gives the unique stationary probability π as the normalized eigenvector of P for the unit eigenvalue,

$$\pi = \pi \cdot P, \tag{10}$$

where π ≡ $v_0$. The vector $P^{(t)}$ can be decomposed on the eigenvectors as

$$P^{(t)} = \sum_{i=0}^{n+1} a_i\, \lambda_i^{t}\, v_i = \pi + \sum_{i=1}^{n+1} a_i\, \lambda_i^{t}\, v_i \tag{11}$$

$$\phantom{P^{(t)}} = \pi + \sum_{i=1}^{n+1} a_i\, e^{-t/\tau_i}\, v_i, \tag{12}$$

where the coefficients $a_i$ are fixed by the initial condition $P^{(0)}$ (with $a_0 = 1$) and we have defined the time scales $\tau_i = -1/\log(\lambda_i)$, whose real part is positive. Eq. (12) means that the vector $P^{(t)}$ approaches the stationary state π as t → ∞, because all $\lambda_i^t$ with i ≥ 1 tend to zero in this limit. The largest of these eigenvalues, namely $\lambda_1$, governs the long-time behavior of this limit. It is therefore customary to associate a relaxation time scale $\tau_1$ with each Markov chain that has a unique stationary state. The relaxation time scale is related to the largest eigenvalue $\lambda_1$ smaller than unity of the transition matrix P by

$$\tau_1 = -\frac{1}{\log \lambda_1}, \tag{13}$$

meaning that $\tau_1$ gives a lower estimate of the time scale needed to cover the state space according to the stationary probability distribution. For the random walk on the line, $\lambda_1$ grows with the number of states n and approaches the value 1, so that the time scale $\tau_1$ becomes larger and larger (Fig. 3c). The short-time behavior of the system, however, is governed by all the time scales $\tau_i$ involved.
A consequence of this discussion is that any functional of the elements of the t-step matrix $P^{(t)}$ depends on the $\lambda_i^t$ and therefore on the relaxation time scales $\tau_i$. The relaxation time scale for the complete graph is again given by $-1/\log$ of its largest non-trivial eigenvalue. For the random walk on a complete graph, all non-trivial eigenvalues vanish, so the relaxation time does not depend on the size of the system and is virtually zero (Fig. 3c). This means that the process reaches the steady state after just one step, because the one-step transition probability matrix in Eq. (5) is already the matrix of the stationary state. For more complex networks that interpolate between the line and the complete graph, $\lambda_1$ would depend non-trivially on both the topology of the network and its size. The analysis of time scales presented here is very similar to the one applied in the context of neural networks 22.
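The relaxation time τ_1 = −1/log λ_1 can be estimated numerically. The sketch below assumes the reflecting form of the line matrix used in the earlier sketches and, more generally, a symmetric P with uniform π. It runs power iteration on (I + P)/2 restricted to vectors orthogonal to π. The shift is needed because on the line −λ_1 is also an eigenvalue, so plain power iteration would not settle. For this particular matrix, the estimate can be checked against the closed form λ_1 = cos(π/(n + 2)) of the reflecting walk. For the complete graph every non-trivial eigenvalue is zero, so τ_1 = 0. The helper names are hypothetical.

```python
import math


def line_chain(n):
    """Assumed form of Eq. (2): reflecting walk with stays of 1/2 at the walls."""
    m = n + 2
    P = [[0.0] * m for _ in range(m)]
    for i in range(m):
        P[i][max(i - 1, 0)] += 0.5
        P[i][min(i + 1, m - 1)] += 0.5
    return P


def complete_graph(n):
    m = n + 2
    return [[1.0 / m] * m for _ in range(m)]


def mat_vec(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def lambda_1(P, iters=2000):
    """Largest non-trivial eigenvalue of a symmetric P with uniform pi, by power
    iteration on Q = (I + P) / 2 restricted to vectors orthogonal to pi.
    Q maps each eigenvalue lam to (1 + lam) / 2 >= 0, so -lambda_1 cannot compete."""
    m = len(P)
    v = [j - (m - 1) / 2 for j in range(m)]       # mean-zero, i.e. orthogonal to pi
    mu = 0.0
    for _ in range(iters):
        w = [(a + b) / 2 for a, b in zip(v, mat_vec(P, v))]  # w = Q v
        mean = sum(w) / m
        w = [x - mean for x in w]                  # remove round-off drift along pi
        mu = sum(a * b for a, b in zip(v, w)) / sum(a * a for a in v)  # Rayleigh quotient
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return 2 * mu - 1                              # undo the shift lam -> (1 + lam) / 2


def tau_1(lam):
    """Relaxation time -1/log(lambda_1); taken as 0 when lambda_1 vanishes."""
    return 0.0 if lam <= 1e-12 else -1.0 / math.log(lam)


for n in (5, 10, 20):
    lam = lambda_1(line_chain(n))
    print(n, round(lam, 6), round(math.cos(math.pi / (n + 2)), 6),
          round(tau_1(lam), 2), tau_1(lambda_1(complete_graph(n))))
```

The printed τ_1 for the line grows with n, while the value for the complete graph stays at zero for every size, in line with Fig. 3c.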
Time reversibility. For a process at stationarity, the time-reversed transition matrix is defined as

$$P^{(-)}_{ij} \stackrel{\text{def}}{=} \Pr_{X_0 \sim \pi}\{X_0 = j \mid X_1 = i\}, \tag{14}$$

where the subscript $X_0 \sim \pi$ is a shorthand for $X_0$ being chosen according to the stationary probability mass function π. Using the definition of conditional probability, the time-reversed matrix can be rearranged as

$$P^{(-)}_{ij} = \frac{\Pr_{X_0 \sim \pi}\{X_0 = j,\ X_1 = i\}}{\Pr_{X_0 \sim \pi}\{X_1 = i\}} = \frac{\pi_j\, P_{ji}}{\pi_i}, \tag{15}$$

where $P_{ij}$ is defined in (1). For both random walk models considered here, the matrix $P^{(-)}$ coincides with the forward matrix P. In general, however, when the network of states has loops and cycles, the time-reversibility property does not hold, and simulating the process backward in time requires Eq. (15).
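The time-reversed matrix of Eq. (15) is a one-line computation. In the sketch below, π is obtained by iterating π^(0) · P^t from a point mass, and the line matrix again assumes reflecting walls. The three-state chain with a preferred direction of rotation is a hypothetical example of a network with a cycle, for which P^(−) differs from P. The helper names are illustrative.

```python
def line_chain(n):
    """Assumed form of Eq. (2): reflecting walk with stays of 1/2 at the walls."""
    m = n + 2
    P = [[0.0] * m for _ in range(m)]
    for i in range(m):
        P[i][max(i - 1, 0)] += 0.5
        P[i][min(i + 1, m - 1)] += 0.5
    return P


def complete_graph(n):
    m = n + 2
    return [[1.0 / m] * m for _ in range(m)]


def stationary(P, iters=5000):
    """Stationary law as the long-time limit of pi^(0) . P^t, starting from state 0."""
    m = len(P)
    p = [1.0] + [0.0] * (m - 1)
    for _ in range(iters):
        p = [sum(p[i] * P[i][j] for i in range(m)) for j in range(m)]
    return p


def reversed_matrix(P, pi):
    """Eq. (15): P^(-)_ij = pi_j P_ji / pi_i."""
    m = len(P)
    return [[pi[j] * P[j][i] / pi[i] for j in range(m)] for i in range(m)]


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Hypothetical three-state chain that prefers the rotation 0 -> 1 -> 2 -> 0.
P_cycle = [[0.1, 0.9, 0.0],
           [0.0, 0.1, 0.9],
           [0.9, 0.0, 0.1]]

for name, P in [("line", line_chain(5)), ("complete", complete_graph(5)), ("cycle", P_cycle)]:
    print(name, max_abs_diff(reversed_matrix(P, stationary(P)), P))
```

The difference vanishes up to rounding for the line and the complete graph. For the cycle it is large, because the reversed chain rotates the other way.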
Information theory. For a stochastic variable Z, which we assume here takes values in a countable set σ, we denote by H(Z) its Shannon entropy, defined as

$$H(Z) \stackrel{\text{def}}{=} -\sum_{z \in \sigma} \Pr(z) \log_2 \Pr(z), \tag{16}$$

with Pr(⋅) being the probability mass function associated with the stochastic variable Z, i.e., Pr(z) = Pr{Z = z} for any z ∈ σ. In the following we simplify the notation and write

$$H(Z) = -\sum_{Z} \Pr(Z) \log_2 \Pr(Z) \tag{17}$$

instead of Eq. (16). For the problem discussed here, the mutual information

$$I(S; R) \stackrel{\text{def}}{=} H(S) - H(S \mid R) \tag{18}$$

asks how much information about a stimulus S can be decoded from the response R.
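Eqs. (16) and (18) translate directly into code for finite joint probability tables. The helper names below are illustrative. The two toy tables show the extremes: an independent pair carries 0 bits, and a perfectly dependent binary pair carries 1 bit.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, Eq. (16); zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """Eq. (18), I(S;R) = H(S) - H(S|R), for a table joint[s][r] = Pr{S=s, R=r}."""
    n_s, n_r = len(joint), len(joint[0])
    p_s = [sum(row) for row in joint]
    p_r = [sum(joint[s][r] for s in range(n_s)) for r in range(n_r)]
    h_s_given_r = sum(
        p_r[r] * entropy([joint[s][r] / p_r[r] for s in range(n_s)])
        for r in range(n_r) if p_r[r] > 0
    )
    return entropy(p_s) - h_s_given_r


independent = [[0.25, 0.25], [0.25, 0.25]]
fully_dependent = [[0.5, 0.0], [0.0, 0.5]]
print(mutual_information(independent), mutual_information(fully_dependent))
```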
Time-locked mutual information. Let us consider trials time-locked at t = 0. In our random walk, time-locking the trials means that, given that the light flashes, the walker's position $X_0$ at time zero is either 0 or n + 1. We describe this condition by saying that the response R occurs at t = 0. The time-locked mutual information $I_0$ implements this condition through

$$I_0(X_t; R) \stackrel{\text{def}}{=} I\left(X_t; R \mid R \text{ occurs at } t = 0\right), \tag{19}$$

where the response R takes the values "left" or "right", whereas the stimulus S is the position $X_t$ of the walker after step t. The calculation of $I_0$ can be brought into the framework of the standard definition (18) by introducing a new random variable $Y_t$ that carries the information that $X_0$ is restricted to the two boundary states 0 and n + 1 with equal probability (Fig. 3a). The variable $Y_t$ is defined as the variable $X_t$ when the initial condition $X_0$ is either state 0 or state n + 1. More precisely,

$$\{Y_t = k\} \stackrel{\text{def}}{=} \{X_t = k \mid X_0 = 0\} \cup \{X_t = k \mid X_0 = n+1\}, \tag{20}$$

which says that the event $Y_t = k$ is made of the two mutually exclusive events $X_t = k$ with $X_0$ either 0 or n + 1, for k ∈ {0, 1, … , n + 1}. Here $X_0$ is the position of the walker when the event "left" or "right" occurs. Note that by construction $X_0$ must be either 0 or n + 1 with equal probability, and that the following condition holds,

$$\Pr\{X_0 = 0 \mid R = \text{left}\} = 1, \tag{21}$$

and similarly for $X_0 = n + 1$ and R = "right". Thanks to the variable $Y_t$, we have the identity

$$I_0(X_t; R) = I(Y_t; R), \tag{22}$$

which allows us to use Eq. (18) for the explicit calculation.
When the time t is measured relative to the time point at which the "left" and "right" trials end (both forward and backward in time), the distribution of $X_t$ is not the stationary distribution π derived in Eq. (10). The choice of time-locked trials imposes a new distribution on the process (Fig. 3a), which requires the random variable defined in (20). We now analyze the mutual information

$$I_0(X_t; R) = I(Y_t; R) = H(Y_t) - H(Y_t \mid R), \tag{23}$$

where the subscript 0 reminds us that the variable $Y_t$ is used only for time-locked trials, i.e., under the condition that R occurs at time t = 0. Here t can take any integer value: negative t means that $Y_t$ describes the process before the "left" or "right" event, whereas positive t means that it describes the process afterwards. When t > 0, the time-locked mutual information is used to make predictions 23; when t < 0, it can be used to perform a classification or interpolation 24. Owing to the time-inversion symmetry of the random walks on the line and on the complete graph, the results do not differ for positive and negative t. For clarity, however, we will henceforth write the negative sign of t explicitly. To proceed, we need two properties of the random variable Y. We start with

$$\Pr\{Y_{-t} = k\} = \frac{1}{2}\Big(\Pr\{X_{-t} = k \mid X_0 = 0\} + \Pr\{X_{-t} = k \mid X_0 = n+1\}\Big) = \frac{1}{2}\Big(P^{(t)}_{0k} + P^{(t)}_{n+1,k}\Big), \tag{24}$$

where the last equality uses $P^{(-)} = P$, which holds for both models. This expression is computed with elementary matrix algebra. Two limits can easily be computed by hand. At t = 0, the variable $Y_0$ is either 0 or n + 1 with equal probability, so that $H(Y_0) = 1$. In the limit t → ±∞, instead, $Y_{-t}$ becomes stationary and takes the same stationary distribution π as $X_{-t}$ (Fig. 3a). In this case we have

$$H(Y_{-t}) = -\sum_{k=0}^{n+1} \Pr\{Y_{-t} = k\} \log_2 \Pr\{Y_{-t} = k\} \;\xrightarrow{\;t \to \infty\;}\; \log_2(n+2). \tag{25}$$

A second useful property of $Y_{-t}$ is

$$\Pr\{Y_{-t} = k \mid R = \text{left}\} = \Pr\{X_{-t} = k \mid X_0 = 0\}, \tag{26}$$

where the last probability can be computed explicitly using the transition matrix P. A similar expression obviously holds when R = "right". We can therefore compute the conditional Shannon entropy using matrix algebra by means of Eq. (4):

$$H(Y_{-t} \mid R) = -\sum_{\rho \in \{l, r\}} \frac{1}{2} \sum_{k=0}^{n+1} \Pr\{Y_{-t} = k \mid R = \rho\} \log_2 \Pr\{Y_{-t} = k \mid R = \rho\}. \tag{27}$$

In the first sum, "left" and "right" are denoted by l and r, respectively.
Plugging Eqs. (25) and (27) into the definition of $I_0$ given in Eq. (23) finally yields a time-dependent mutual information that depends solely on the t-step transition probabilities. Since the time behavior of these probabilities depends only on the intrinsic timescales, the mutual information (23) decays, going backward in time, according to the time scales of the process (Fig. 3b).
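Because $I_0$ depends only on the t-step transition probabilities, it can be evaluated exactly with a few matrix products. The sketch below assumes the reflecting line matrix and uses $P^{(-)} = P$. Row 0 (row n + 1) of $P^t$ then plays the role of the conditional distribution of $Y_{-t}$ given R = "left" ("right"). These rows are combined as in Eqs. (25) and (27) and substituted into Eq. (23). The sizes and function names are illustrative.

```python
import math


def line_chain(n):
    """Assumed form of Eq. (2): reflecting walk with stays of 1/2 at the walls."""
    m = n + 2
    P = [[0.0] * m for _ in range(m)]
    for i in range(m):
        P[i][max(i - 1, 0)] += 0.5
        P[i][min(i + 1, m - 1)] += 0.5
    return P


def complete_graph(n):
    m = n + 2
    return [[1.0 / m] * m for _ in range(m)]


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def time_locked_mi(P, t_max):
    """I_0 = H(Y_-t) - H(Y_-t | R) for t = 0..t_max (index t holds lag -t).
    Row 0 (row n+1) of P^t is the law of Y_-t given R = left (right), using
    P^(-) = P; their average is the unconditional law of Y_-t."""
    m = len(P)
    Pt = [[float(i == j) for j in range(m)] for i in range(m)]  # P^0 = identity
    values = []
    for _ in range(t_max + 1):
        left, right = Pt[0], Pt[m - 1]
        mix = [(a + b) / 2 for a, b in zip(left, right)]
        values.append(entropy(mix) - 0.5 * entropy(left) - 0.5 * entropy(right))
        Pt = [[sum(Pt[i][k] * P[k][j] for k in range(m)) for j in range(m)]
              for i in range(m)]
    return values


mi_line = time_locked_mi(line_chain(10), 60)
mi_full = time_locked_mi(complete_graph(10), 60)
print([round(x, 3) for x in mi_line[::10]])
print([round(x, 3) for x in mi_full[::10]])
```

On the line, $I_0$ stays at 1 bit as long as the two boundary states cannot reach a common state (t ≤ 5 for n = 10). It then decays on the scale of the relaxation time. On the complete graph it drops to zero after a single step.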

Unconstrained mutual information.
Also for the unconstrained mutual information I(S; R), the response R is one of the events "left" or "right", and the stimulus S is the position $X_t$ of the walker after step t. In the calculation of I(S; R) there is no time-locking, and the pattern $X_t$ is sampled without any knowledge about the future. The mutual information (18) can be rewritten in the more useful form

$$I(X_t; R) = H(R) - H(R \mid X_t). \tag{28}$$

For the random walk models considered here, the single terms of this formula can be computed as follows. The Shannon entropy of the variable R alone is

$$H(R) = -\Pr(l)\log_2\Pr(l) - \Pr(r)\log_2\Pr(r) = 1, \tag{29}$$

since "left" and "right" (here denoted by l and r, respectively) occur with equal probability. Furthermore, since $X_t$ is not time-locked to the event R, in our random walk R is independent of $X_t$ by construction, so that

$$H(R \mid X_t) = H(R) = 1, \tag{30}$$

and therefore, identically,

$$I(X_t; R) = 0. \tag{31}$$

This agrees with our expectation, since the trials were built to lead to "left" or "right" with equal probability, independently of the values taken by the variable $X_t$. This calculation demonstrates why the unconstrained mutual information captures the true nature of the underlying process. Deriving the unconstrained mutual information for our model is simple and can be done analytically. Applying it to real data, however, may be challenging because of the limitations of imaging techniques. Such an application would require the analysis of relatively large data sets in order to determine the structure of the network that reproduces the dynamics connecting the various recorded patterns. For instance, fMRI measurements deliver time series of spatial brain activity patterns. After associating each spatial pattern with a state, the time series can be seen as a walk on this network of states. Once this network, the associated transition probabilities, and the order of the Markov process describing the dynamics 25 have been determined, the unconstrained mutual information can be computed.
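The form I(X; R) = H(R) − H(R | X) can be evaluated on two joint tables at the same lag: one without conditioning and one with time-locking. This sketch assumes the reflecting line matrix, and its names are hypothetical. The unconstrained table pairs the stationary position with an independent fair coin, as in the trial construction. The time-locked table makes the position at lag −t depend on the boundary state reached at t = 0.

```python
import math


def line_chain(n):
    """Assumed form of Eq. (2): reflecting walk with stays of 1/2 at the walls."""
    m = n + 2
    P = [[0.0] * m for _ in range(m)]
    for i in range(m):
        P[i][max(i - 1, 0)] += 0.5
        P[i][min(i + 1, m - 1)] += 0.5
    return P


def matpow(P, t):
    m = len(P)
    R = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(t):
        R = [[sum(R[i][k] * P[k][j] for k in range(m)) for j in range(m)] for i in range(m)]
    return R


def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mi_from_response(joint):
    """I(X;R) = H(R) - H(R|X) for joint[k] = (Pr{X=k, R=left}, Pr{X=k, R=right})."""
    p_r = [sum(row[c] for row in joint) for c in (0, 1)]
    h_r_given_x = 0.0
    for p_l, p_rt in joint:
        p_x = p_l + p_rt
        if p_x > 0:
            h_r_given_x += p_x * entropy([p_l / p_x, p_rt / p_x])
    return entropy(p_r) - h_r_given_x


n, t = 5, 4
m = n + 2
Pt = matpow(line_chain(n), t)
pi = [1.0 / m] * m
# Unconstrained: R is a fair coin drawn independently of the stationary position.
joint_free = [(pi[k] / 2, pi[k] / 2) for k in range(m)]
# Time-locked: given R = left (right), the position at lag -t follows row 0 (row n+1) of P^t.
joint_locked = [(Pt[0][k] / 2, Pt[m - 1][k] / 2) for k in range(m)]
i_free = mi_from_response(joint_free)
i_locked = mi_from_response(joint_locked)
print(i_free, i_locked)
```

The first value vanishes up to rounding. The second is about 0.71 bits for n = 5 at a lag of four steps, although the generating process is the same in both cases.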
When $X_t$ visits states that are sufficient to generate or predict the response R, the mutual information is larger than zero. The main challenge of this method lies in the limitations of brain imaging techniques. fMRI is probably not suitable for this analysis, since long time series must be collected, both in the presence and in the absence of the event under study. However, EEG, intracranial EEG, and single-cell recordings are well-established methods and could allow this approach.

Results
To mimic the experimental procedure 2, we generated a large number of independent trials ending randomly with the event "left" or "right", each with 50% probability (Fig. 1a,b). The trials were built in such a way that no prediction better than 50% is possible. After all the trials had been time-locked to the time point of the "left" or "right" event, they were classified using the same tools and approaches as in the experimental works 2-4 (Supplementary Figure 1). For trials generated by a random walk on the linear chain (Fig. 1a), we obtained a classification accuracy above 50% several time steps before the end of the trials, climbing to 100% at t = 0 (Fig. 2a). When the trials were generated by a random walk on a complete graph, instead, the mean accuracy remained at the 50% level at all times before t = 0 (Fig. 2b). Had we not known the properties of the model that generates the trials, we would have interpreted the result for the random walk on the line (Fig. 2a) as evidence of activity predicting the upcoming decision while approaching the "left" or "right" choice. However, for both networks only predictions at chance level are possible by construction. Therefore, interpreting an accuracy above chance as reflecting "choice-predictive signals" must be wrong, and the time course of the accuracy calls for a different explanation.
Time-locking the trials so that the events "left" or "right" occur at time point t = 0 is equivalent to knowing that at time point t = 0 the position $X_0$ of the random walk equals either 0 or n + 1 (Fig. 3a). To understand the role of time-locking, we exploit decoding methods from information theory 26,27, which are intrinsically related to classification 28 but allow an analytical treatment (Materials and Methods). Methods based on information theory are often used to extract predictive information from neural signals, especially when past stimuli are used to predict future events 23. The mutual information

$$I(S; R) \stackrel{\text{def}}{=} H(S) - H(S \mid R)$$

tells how much information about a stimulus S can be decoded from the response R, where H(⋅) is the Shannon entropy of the corresponding random variable (Materials and Methods). In our model, the stimulus S is the position $X_t$ of the walker at time t prior to the left/right event, and the response R is either "left" or "right". For times t < 0, the time-locked mutual information

$$I_0(X_t; R) \stackrel{\text{def}}{=} I\left(X_t; R \mid R \text{ occurs at } t = 0\right)$$

contains the information that a response R has occurred at time t = 0. There is a profound difference between the time-locked mutual information and the unconstrained mutual information $I = I(X_t; R)$, for which nothing beyond time point t is known. Using methods for future-conditioned stochastic processes 29-31, both functions I and $I_0$ can be computed analytically for our random walk (Materials and Methods). The time course of the time-locked mutual information $I_0$ (Fig. 3b) is qualitatively similar to that of the SVM classification accuracy (Fig. 2a): it is maximal at the time point of time-locking, i.e., t = 0, and decreases for t < 0 at a rate that depends on the relaxation timescales of the process (Materials and Methods). In contrast, the unconstrained mutual information $I(X_t; R)$ is zero at all times, consistent with the fact that the random walk trajectory contains no information about whether R will be "left" or "right".
Therefore, only the unconstrained mutual information $I(X_t; R)$ faithfully represents the procedure employed to generate the trials. This result shows that time-locking, combined with the long relaxation time of the walk (Fig. 3c), produces classification accuracies significantly larger than 50% before t = 0.

Discussion
Our modeling approach allowed us to understand the effect that time-locking has on the analysis of the neural signal preceding the outcome of a decision. We generated data with a simple stochastic model and analyzed them with the standard SVM-based techniques typically used in the experimental works. We complemented this analysis with an original approach based on information theory, which allows a transparent mathematical treatment. While the accuracy of the SVM alone can be confusing, the treatment with mutual information offers more clarity and highlights the conditioning introduced by time-locking, so that no confusion can arise. However, when one believes one is computing unconstrained quantities and has overlooked the conditioning introduced by time-locking, the interpretation of the result is necessarily confused. Indeed, one would erroneously conclude that the time course of the accuracy is evidence of predictive signals, whereas it merely reflects time-locking and relaxation time. We have seen that the classification accuracy is well above the chance level of 50% long before the end of the trials when the trials are generated with the linear network model. We have demonstrated that this time behavior can be explained through the combined effect of network topology and relaxation timescale of the modeled process. By construction, our model does not contain any predictive information. We therefore conclude that the rise of the classification accuracy long before the time-locking event is not necessarily a signature of the emergence of predictive signals.
To fully capture the deceptive role of time-locking, consider the following instructive argument. Given a linear network with the buttons always connected to power, the light goes on each time the walker presses a button (Fig. 1a). If the walker is just one step away from, say, the left wall, the probability that the left light will shine at the next step is 0.5. However, if we know that the next time step will be a decision time, the same probability is 1. This effect carries over to the analysis and becomes quantitatively evident when comparing the conditioned, time-locked mutual information with the unconstrained mutual information. Only the latter is able to show that there is no predictive signal. Our model generates a signal that is necessary but not sufficient for the generation of the final event. It may be argued that brain activity does not contain such signals. However, a recent experimental study on vetoing 20 has shown that there are necessary but not sufficient brain activity patterns related to the decision and execution of simple tasks. These signals can indeed deceive a classifier trained to recognize brain activity related to movement.
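Written out with the transition matrix of the line, the argument reads as follows. This is a worked example assuming the reflecting form of Eq. (2), buttons that are always powered, and n ≥ 2, so that state n + 1 cannot be reached from state 1 in a single step. Conditioning on the next step being a decision time is exactly what time-locking does:

$$\Pr\{X_{t+1} = 0 \mid X_t = 1\} = P_{10} = \tfrac{1}{2}, \qquad \Pr\{X_{t+1} = 0 \mid X_t = 1,\ X_{t+1} \in \{0, n+1\}\} = \frac{P_{10}}{P_{10} + P_{1,n+1}} = \frac{1/2}{1/2 + 0} = 1.$$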
Beyond the technical aspects, our model belongs to a broad class of models often used to study neural activity related to decision processes 17. In line with these models, we believe that our result is relevant to the common paradigms in the field, as we explain below.
When, What, Whether. The neural decision of "when" to move was recently investigated by modeling electrophysiological signals with a leaky stochastic accumulator model 17, which may look somewhat similar to our model. However, our conceptual model is different. We aimed to introduce a conceptual model that captures all the fundamental ingredients of volition 15. We were therefore interested in accounting not only for the "when", but also for the "what" and, most importantly, for the "whether" decisions. For this purpose, our model includes a stochastic process implementing the decision between "left" and "right". This process has intrinsic dynamics and a corresponding time scale. Moreover, the model describes the veto process, represented by the stochastic switch, which prevents intention from being systematically translated into action. This approach allowed us to expose the theoretical pitfalls in the debate about free decisions. In the present work, we describe the decision process as a simple diffusion without a drift term. This point is crucial to show that time-locking introduces a bias that generates apparent predictive signals. While our model does not allow predictions better than chance, in the stochastic accumulator model 17 the presence of a drift term ensures by construction that a decision will eventually be taken. This is equivalent to saying that information about the decision accumulates over time, and therefore predictions are intrinsically possible in the drift-diffusion model.
Veto process. Our model includes a veto process, represented by the stochastic switch, which prevents intention from being systematically translated into action. As in other studies concerned with volition 20 , we use the term veto because it was traditionally introduced by Libet. However, we do not share the dualistic flavor of Libet's interpretation of this process. In contrast to Libet, who considered the veto to be a control exerted by the conscious mind and uncorrelated with brain activity, we believe that the veto is implemented in specific brain networks 32,33 . Like the stochastic accumulator model 17 , our conceptual model aims at describing the decision process in its pre-motor phase, while the veto comes at a later stage and can inhibit the motor output of the decision. We considered the decision process and the veto to be statistically independent. Experiments show that proactive inhibition can slow down motor execution 34 . Because our model does not account for a motor phase extended in time, we introduced the veto as a binary process that can only allow or stop the execution of the intended action, instead of acting as a slowing-down mechanism.
Relationship to Libet-like experiments. Our result supports an alternative approach to investigating the neural determinants of free decisions. Besides confirming the bias of time-locking and suggesting a more appropriate analysis, our approach highlights the limitations of Libet-like experimental paradigms [1][2][3][4][5][6][7][8] . Already in his original work 8 , Libet reported that participants sometimes consciously felt the urge to move but inhibited their action before a movement occurred. Moreover, it was recently shown that participants can veto their action before movement onset even when their decisions are predicted in real time from brain signals preceding their actions 20 . This experimental evidence confirms that the veto implicitly plays a crucial role in Libet-like experiments. From the analysis point of view, time-locking to the button press is equivalent to ignoring the veto, because only trials corresponding to non-vetoed actions are considered. We have shown that this approach leads to misleading results. From the experimental point of view, paradigms that simultaneously include all decisions (when, what, and whether) can make the effect of the veto explicit and are therefore more ecologically valid. When analyzing data from such paradigms, or when interpreting previous results 1-4,6-8 , it is therefore fundamental to quantify, on the one hand, how the different WWW components modulate each other and, on the other hand, how this modulation changes in time.
Structural and topological effects. Finally, different brain regions are characterized by different intrinsic time scales, probably related to the structure of the underlying neural circuit 35 . Furthermore, several experiments [2][3][4] show that in some brain areas the classification accuracy increases very late or does not increase at all, and that some of the brain regions showing a significantly high classification accuracy are also particularly large 11 . Our approach offers an interpretation of these findings. The random walk shows that the classification accuracy can be enhanced by increasing the relaxation time of the process. The random walk on a line, for example, has a long relaxation time that grows with the number of states (Fig. 3c). This type of walk generates trials that are easy to classify (Fig. 2a). In contrast, trials generated from a random walk on a complete graph cannot be classified because the relaxation time is very short (Fig. 2b); in this case the accuracy always remains around chance. Thus, small, fast, and highly connected networks will lead to small increases in accuracy, whereas large, slow, and sparsely connected networks will produce a stronger increase. The classification accuracy is therefore a useful quantity for studying the structural properties of the neural circuits involved in generating task-related brain signals.
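The contrast between the two topologies can be made quantitative through the spectral gap of the transition matrix. The Python sketch below assumes an unbiased walk on a line whose end states act as reflecting walls (a blocked step means staying put) and a walk on a complete graph without self-loops, and it defines the relaxation time as 1/(1 − λ*), where λ* is the largest eigenvalue modulus other than 1. These are illustrative choices of ours and need not coincide with the transition matrices behind Fig. 3c.

```python
import numpy as np


def line_walk(n):
    """Random walk on a path of n states; a blocked step at either end means staying put."""
    P = np.zeros((n, n))
    for i in range(n):
        P[i, max(i - 1, 0)] += 0.5
        P[i, min(i + 1, n - 1)] += 0.5
    return P


def complete_graph_walk(n):
    """Random walk that jumps uniformly to any of the other n - 1 states."""
    return (np.ones((n, n)) - np.eye(n)) / (n - 1)


def relaxation_time(P):
    """1 / (1 - lambda_*), with lambda_* the largest |eigenvalue| other than 1.

    Both matrices above are symmetric, so eigvalsh applies.
    """
    moduli = np.sort(np.abs(np.linalg.eigvalsh(P)))
    return 1.0 / (1.0 - moduli[-2])


for n in (5, 10, 20, 40):
    print(n, round(relaxation_time(line_walk(n)), 1),
          round(relaxation_time(complete_graph_walk(n)), 2))
```

For this line walk the eigenvalues are cos(πk/n), so the relaxation time is 1/(1 − cos(π/n)), roughly 2n²/π², and grows quadratically with the number of states. For the complete graph the nontrivial eigenvalues all equal −1/(n − 1), giving a relaxation time of (n − 1)/(n − 2), which stays close to 1.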

Conclusions
Taken together, our analyses show that classifying trials ending with a decision does not imply extracting predictive information about the decision itself. We have shown this using the logic of a reductio ad absurdum proof. We generated data that do not contain predictive information about the final event, i.e., the "left" or "right" button press, and analyzed them with a standard classifier after time-locking the trials to the time point of the final event. We have shown that, prior to the final event, the classification accuracy is well above chance level, with a time course that depends on the topology of the underlying network of states. Since by construction the data do not contain any information about the future outcome, the high classification accuracy cannot be interpreted as prediction. We then exploited a more transparent approach based on mutual information and demonstrated that time-locking introduces a bias analogous to future-conditioning. This allowed us to conclude that the time course of the classification accuracy at t ≤ 0 is a consequence of the network's topology and of the time scales associated with the activity on the network. Our result adds to the critical positions questioning the existence of predictive signals of volition before awareness 17,36 and their interpretation in terms of free will 14,37 . Rather than proving the existence of choice-predictive signals, the time course of the classification accuracy can be interpreted as the signature of task-specific structural properties of the local neural circuits generating the recorded brain activity.
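The procedure summarized above (generate trials, time-lock them to the final event, classify) can be sketched in Python. This is a hypothetical toy rather than the code used in this study: we assume a walker that starts in the middle state, a decision whenever the walker sits on a wall (state 0 for "left", state n − 1 for "right") while an independent switch is on with probability q, and, in place of the standard classifier, a simple majority vote per state evaluated on held-out trials. The feature at lag k is the walker's state k steps before the final event.

```python
import numpy as np


def step_line(x, n, rng):
    """Unbiased step on a line; a blocked step at a wall means staying put."""
    return min(max(x + (1 if rng.random() < 0.5 else -1), 0), n - 1)


def step_complete(x, n, rng):
    """Uniform jump to one of the other n - 1 states of a complete graph."""
    y = int(rng.integers(n - 1))
    return y + 1 if y >= x else y


def simulate(step, n, n_trials, switch_prob, rng):
    """Run trials until the walker sits on a wall (state 0 = left,
    n - 1 = right) while the independent switch is on."""
    trials, labels = [], []
    for _ in range(n_trials):
        x = n // 2                      # symmetric start: both outcomes equally likely
        path = [x]
        while True:
            x = step(x, n, rng)
            path.append(x)
            if x in (0, n - 1) and rng.random() < switch_prob:
                break
        trials.append(path)
        labels.append(int(x == n - 1))  # 0 = "left", 1 = "right"
    return trials, np.array(labels)


def time_locked_accuracy(trials, labels, lag, rng):
    """Held-out accuracy of a majority-vote-per-state classifier that sees
    only the state `lag` steps before the final event."""
    keep = [i for i, p in enumerate(trials) if len(p) > lag]
    X = np.array([trials[i][-1 - lag] for i in keep])
    y = labels[keep]
    order = rng.permutation(len(keep))
    train, test = order[: len(keep) // 2], order[len(keep) // 2:]
    votes = {}
    for s, lab in zip(X[train], y[train]):
        votes.setdefault(int(s), []).append(lab)
    rule = {s: int(np.mean(v) > 0.5) for s, v in votes.items()}
    pred = np.array([rule.get(int(s), 0) for s in X[test]])
    return float(np.mean(pred == y[test]))


rng = np.random.default_rng(1)
n, q = 11, 0.2
data = {name: simulate(step, n, 2000, q, rng)
        for name, step in [("line", step_line), ("complete", step_complete)]}
for lag in (0, 3, 10, 30):
    print(lag, {name: round(time_locked_accuracy(*data[name], lag, rng), 2)
                for name in data})
```

In this toy the accuracy at lag 0 is trivially 1 for both topologies, because the final state is the wall itself. On the line it stays at or near 1 for the first few lags and then decays slowly, whereas on the complete graph it falls to chance within a couple of steps. Unlike the model in the paper, this simplified toy is not guaranteed to be free of forward information at every time step; only the symmetric start makes the two outcomes equally likely a priori.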
Our analysis shows a limitation of "reverse-time" event-related studies, in which a signal S preceding a known event R is analyzed retrospectively, relative to the occurrence of R. In these cases, signals S that are necessary but not sufficient for R will appear to be necessary and sufficient. Furthermore, the time scale over which the classification accuracy decays, backward in time, is not necessarily related to the information about R contained in S, but also reflects a structural component. We have shown how this structural component produces a high and long-lasting classification accuracy even in a model where the signal S, by construction, contains no information about R. Time-locking to R therefore generates two important biases. On the one hand, the role of the veto is bypassed; on the other hand, time-locking introduces a conditioning on the future that can create a long-lasting effect backward in time, depending on the network topology. As we have shown here, this second bias produces high classification accuracies even in the absence of predictive signals. Our result, however, does not apply to studies 23 in which the effect of an event R on the subsequent signal S is investigated.
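The asymmetry between retrospective and prospective conditioning can be illustrated with a minimal sketch. The Python code below rests on hypothetical choices of our own (a line of n = 11 states, a switch that is on with probability 0.2 at each step, and a single long walk that continues through events instead of ending each trial). It compares P(S | R), the quantity a time-locked analysis effectively looks at, with P(R | S), the quantity a genuine prediction requires.

```python
import numpy as np


def retrospective_vs_prospective(n=11, switch_prob=0.2, n_steps=200_000, seed=0):
    """Hypothetical toy: one long walk on a line of n states, with a switch that
    is on with probability switch_prob at each step, independently of the walker.

    S at step t: the walker is within one step of the left wall (state 0 or 1).
    R at step t + 1: the left light shines (walker on state 0 and switch on).
    S is necessary for R but not sufficient.
    """
    rng = np.random.default_rng(seed)
    steps = np.where(rng.random(n_steps) < 0.5, -1, 1)
    x = np.empty(n_steps + 1, dtype=int)
    x[0] = n // 2
    for t in range(n_steps):
        x[t + 1] = min(max(x[t] + steps[t], 0), n - 1)  # blocked step = stay put
    switch_on = rng.random(n_steps + 1) < switch_prob
    S = x[:-1] <= 1
    R = (x[1:] == 0) & switch_on[1:]
    p_S_given_R = S[R].mean()   # retrospective, time-locked to R
    p_R_given_S = R[S].mean()   # prospective
    return p_S_given_R, p_R_given_S


print(retrospective_vs_prospective())
```

Every left event is preceded by the walker being within one step of the left wall, so P(S | R) = 1 and, viewed backward from R, S looks necessary and sufficient. Prospectively, S is followed by R only with probability 0.5 × 0.2 = 0.1 in this toy, since from state 0 or 1 the walker reaches the wall with probability 0.5 and the switch must also be on.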
We believe that a new analysis of the data, based on stochastic predictive models 38 , could help provide the time course of the unconstrained mutual information. We have discussed how our model is similar to a previously studied model 17 but differs from it in several crucial aspects. Albeit simple, both models capture the essential aspects of the decision process. More complex models of neural activity could, and should, be introduced in the future to better quantify the neural processes leading to decisions. However, these models too will have to cope with the result discussed here as long as time-locked trajectories are analyzed backward in time.