Let Your Eyes Predict

This study investigates the prediction accuracy of anticipatory pupil dilation responses in humans prior to the random presentation of alerting or neutral sounds. The aim of this study was to test the hypothesis that the autonomic nervous system may react prior to the presentation of random stimuli. A total of 80 participants, who were matched according to gender to take into account individual differences, were asked to listen to a random sequence of 10 neutral and 10 alerting sounds. Their pupil dilation was continuously recorded and the diameter of their pupils was used to predict the category of sound, alerting or neutral. The pupil dilation of both males and females predicted alerting sounds approximately 10% more accurately than would be expected by chance, whereas neutral sounds were predicted at the chance level. This result was confirmed using a frequentist and a Bayesian statistical approach. The practical and theoretical implications of these results are discussed.


Anticipatory Responses (ARs)
The study of anticipation is now a multidisciplinary theme, and there is a significant body of evidence in psychology and neurobiology indicating the presence of several anticipatory mechanisms in the brain. Soon, Brass, Heinze, and Haynes (2008) and a review of cortical measures of anticipation by van Boxtel and Böcker (2004) highlight the crucial role of anticipation in a large array of cognitive functions such as vision, motor control, learning, and motivational and emotional dynamics (see also the editorial of the special issue of Cognitive Processes by Pezzulo, Hoffmann, & Falcone, 2007).
Since the pioneering study of Bechara, Damasio, Tranel, and Damasio (1997), there has been a growing interest in the study of the characteristics and the prediction accuracy of psychophysiological signals (e.g., skin conductance, heart rate [HR], etc.) measured before the participants are required to make advantageous or disadvantageous choices. For the sake of simplicity, we will define these signals as ARs. The most intriguing aspect of this phenomenon is that it is possible to observe differences in ARs before an event takes place. A second characteristic of this phenomenon is that the prediction of future events is completely unconscious because these ARs are too weak for participants to detect using introspective cognitive means.
If a sequence of events follows a rule, then the autonomic and neurophysiological systems can learn this rule before the person can discover it overtly. For example, Bierman, Destrebecqz, and Cleeremans (2005) asked participants to decide which "word" from a pair of "words" was the "correct" one. Unknown to the participants, the two words in each pair were constructed using different sets of rules (Grammar A and Grammar B). A (monetary) reward was given if the participant chose the word from Grammar A. Choosing the word constructed using Grammar B resulted in (monetary) punishment. Skin conductance was measured during all 100 trials. After each set of 10 trials, the participants were asked how they had selected the "correct word." Task performance increased long before the participants could formulate a single relevant rule. In this preconceptual phase of the experiment, skin conductance showed a greater increase before the participants made incorrect choices than before they made correct choices. Similar effects have been observed when measuring auditory mismatch negativity by Kimura, Schröger, Czigler, and Ohira (2010). This implicit learning capacity of the human autonomic and neural systems has a clear adaptive value, which allows us to predict whether future events may be dangerous or useful (Denburg, Recknor, Bechara, & Tranel, 2006). However, what happens if events do not follow a rule and instead happen at random? In this case, implicit learning is not possible and only more or less sophisticated guessing strategies can be employed, such as the "Gambler's Fallacy" strategy (Tversky & Kahneman, 1974).
However, since the late 1990s, some authors have attempted to discern whether ARs can be observed, even when implicit learning is not possible. If ARs can be observed even in this case, this would demonstrate that our autonomic and neurophysiological systems possess a more sophisticated capacity to predict future events than was previously thought and consequently, are set up to help us predict events that are generally thought to be essentially unpredictable.
The evidence for this perspective has been summarized by Mossbridge, Tressoldi, and Utts (in press). These authors carried out a meta-analysis of all studies conducted up to and including 2010 that aimed to find out whether there are differences between ARs relating to two categories of future events, for example, emotional versus nonemotional pictures. The results obtained from 37 studies reveal a significant effect with low-to-moderate effect sizes (ESs; random effects: ES = 0.28, overall z = 6.07, p < 1 × 10⁻⁹; fixed effects: ES = 0.26, overall z = 8.7, p < 1 × 10⁻¹⁷).
From these results, it seems that our autonomic and neurophysiological systems have the capacity to discriminate between the arrivals of two distinct categories of events, even if they are unpredictable.

Prediction Accuracy
The finding that our autonomic and neurophysiological systems can differentiate between two categories of events before their presentation has been observed through averaging the signals recorded from numerous trials to reduce the noise of intertrial differences and other sources of variance. However, to provide a real advantage, this anticipatory prediction function should predict every single future event. The study of the prediction accuracy of our autonomic and neurophysiological systems is still in its early stages, but there is already some preliminary evidence to indicate their predictive power. Tressoldi and colleagues (Tressoldi, Martinelli, Scartezzini, & Massaccesi, 2010; Tressoldi, Martinelli, Zaccaria, & Massaccesi, 2009) have used variations in HR to predict alerting versus neutral sounds. In a series of three experiments, they observed a mean prediction accuracy rate of 56% compared with a mean prediction accuracy by chance of 50%, but only in females with a high level of Absorption, a particular personality trait that seems to enhance predictive ability. A 6% increase in prediction accuracy, although statistically significant, does not seem to be high enough to protect individuals from future negative events. However, this level of prediction accuracy may be a consequence of the procedure used to predict future events and not a limitation of our biological systems.

Individual Differences
Individual differences in autonomic and neurophysiological reactivity to identical stimuli have been well documented. For example, De Pascalis, Valerio, Santoro, and Cacace (2007) showed that the skin conductance response (SCR) and (anticipatory) HR responses to tones (standards, deviants, and novels) and mild electric shocks differ between high- and low-Impulsive Sensation Seeking participants. Greaves-Lord et al. (2010) observed that measures of autonomic flexibility, for example, HR and respiratory sinus arrhythmia (RSA), predict future anxiety levels in adolescent girls, but not in boys, in the general population. Given that not all individuals react in the same way, it is important to devise individual psychophysiological calibrations.
The role of gender in studies of prealerting to random events also seems to be supported by the results obtained by Radin and Lobach (2007), who used a random flash of light and a nonflash, by Radin and Borges (2009), who used photographs with varying degrees of emotional affect, and by May, Paulinyi, and Vassy (2005), who used 97 dB acoustic stimuli alternated with silent controls.

The Purpose of This Study
The main purpose of this study is to replicate previous studies related to the prediction accuracy of ARs using pupil dilation as the dependent measure. The relationship between pupil dilation and emotional arousal as well as with the anticipation of aversive events has been described and tested by Bradley, Miccoli, Escrig, and Lang (2008) and Bitsios, Szabadi, and Bradshaw (2004), respectively. Our stimuli (see Sounds Characteristics) are of different levels of arousal and pleasantness and are consequently suitable for use in measuring changes in pupil dilation.
A second main interest is related to the prediction of different categories of events. For example, if prediction primarily concerns potentially dangerous versus neutral events, it is important to know the relative prediction accuracy of both categories of events bearing in mind that in this case, it is more advantageous to predict dangerous events than neutral ones.

Method

Participants
It was decided that 80 participants would take part in the study, including 40 males and 40 females. The final sample comprised participants with a mean chronological age of 23 (SD = 3.5). Most were students who were contacted and tested by a research assistant.

Procedure and Materials
Before participating in the experimental session, participants were informed that the experiment consisted of two separate sessions, one to be completed immediately, and the second one later that day or a couple of days afterwards. They were instructed as follows:
This experiment is designed to test the efficiency of the intuition, that is, the capacity to acquire information that does not require conscious control and intentional mental activity of the person. In this experiment your implicit intuition will be observed by measuring your pupil dilation. We therefore ask that you keep your eyes inside a defined area of the computer screen. There is no need to keep your eyes still. During the various phases of the experiment you just have to keep your attention on the sounds with which you will be presented. Initially, you will be presented two series of 10 pleasant and 10 alerting sounds that will cause a slight alerting reaction. Before you begin, you'll hear a couple of these sounds to adapt the threshold volume to your preference. In the last phase of the experiment, you must predict if you will hear a neutral or an alerting sound. Remember that the sequence between the two sound categories is random and therefore you cannot use any strategy to enable prediction. Let your eyes predict the upcoming sound.
The light in the laboratory was constantly dim to avoid undesired or unrelated changes to the participants' pupils. The time necessary to complete the calibration was 2 min on average and long enough to accommodate to the ambient light.

Sounds Characteristics
The sounds as well as the arousal and pleasantness scores were obtained from the International Affective Digitised Sounds (IADS) collection (Bradley & Lang, 1999, 2000). Ten sounds were selected from those with higher scores and another 10 from those with lower scores with regard to pleasantness from the lists of males and females. The means and standard deviations for pleasantness and arousal of the two sound categories are presented in Table 1. The Pleasantness and Arousal scores were similar (not statistically different) for males and females, whereas the differences between the categories were statistically significant with Cohen's ES (d) = 8.3 for Pleasantness and 2.1 for Arousal.

Sequence of Events for Each Session
In each session, the following sequence of phases was applied in the same order for each participant: eye-movement calibration, listening to the first series of 10 sounds, new eye-movement calibration, listening to the second series of sounds, new eye-movement calibration, and then prediction of sounds. As noted in the introduction, this procedure was adopted to take into account psychophysiological individual differences; in this case, differences in pupil dilation when hearing alerting and neutral sounds.
Separating the alerting and neutral sounds that participants would hear permitted us to obtain an average pupil measurement for each of the two sound categories for each participant.
Calibration. This procedure was always applied before the sounds were delivered to allow the eye-tracker to detect eye position. The participants were instructed to follow a dot moving smoothly across different regions of the personal computer (PC) monitor with their eyes. This eye-tracker model allows respondents to behave naturally as they would in front of any other computer screen without the necessity of fixing their head movements. If the calibration, which usually lasts for less than 1 min, was correct, then the participants were required to listen to a sequence of 10 alerting sounds or a sequence of 10 neutral sounds.
Measurement and storing of individual differences with regard to pupil dilation. During this phase, the participants were requested to listen passively to the sounds and to look inside the white circle presented in the middle of the monitor to allow the eye-tracker to record dilations of their pupils. Sounds were conveyed to participants by headphones (model Inno Hit SH-154), following a random sequence and interstimulus intervals ranging from 1 to 3 s. After this phase, which lasted for no more than 3 min, the same sequence of events (eye-movement calibration and listening to the second sequence of sounds) was repeated. The order of alerting and neutral sounds was balanced across the participants. The means of pupil dilation related to neutral and alerting sounds were automatically calculated and stored in the computer to be used in the following prediction phase.
Prediction of all sounds. This was the critical session of the experiment. During this session, the participants were requested to listen passively to the sounds and "let their eyes" predict the category of an upcoming sound. The sequence of 10 alerting and 10 neutral sounds was presented randomly using a pseudorandom algorithm written in C++ by one of the authors (see the syntax in appendix). This algorithm returns a random number from 1 to 20 after initialization with a random value obtained from the system clock. The randomness of the algorithm was checked offline using a simulation of 5,000 trials to verify whether some number sequences were repeated more than others. As expected, the frequency of number sequences followed a discrete uniform distribution. The random sequence was obtained before the first trial and maintained over the course of the whole experiment. In this sense, the order of sounds was predetermined.
However, the choice to present the sequence of sounds without replacement introduces a bias because there is a small probability that participants can predict the sound category above the level of chance by counting the number of sounds of each category already presented. When all sounds of one category have been presented, the remaining ones are clearly exemplars of the second category. This strategy, apart from the cognitive load it requires, can give prediction above the level of chance only when the sequence of sounds ends with at least 6 consecutive sounds of the same category, that is, 6 alerting or 6 neutral sounds. We checked all 80 randomized sequences and none showed this characteristic.
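The sequence generation and the terminal-run check described above can be sketched as follows. This is a minimal illustration in Python, not the authors' C++ routine from the appendix: the function names, the mapping of numbers 1 to 10 onto alerting sounds and 11 to 20 onto neutral sounds, and the use of a shuffle to draw without replacement are all assumptions made for the example.

```python
import random
import time

N_PER_CATEGORY = 10  # 10 alerting and 10 neutral sounds, as in the text


def make_sequence(seed=None):
    """Return the numbers 1..20 in random order (sampling without replacement).

    When no seed is given, the generator is seeded from the system clock,
    mirroring the initialization described in the text.
    """
    rng = random.Random(time.time_ns() if seed is None else seed)
    numbers = list(range(1, 2 * N_PER_CATEGORY + 1))
    rng.shuffle(numbers)
    return numbers


def category(number):
    """Hypothetical mapping: 1..10 are alerting sounds, 11..20 are neutral."""
    return "alerting" if number <= N_PER_CATEGORY else "neutral"


def terminal_run_length(categories):
    """Count how many identical categories close the sequence."""
    run = 0
    for cat in reversed(categories):
        if cat != categories[-1]:
            break
        run += 1
    return run


def counting_strategy_could_help(categories, threshold=6):
    """Flag sequences ending with at least `threshold` sounds of one category."""
    return terminal_run_length(categories) >= threshold


sequence = make_sequence()
cats = [category(n) for n in sequence]
print(cats, counting_strategy_could_help(cats))
```

Running the check over every participant's sequence would reproduce the kind of screening reported above, under the stated assumptions.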

Prediction Algorithm
The prediction algorithm is quite simple. In the Prediction phase (see sequence of events in Figure 1), just after the 2-s anticipation period but before the presentation of each sound, special software subtracted the mean dilation of each participant's pupils recorded in the anticipatory period from each of the two means of pupil dilations related to alerting and neutral sounds. The comparison with less difference was used to predict the category of the sound to be delivered. For example, if the average pupil dilation for alerting and neutral sounds for participant X measured in the individual difference phase was 3.5 mm and 4 mm, respectively, and the pupil dilation measured in the anticipatory period was 3.4 mm, the algorithm would predict an alerting sound.
This procedure was repeated for each of the 20 sounds to be predicted. Each trial lasted approximately 16 s.
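The prediction rule described above amounts to a nearest-mean comparison, which can be sketched in a few lines of Python. The function and parameter names are hypothetical, and the handling of ties (equal distance to both means) is not specified in the text; this sketch arbitrarily returns "neutral" in that case.

```python
def predict_category(anticipatory_mm, alerting_mean_mm, neutral_mean_mm):
    """Predict the upcoming sound from the anticipatory pupil diameter.

    The category whose stored mean diameter is closer to the anticipatory
    measurement is returned. Ties fall back to "neutral" (an assumption).
    """
    distance_to_alerting = abs(anticipatory_mm - alerting_mean_mm)
    distance_to_neutral = abs(anticipatory_mm - neutral_mean_mm)
    if distance_to_alerting < distance_to_neutral:
        return "alerting"
    return "neutral"


# Worked example from the text: stored means of 3.5 mm (alerting) and
# 4 mm (neutral), anticipatory measurement of 3.4 mm.
print(predict_category(3.4, alerting_mean_mm=3.5, neutral_mean_mm=4.0))
```

With the values from the worked example, the anticipatory measurement is closer to the alerting mean, so the sketch predicts an alerting sound.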
The selection of the sound and its delivery from the computer to the headphones would occur a few milliseconds after the completion of the prediction phase. This procedure did not produce any artifact noises (e.g., hard drive noise) useful to identify the category of sounds to be presented.

Data Analysis
The participants' data were included in the study only if all 20 data points were free from errors, missing data, or artifacts. Three participants, two males and one female, were excluded because of these problems and replaced with new participants.

Results
The descriptive statistics, sums of hits, means, and standard deviations of the accurate predictions (hits) obtained by males, females, and the whole sample are presented in Table 2.
Given that a comparison between the scores of males and females did not reveal a statistically significant difference, all of the following statistics were calculated on the whole sample.

Inferential Statistics
To test the robustness of the results, we used both a frequentist and a Bayesian statistical approach.

Exact Binomial Test
The sum of hits for neutral and alerting sounds was tested against the expected mean probability, which was 50%. The null hypothesis was that the total hits would not exceed the level of chance and that, consequently, pupil dilation could not predict neutral or alerting sounds. Only the hits of alerting sounds met the criteria to refute the null hypothesis (z = 5.76, p = 4.2 × 10⁻⁹).
It is evident that alerting sounds were predicted more accurately than could have been expected as a result of chance whereas neutral sounds were predicted at chance level. A graphical representation of the percentages of hits observed in males and females in the two task conditions is presented in Figure 2.
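For readers who wish to see how such a test is computed, the following Python sketch evaluates a one-sided exact binomial test against a chance level of 50%, together with the normal-approximation z score. With 80 participants and 10 alerting sounds each, there are 800 alerting trials. The hit count used in the example is hypothetical, since the observed counts are reported in Table 2 and are not reproduced here; the function names are also illustrative.

```python
from math import comb, sqrt


def binomial_upper_tail(hits, n):
    """Exact P(X >= hits) for X ~ Binomial(n, 0.5), using integer arithmetic."""
    favourable = sum(comb(n, k) for k in range(hits, n + 1))
    return favourable / 2 ** n


def z_score(hits, n, p=0.5):
    """Normal approximation: distance of the hit count from chance in SD units."""
    return (hits - n * p) / sqrt(n * p * (1 - p))


n_trials = 80 * 10        # 80 participants x 10 alerting sounds
hypothetical_hits = 480   # illustrative value only, not the observed count

print(z_score(hypothetical_hits, n_trials))
print(binomial_upper_tail(hypothetical_hits, n_trials))
```

Summing binomial coefficients as integers before dividing avoids floating-point underflow for large numbers of trials.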
To check the reliability of these results, we analyzed the data with a one-sample t test.

t Test
The means of hits of neutral and alerting sounds were analyzed with a one-sample t test against the null hypothesis of a mean of five corresponding to the level of chance, by applying a bootstrap procedure based on 1,000 bootstrap samples with the IBM SPSS software v.19. Following the statistical recommendations of the American Psychological Association (2010), we estimated parameters and ESs with corresponding 0.95 confidence intervals (CIs).
The results of the differences with an expected mean equal to five are presented in Table 3.
With this new statistic, we obtained a confirmation of the results with the exact binomial test: pupil dilation predicted alerting sounds above chance whereas neutral sounds were predicted at chance level.
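The bootstrap logic can be illustrated with a short Python sketch. It computes a percentile bootstrap confidence interval for the mean number of hits per participant, which can then be compared with the chance value of five. This is an assumption-laden illustration rather than a reproduction of the SPSS procedure: the percentile method, the function name, and the example scores are all hypothetical.

```python
import random
from statistics import mean


def bootstrap_mean_ci(scores, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample participants with replacement and record each resample mean.
    boot_means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_boot))
    lower_index = round(n_boot * (1 - level) / 2)
    upper_index = n_boot - lower_index - 1
    return boot_means[lower_index], boot_means[upper_index]


# Hypothetical hits (out of 10 alerting sounds) for a handful of participants.
example_hits = [6, 5, 7, 4, 6, 8, 5, 6, 7, 5]
low, high = bootstrap_mean_ci(example_hits)
print(low, high, "chance level of 5 excluded:", low > 5)
```

An interval whose lower bound lies above five would point in the same direction as a significant one-sample test against chance.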

Bayes Factor (BF)
To obtain further information about the strength of the evidence observed with the statistics based on a frequentist model, the exact binomial test and the bootstrapped one-sample t test, we chose to analyze the results using a Bayesian approach to compare directly the odds of the probability of the alternative versus the null hypothesis, given the data observed.
The BF is a model selection criterion that provides the amount of evidence in the data in favor of Model H1 against Model H0. If BF10 > 1, then Model H1 receives more evidence from the data than Model H0. For example, if BF10 = 3.0, there is 3 times more evidence in the data in favor of Model H1 in comparison with Model H0.
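As an illustration of how a BF10 can be obtained for binomial data, the Python sketch below compares H0 (hit probability fixed at 0.5) with an H1 that places a uniform prior on the hit probability. The uniform prior is an assumption made for this example and may differ from the prior used by the online software. Under that assumption the marginal likelihood of H1 is 1/(n + 1), so that BF10 = 2^n / ((n + 1) · C(n, k)) for k hits in n trials.

```python
from math import comb


def binomial_bf10(hits, n):
    """BF10 for k hits in n trials: uniform prior on p (H1) versus p = 0.5 (H0).

    Marginal likelihood under H1: 1 / (n + 1)
    Likelihood under H0:          C(n, k) / 2**n
    """
    return 2 ** n / ((n + 1) * comb(n, hits))


print(binomial_bf10(5, 10))   # data at chance: BF10 below 1 favours H0
print(binomial_bf10(10, 10))  # all hits: BF10 well above 1 favours H1
```

Values below 1 indicate more evidence for H0 and values above 1 more evidence for H1, in line with the interpretation given above.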
We chose to calculate both the scaled JZS (Jeffreys, Zellner, Siow) BF10 (Rouder, Speckman, Sun, Morey, & Iverson, 2009) of the t-test value, using the calculated ES, and the BF10 of the binomial test, using the online software implemented by Rouder (2011

Expectation Bias
This bias is based on the expectation that the likelihood of an arousing stimulus being presented grows as the number of consecutive calm stimuli (number of lags) increases (the Gambler's Fallacy). To check whether or not this bias was adopted more or less consciously, we calculated the correlation between the differences between hits and false alarms of alerting sounds and the different lags from neutral sounds. If this bias was adopted, we should observe a strong correlation between number of lags and hits difference. Given that there were only five lags with at least eight data points, we applied a bootstrap analysis to the Spearman's rank correlation coefficient. The resulting correlation was −.90; 95% CI was [−0.11, −1.00]. Even if this result may have been inflated by the small number of data points, it clearly shows that a negative bias was present in our participants, decreasing the correct prediction of alerting sounds. The analysis to verify whether there were any differences in the prediction accuracy between the first series of 10 sounds and the second one revealed no statistical differences: total hits for the first half of the experiment = 422; total hits for the second half of the experiment = 417.
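The lag-based tally underlying this analysis can be sketched as follows. This Python example reflects one possible reading of the procedure, stated here as an assumption: the lag of a trial is the number of consecutive neutral sounds immediately preceding it, and for each lag the number of false alarms (an "alerting" prediction followed by a neutral sound) is subtracted from the number of hits (an "alerting" prediction followed by an alerting sound). The function name and the example data are hypothetical.

```python
from collections import defaultdict


def hits_minus_false_alarms_by_lag(actual, predicted):
    """Tally hits minus false alarms for "alerting" predictions at each lag.

    The lag of a trial is the number of consecutive neutral sounds that
    immediately precede it (an assumed definition for this sketch).
    """
    difference = defaultdict(int)
    lag = 0
    for actual_cat, predicted_cat in zip(actual, predicted):
        if predicted_cat == "alerting":
            difference[lag] += 1 if actual_cat == "alerting" else -1
        # Update the run of consecutive neutral sounds for the next trial.
        lag = lag + 1 if actual_cat == "neutral" else 0
    return dict(difference)


# Hypothetical five-trial example.
actual = ["neutral", "neutral", "alerting", "neutral", "alerting"]
predicted = ["alerting", "neutral", "alerting", "alerting", "neutral"]
print(hits_minus_false_alarms_by_lag(actual, predicted))
```

The per-lag differences pooled across participants would then be the values entered into the rank correlation with lag.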

Discussion
The two aims of this study were, first, to replicate with pupil dilation measures the ARs previously observed with HR measures and, second, to assess their differential prediction accuracy with randomly presented stimuli of different adaptive values.
To test the statistical "robustness" of the results, three different statistics were used. Two were based on the frequentist approach and one was based on the Bayesian approach. The two statistics based on the frequentist approach were the exact binomial test and the one-sample t test. The former uses the binomial distribution to test whether the number of correct predictions (hits) may be considered to be above the mean level that could be expected by chance, which in our case was 50%. The latter compares the mean number of hits with the expected mean of five. To ensure a more valid generalization, the results were calculated using a bootstrap procedure.
The third statistic is a BF10 corresponding to the probability ratio of the alternative hypothesis (that the prediction accuracy of ARs will be above the level of chance) against the null hypothesis (that the prediction accuracy will be the same as the level of chance) given the data observed.
These two statistical approaches converge to support the conclusion that anticipatory pupil dilation responses predict future alerting events at an accuracy level of around 10% above what can be expected by chance. The BF10 values support the strength of this effect.
The evidence that our participants were "affected negatively" by the expectation bias supports the hypothesis that the level above chance of correct identification of alerting sounds would have been higher if this bias had not been present.

Practical and Theoretical Implications
Even if only independent replications can support the results of the present study, it seems that our psychophysiological system is wired to predict future events even when they are truly unpredictable. If this is true, this capacity has a great adaptive value in preparing the body to react rapidly to potentially damaging events and it would be interesting to study the presence of this ability in animals.
However, one may wonder whether this capacity of our nervous system can be used consciously, that is, whether it can be recognized by the person and used to prepare for and react to future events.
From a review of studies on ARs that have been used for behavioral measures (Mossbridge, Grabowecky, & Suzuki, 2009; Tressoldi et al., 2009), it emerges that conscious (overt) predictions have not exceeded what can be expected by chance. In other words, even if our psychophysiological system is able to predict random events more than what would be expected by chance, it appears for now that people cannot recognize these signals and use them in a conscious way.
With regard to previous investigations by Tressoldi et al. (2009, 2010), in which the prediction accuracy of anticipatory HR signals was only 6% above the level of chance, the prediction accuracy found through the new procedure used in this study was around 20% above the level of chance. This result could suggest that there is room to increase the accuracy of prediction by devising new prediction algorithms to analyze ARs. For example, HR and pupil dilation together could be used to increase their predictive power, or more sophisticated statistical models for category prediction could be used. If this is verified, it will be a demonstration that this anticipatory predictive ability is an important adaptive tool which is always at our disposal, even though it operates at an unconscious level. It is not too futuristic to hypothesize the possibility of creating pocket devices that would analyze our ARs and could send warnings which could be perceived consciously.
We hope that others will pursue this line of research, which combines neurobiology and human consciousness, to test its validity.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.