Meeting another's gaze shortens subjective time by capturing attention

Gaze directed at the observer (direct gaze) is an important and highly salient social signal with multiple effects on cognitive processes and behavior. It is disputed whether the effect of direct gaze is caused by attentional capture or increased arousal. Time estimation may provide an answer because attentional capture predicts an underestimation of time whereas arousal predicts an overestimation. In a temporal bisection task, observers were required to classify the duration of a stimulus as short or long. Stimulus duration was selected randomly between 986 and 1479 ms. When gaze was directed at the observer, participants underestimated stimulus duration, suggesting that effects of direct gaze are caused by attentional capture, not increased arousal. Critically, this effect was limited to dynamic stimuli where gaze appeared to move toward the participant. The underestimation was present with stimuli showing a full face, but also with stimuli showing only the eye region, inverted faces and high-contrast eye-like stimuli. However, it was absent with static pictures of full faces and dynamic nonfigurative stimuli. Because the effect of direct gaze depended on motion, which is common in naturalistic scenes, more consideration needs to be given to the ecological validity of stimuli in the study of social attention.


Introduction
One essential function of the cognitive system is to analyze intentions in social interactions. As social primates, we constantly monitor the behavior of others so that we can respond appropriately. This ability depends on information carried by the face and the eyes. For instance, eye gaze directed away from us (also referred to as averted gaze) can signal the presence of potentially rewarding or threatening stimuli in the environment, providing the basis for joint attention (Frischen, Bayliss, & Tipper, 2007; Sweeny & Whitney, 2014). Conversely, eye gaze oriented toward oneself (also referred to as direct gaze) may signal approach or appraisal and is known to automatically trigger specific sets of cognitive reactions, such as capturing visual attention and enhancing memory (for review, see George & Conty, 2008; Hamilton, 2016). In animals, the response to direct gaze is an increase of arousal, mainly because eyes are seen as a warning signal (Emery, 2000).
Thus, the attentional capture and arousal accounts agree that gaze contact is a highly important social stimulus that is preferentially processed by the human brain. However, they diverge fundamentally concerning the cause of the preference. Several models addressing effects of direct gaze (Conty et al., 2016;Senju & Johnson, 2009b) neglected the effects of direct gaze on arousal. However, recent evidence suggests that eye contact elicits a positive affective reaction (for review see Hietanen, 2018), associated with increased arousal. Direct gaze may indicate attention to others and social inclusion (Wirth, Sacco, Hugenberg, & Williams, 2010), which may cause direct gaze to take precedence over other gaze directions. However, a direct test between the attentional capture and arousal account is missing, as studies providing evidence for one account or the other never used the same task and stimuli. Therefore, we investigate the impact of eye gaze on time perception as the two accounts make opposite predictions for the same experimental protocol.
From our daily experience, we know that the perceived length of time varies depending on the situation. For example, an exciting film makes us feel that time passes quickly, while boring or unpleasant situations make us feel that time passes slowly. "Time flying" and "time dragging" have been demonstrated and thoroughly investigated in humans (Droit-Volet & Gil, 2009; Droit-Volet, Tourret, & Wearden, 2004; Droit-Volet & Wearden, 2002; Gil & Droit-Volet, 2012). According to internal clock models, the estimation of time relies on pacemaker-accumulator processes, where a pacemaker is emitting pulses at a given rate and subjective duration is determined by the accumulation of pulses (Gibbon, 1977; Gibbon, Church, & Meck, 1984; Treisman, 1963). The accumulation of the pulses is managed by a "switch", which monitors the number of pulses emitted by the pacemaker. The models postulate that increased levels of physiological arousal speed up the rate of temporal pulses, resulting in a lengthening of subjective duration (Droit-Volet & Meck, 2007). According to Zakay and Block (1996), the functioning of the switch requires attentional resources. When attention is distracted from time perception, the switch cannot function properly. As a result, time pulses are lost, and time is perceived as shorter because fewer time pulses are accumulated. Thus, time perception theory allows disentangling attention and arousal, as opposite effects on the estimation of duration are predicted. Arousal leads to the overestimation of time by increasing the pulse-generating speed of the pacemaker, whereas attentional capture leads to the underestimation of time by degrading the function of the switch (Zakay & Block, 1996). It has previously been noted that threatening faces result in the overestimation of time, suggesting that time perception may be a sensitive marker of attentional capture or arousal.
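The opposite predictions of the two accounts can be illustrated with a deliberately simplified, deterministic version of the pacemaker-accumulator idea. The sketch below is only an illustration: the function names, the baseline rate of 10 pulses per second, the 10% rate increase, and the 10% pulse loss are all hypothetical values chosen for the example and are not taken from the cited models or from the present data.

```python
def accumulated_pulses(duration_ms, rate_hz, switch_efficiency):
    """Expected pulse count: pacemaker rate x duration x fraction of pulses
    that the attention-dependent switch lets through (all values hypothetical)."""
    return rate_hz * (duration_ms / 1000.0) * switch_efficiency


def perceived_ms(duration_ms, rate_hz=10.0, switch_efficiency=1.0,
                 baseline_rate_hz=10.0):
    """Convert the pulse count back to milliseconds, assuming the observer
    interprets pulses as if the pacemaker ran at the baseline rate."""
    pulses = accumulated_pulses(duration_ms, rate_hz, switch_efficiency)
    return 1000.0 * pulses / baseline_rate_hz


physical = 1207  # ms, the middle test duration in the present experiments

baseline = perceived_ms(physical)
# Arousal account: the pacemaker runs faster (assumed +10%), more pulses.
aroused = perceived_ms(physical, rate_hz=11.0)
# Attentional account: the switch loses pulses (assumed 10% loss), fewer pulses.
distracted = perceived_ms(physical, switch_efficiency=0.9)

print(f"baseline:   {baseline:.0f} ms")
print(f"arousal:    {aroused:.0f} ms (overestimation)")
print(f"attention:  {distracted:.0f} ms (underestimation)")
```

Under these assumptions, a faster pacemaker lengthens and a leaky switch shortens the perceived duration of the same physical interval, which is the contrast the bisection task is meant to detect.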
If attention is captured by direct gaze, we predict that perceived time is shorter for direct relative to averted gaze. However, previous studies using static pictures failed to support this prediction (Jusyte, Schneidt, & Schonenberg, 2015; Thönes & Hecht, 2016). For instance, Thönes and Hecht (2016) found that time perception with neutral faces did not change as a function of gaze direction. In their Experiments 2 and 3, the authors used a temporal bisection task (Church & Deluty, 1977; Wearden, 1991) with durations around 1 s. In the training phase of the temporal bisection task, the shortest and the longest intervals were presented several times. Then, intermediate intervals were presented and had to be categorized as being closer to the short or long duration. Using this method, the authors found an effect of direct gaze on temporal sensitivity, but not on perceived duration. Importantly, only static stimuli were investigated.
However, gaze is rarely static in natural situations, and information for non-verbal communication is often conveyed via dynamic changes of gaze. Accordingly, direct and averted gaze are embedded within dynamic gaze behavior and evaluating directional information is complemented by evaluating when shifts occur, in what order and how long they last (Binetti, Harrison, Mareschal, & Johnston, 2017). Recent studies have demonstrated the critical role of dynamic/realistic gaze interactions for the direct gaze effect (e.g., Conty, N'Diaye, Tijus, & George, 2007; Latinus et al., 2015; Naples, Wu, Mayes, & McPartland, 2017; Stephani, Kirk Driller, Dimigen, & Sommer, 2020). In the same vein, some studies revealed that only genuine direct gaze resulted in attentional capture or enhanced facial encoding whereas static images did not (e.g., Hietanen et al., 2008; Ponkanen, Peltola, & Hietanen, 2011). The use of dynamic stimuli might seem challenging in time perception studies, especially when judging short durations, because the gaze shift may lack clear on- and offsets. However, rendering gaze shifts as apparent motion stimuli with only two frames allows for dynamic and more ecological stimuli, which nonetheless have clear on- and offsets. In this series of experiments, we used the same stimulus durations as in Thönes and Hecht (2016). Short durations below 3 s are realistic and appropriate in social interaction (Binetti, Harrison, Coutrot, Johnston, & Mareschal, 2016), and have also been shown to activate the arousal system (Helminen et al., 2011). Therefore, they are ideally suited to decide between the arousal and attentional capture hypotheses. Further, short and long durations may involve different processing mechanisms (Penney & Vaitilingam, 2008). Processing of intervals shorter than a few seconds is sensory-based and implies automatic processing, whereas the processing of longer intervals, from a few seconds to a few minutes, requires cognitive resources (Hellstrom & Rammsayer, 2004; Lewis & Miall, 2003; Rammsayer & Lima, 1991).
In sum, the attentional capture and the arousal accounts make opposite predictions for gaze shifts of short duration. The "attentional capture" hypothesis states that attentional capture degrades the functioning of the switch and results in fewer pulses in the accumulator. As a result, time is underestimated for direct compared to averted gaze. Conversely, the "arousal" hypothesis states that direct gaze increases arousal so that the accumulator receives more pulses from the pacemaker, resulting in longer perceived duration. While these predictions apply to dynamic and static stimuli, previous failures to find effects of direct gaze on subjective time (Jusyte et al., 2015; Thönes & Hecht, 2016) suggest that more realistic, dynamic stimuli are necessary to bring the effect to the fore.

Experiment 1: Full face
The experimental session was divided into a training and a test phase. Training started with the presentation of the shortest and the longest intervals (anchor stimuli). Then, participants were asked to discriminate between the anchor stimuli. During training, a non-face stimulus was used. In the following test phase, participants were required to classify intermediate durations as being more similar to the short or to the long anchor duration. The orientation of gaze was initially intermediate (see Fig. 1). Then, gaze was shifted so it was clearly perceived as direct or averted. After a variable duration, gaze was shifted back to the intermediate orientation. Participants were asked to classify the duration of the variable interval as short or long.

Participants
Twenty-two first-year psychology students participated in the experiment for class credit (four male, mean age = 21.6 ± 3.9 years). All had normal or corrected-to-normal vision and were naive as to the aim of the experiment. The study received clearance from the local ethics committee (Faculty of Psychology and Educational Sciences, Geneva University). All participants gave written informed consent in accordance with the Declaration of Helsinki.

Apparatus and stimuli
Participants were tested in a dimly lit room. All instructions and stimuli were presented on a 24-in. flat-panel screen (Philips 242G5DJEB), with a resolution of 1680 × 1050 pixels at a display rate of 60 Hz and a color depth of 32 bit. Stimulus presentation, timing, and data collection were controlled by E-Prime 2 (Psychology Software Tools, Pittsburgh, PA).
Stimuli consisted of 40 face stimuli (20 men and 20 women) which were selected from a database of digitized color portraits of young adult faces collected by George and colleagues (e.g., Burra, Baker, & George, 2017; Burra, Framorando, & Pegna, 2018; George, Driver, & Dolan, 2001; Latinus et al., 2015; Vuilleumier, George, Lister, Armony, & Driver, 2005). All faces had a neutral expression and were unknown to the participants. Portraits were taken under the same lighting and viewpoint conditions. The direction of the eyes varied between straight toward the observer (i.e., the camera) or averted by 30°. We used an additional photograph with 15° gaze deviation, created by morphing the 0° and 30° pictures (for the same procedure, see Conty et al., 2007; Latinus et al., 2015; Puce, Smith, & Allison, 2000). Gaze deviation by 15° is often perceived as intermediate gaze deviation, whereas gaze deviated by 30° is perceived as extremely away from the observer. Only deviated head stimuli were selected to avoid evaluations of gaze direction based on symmetry alone. We also conducted a supplementary experiment using frontal pictures, which replicated the results of Experiment 1 (see Supplementary Experiment 2). To avoid any unintended differences in the stimuli, the eye region from the averted gaze stimuli was cut and pasted into the very same position within the face photograph used for the direct gaze stimuli. The stimuli were presented in the center of the screen and covered a visual angle of 13.5° × 16° (horizontal × vertical).
The experiment was a time-bisection paradigm, adapted to measure the perceived duration of face stimuli (Jusyte et al., 2015). The experiment comprised two training blocks and seven experimental blocks. Trials in the training blocks started with the presentation of a fixation cross for 1000 ms, followed by the presentation of a non-social stimulus (a rectangle of 8° × 6°) for either 986 ms (short) or 1479 ms (long). In the first training block, participants were familiarized with the durations. To this end, we presented 16 long durations followed by 16 short durations. In the second training block, we presented 16 long and 16 short durations in random order. Participants were instructed to indicate whether the presentation duration was long or short by pressing "1" on the numerical keypad of a standard keyboard for short and "2" for long. Feedback was given in case of an incorrect response. The training block was repeated until accuracy reached 80%.
After the training block ended, participants started the main experiment where face pictures were presented instead of rectangles. Each trial in the experimental blocks started with a fixation cross in the screen center for 750 ms, followed by the picture of a face with gaze deviated by 15° for 500 ms. Then, the same face was shown but with gaze moved by 15° toward or away from the observer (reaching gaze directions of 0° and 30°, respectively). Presentation time of the second picture was randomly 986, 1054, 1122, 1207, 1292, 1377 or 1479 ms. Finally, the initial picture with gaze deviated by 15° was presented again for another 500 ms. After a black screen of 1000 ms, participants were instructed to classify stimulus duration as short or long. To control for lateral asymmetries, half of the pictures were left-right flipped. Participants were instructed to respond as accurately as possible without speed pressure.

Fig. 1. Illustration of experimental stimuli and procedure in Experiments 1-6. Participants were required to evaluate the duration of the gaze shift. Gaze could be oriented toward or away from the observer. Symbols in the upper part show experiments where we found an underestimation of subjective time for direct gaze. Experiment 5a was omitted for lack of space, but stimuli and results were similar to Experiment 5b.
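The sequence of events within an experimental trial can be written down as a simple timeline. The sketch below is an illustrative reconstruction in Python, not the E-Prime script used in the study; the function names, the event labels, and the assumption that a block crosses each gaze direction with each duration exactly once are hypothetical.

```python
import random

# Test durations of the variable interval, as listed in the text (ms).
TEST_DURATIONS_MS = [986, 1054, 1122, 1207, 1292, 1377, 1479]


def build_trial(direction, duration_ms):
    """Return the ordered (event, duration in ms) pairs of one test trial."""
    if direction not in ("toward", "away"):
        raise ValueError("direction must be 'toward' or 'away'")
    final_gaze_deg = 0 if direction == "toward" else 30
    return [
        ("fixation cross", 750),
        ("face, gaze 15 deg", 500),
        (f"face, gaze {final_gaze_deg} deg", duration_ms),  # judged interval
        ("face, gaze 15 deg", 500),
        ("black screen", 1000),
    ]


def build_block(rng):
    """Hypothetical block: every direction x duration pairing once, shuffled."""
    trials = [build_trial(direction, duration)
              for direction in ("toward", "away")
              for duration in TEST_DURATIONS_MS]
    rng.shuffle(trials)
    return trials


example = build_trial("toward", 1207)
block = build_block(random.Random(0))
print(example)
print(len(block), "trials in the illustrative block")
```

Only the third event varies in duration, so participants judge the interval between the two apparent-motion steps rather than the total time the face is visible.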
After the experiment, the Liebowitz Social Anxiety scale (LSAS; Liebowitz & Klein, 1991) in its French version (Yao et al., 1999) was used to assess social anxiety, which might modulate direct gaze perception (Moukheiber et al., 2010; Schneier, Pomplun, Sy, & Hirsch, 2011; Schneier, Rodebaugh, Blanco, Lewin, & Liebowitz, 2011). In the questionnaire, participants rated how anxious they would feel in specific situations and how often they would avoid them, always on a 4-point scale.

Analysis
For each subject, we plotted the proportion of "long" responses as a function of stimulus duration and gaze direction. Using the Statistics Toolbox in Matlab, we fitted a psychometric model using a logistic function in order to determine the point of subjective equality (PSE, the parameter a) and the slope value (the parameter b) for each subject and gaze condition. Fig. 2 shows example data from one subject. A shift of the psychometric function to the left with respect to 1207 ms (i.e., a shorter PSE) indicates overestimation of duration, because a shorter physical stimulus is perceived as equally long as the midpoint between the anchors. In contrast, a shift of the PSE to the right (i.e., a longer PSE) indicates an underestimation of duration, because a longer physical stimulus is perceived as equally long as the midpoint between the anchors. Additionally, a steeper slope (i.e., a smaller slope value) indicates better temporal discrimination than a shallower slope (i.e., a larger slope value). The PSEs and the slope values were analyzed by means of two paired-samples t-tests as well as Wilcoxon signed-rank tests when the data were not normally distributed, as indicated by the Shapiro-Wilk statistic. As the results from the nonparametric tests were not different from the parametric tests, we do not report them for brevity. Additionally, we conducted two-sided t-tests against the average duration of 1207 ms to test for under- or overestimation. For the influence of gaze direction on PSE and slope, we report Cohen's d_z (Lakens, 2013) as a measure of effect size in a repeated-measures design. Further, we included the Bayes factor for each of the comparisons because Bayesian testing is particularly beneficial for evaluating the null hypothesis. We excluded datasets where the goodness of fit of the psychometric function was worse than R² = 0.60. Statistical analyses were performed using jamovi 1.2.22 (The jamovi project, 2019). Bayesian statistics were computed using JASP (JASP Team, 2019).
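To make the roles of the two parameters concrete, the following Python sketch fits a logistic function to the proportion of "long" responses and reads off the PSE (a), the slope value (b), and R². It is an illustration under stated assumptions, not the Matlab routine used for the reported analyses: the parameterization P(long) = 1 / (1 + exp(-(x - a) / b)) is one common form consistent with "smaller slope value = steeper slope", the least-squares grid search and its ranges are arbitrary choices, and the response proportions are synthetic.

```python
import math

DURATIONS_MS = [986, 1054, 1122, 1207, 1292, 1377, 1479]


def logistic(x, a, b):
    """P('long') for duration x; a is the PSE, b the slope value (ms)."""
    return 1.0 / (1.0 + math.exp(-(x - a) / b))


def fit_psychometric(durations, p_long):
    """Least-squares grid search over a (1 ms steps) and b (5 ms steps).
    The grid ranges are assumptions made for this example."""
    best_a, best_b, best_sse = None, None, float("inf")
    for a in range(950, 1551):
        for b in range(10, 401, 5):
            sse = sum((p - logistic(x, a, b)) ** 2
                      for x, p in zip(durations, p_long))
            if sse < best_sse:
                best_a, best_b, best_sse = a, b, sse
    mean_p = sum(p_long) / len(p_long)
    sst = sum((p - mean_p) ** 2 for p in p_long)
    r_squared = 1.0 - best_sse / sst  # goodness of fit used for exclusion
    return best_a, best_b, r_squared


# Synthetic, noise-free observer with PSE = 1240 ms and slope value = 80 ms.
p_long = [logistic(x, 1240, 80) for x in DURATIONS_MS]
pse, slope, r2 = fit_psychometric(DURATIONS_MS, p_long)

print(f"PSE = {pse} ms, slope value = {slope} ms, R^2 = {r2:.2f}")
print("underestimation" if pse > 1207 else "overestimation or no bias")
print("dataset kept" if r2 >= 0.60 else "dataset excluded")
```

In this synthetic case the PSE lies to the right of 1207 ms, which corresponds to the underestimation pattern described above; real data would contain response noise and would not be recovered exactly.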

Discussion
We confirmed that directing gaze toward an observer impacted time perception. Time was underestimated by 33 ms when gaze was directed toward the observer. Reductions of perceived time are predicted by the attentional account. In addition, we conducted an internet-based experiment with 77 participants (see Supplementary Experiment 1) to confirm the estimated effect size in Experiment 1. According to a power analysis of Supplementary Experiment 1, 22 participants are required to detect the effect (effect size: d_z = 0.73, power = 0.9, alpha = 0.05, two-tailed), which corresponds to the sample size in Experiment 1. In sum, the underestimation of time showed that direct gaze resulted in attentional capture. In contrast, the results are at odds with the idea that direct gaze increases arousal because arousal predicts an overestimation of perceived duration. Further, the experiment suggests that dynamic stimuli are necessary to induce effects of direct gaze on time perception. A previous study failed to show an effect with static pictures (Thönes & Hecht, 2016), which we will further investigate in Experiment 3.
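The required sample size can be checked with a standard closed-form approximation for a two-tailed paired t-test. The sketch below uses the normal approximation plus a small-sample correction term; it is not necessarily the procedure or software used for the reported power analysis (exact calculations are based on the noncentral t distribution), and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def paired_t_sample_size(d_z, alpha=0.05, power=0.90):
    """Approximate n for a two-tailed paired t-test:
    n ~ ((z_(1-alpha/2) + z_power) / d_z)^2 + z_(1-alpha/2)^2 / 2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_power) / d_z) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


n_required = paired_t_sample_size(0.73)
print(n_required)  # 22
```

With d_z = 0.73, power = 0.9 and alpha = 0.05, the approximation gives 22 participants, in agreement with the value stated above.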

Experiment 2: Eyes-only
In Experiment 1, we demonstrated that time was underestimated when participants evaluated the duration of direct gaze. However, it is unknown whether this effect depends on the presentation of the entire face or whether the eyes are sufficient. Isolated eyes are known to be sufficient for the precedence of direct over averted gaze (Conty et al., 2006; Senju, Hasegawa, & Tojo, 2005; von Grunau & Anston, 1995). However, removal of the face context might eliminate cues from head orientation, which ordinarily contribute to the perception of gaze direction, especially when the head is deviated (Jenkins & Langton, 2003; Langton, Honeyman, & Tessler, 2004). Therefore, we only presented the eye area to check whether the face context contributed to the temporal distortion.

Note: The significance of mean PSEs for gaze shifts toward and away was evaluated by one-sample t-test against the average duration (1207 ms). Also, we evaluated the significance of the difference between toward and away by paired t-test. Results of the t-tests are indicated by * for p < .05 and ** for p < .01.

Participants
Twenty-four students fulfilling the same criteria as above participated in the experiment. Datasets from two participants were discarded due to poor fits of the psychometric function, leaving twenty-two datasets for final analysis (nine male, mean age = 21.6 ± 3.03 years).

Stimuli and procedure
Stimuli and procedure were as in Experiment 1. The only difference was that only the eye region (eyes + eyebrows) was shown. The stimuli subtended 8° × 2° (horizontal × vertical) instead of 13.5° × 16°.

Results
The PSE and the slope values in the toward condition were not normally distributed, ps > 0.003. However, all significant results were replicated with nonparametric Wilcoxon signed-rank tests.

Discussion
Experiment 2 replicated the results of Experiment 1. We confirmed that gaze directed toward an observer led to an underestimation of time by 43 ms, as predicted by the attentional capture account. Removing the face context did not eliminate the underestimation of time for direct gaze, confirming that the eyes are sufficient for the precedence of direct over averted gaze (Conty et al., 2006;von Grunau & Anston, 1995).

Experiment 3: Full-face static
Experiment 3 assessed the boundary conditions of the effect and alternative explanations. First, it is possible that changes in time perception were caused by the difference between the stimuli used in our experiments compared to previously used stimuli (Jusyte et al., 2015; Thönes & Hecht, 2016). In fact, previous studies used frontal faces, including neutral or emotional faces (Jusyte et al., 2015), while others manipulated head direction instead of eye direction (Thönes & Hecht, 2016). To address this issue, we used the same stimuli but removed the apparent motion. If the dynamic shift of gaze accounted for the discrepancy relative to previous research, the underestimation of time should be absent with static stimuli.

Participants
Twenty-four students fulfilling the same criteria as above participated. Two datasets were discarded due to poor fit, leaving twenty-two datasets for final analysis (four males, mean age = 21.61 ± 3.59 years).

Stimuli and procedure
Stimuli and procedure were as in Experiment 1 with the exception that we removed the initial and final picture showing a gaze deviated by 15°. Instead, we presented a black screen for the same duration.

Discussion
Consistent with previous reports (Jusyte et al., 2015; Thönes & Hecht, 2016), Experiment 3 found that static presentation of neutral faces with different gaze directions did not elicit changes in time perception. While Thönes and Hecht (2016) found a small effect on slopes with avatar stimuli, they failed to replicate this effect using realistic pictures, which is in line with the current data. It is likely that stimuli mimicking dynamic social interaction are required to reveal an effect of direct gaze on time perception.

Experiment 4: Upside-down
While Experiment 3 showed that only dynamic gaze stimuli result in changes of time perception, Experiments 4 to 6 investigated whether motion stimuli alone would cause the changes. To evaluate the role of gaze cues embedded in faces, we changed the stimuli from face-like to abstract stimuli in some experiments. First, however, we used upside-down faces to disrupt the normal face configuration (Maurer, Grand, & Mondloch, 2002) and to eliminate global in favor of local face processing (Diamond & Carey, 1986; Scapinello & Yarmey, 1970; Yin, 1969). Because face inversion impairs the perception of gaze direction (Jenkins & Langton, 2003; Schwaninger, Lobmaier, & Fischer, 2005), the relative salience of direct gaze and attentional deployment toward the eyes is reduced (Langton & Bruce, 1999), which can eliminate the stare-in-the-crowd effect. The question was whether the underestimation of time induced by direct gaze would persist despite the reduced saliency of gaze direction.

Participants
Twenty-five students fulfilling the same criteria as above participated. Datasets from three participants were discarded due to poor fit, leaving twenty-two datasets for final analysis (five males, mean age = 20.8 ± 2.5 years).

Stimuli and procedure
Stimuli and procedure were as in Experiment 1 except that the stimuli were displayed upside-down.

Results
PSEs were normally distributed, but the slope values were not, p < .001.

Discussion
The effect of gaze direction on time perception remained present with inverted faces. Despite the reduced saliency, the difference between averted and direct gaze persisted. However, both PSEs were different from the average duration of 1207 ms, which was not observed in the previous experiments. That is, there was an underestimation both for direct and averted gaze. It is likely that more attention was required to process gaze changes in inverted faces due to the disruption of the surrounding face context (Jenkins & Langton, 2003). However, it is unclear how this effect led to the general underestimation. Importantly, the results suggest a strong role of local processing in the emergence of the direct gaze effect, which is consistent with previous research (Senju, Kikuchi, Hasegawa, Tojo, & Osanai, 2008).

Experiments 5a and 5b: "Eye-like"
The preceding experiments suggest that the full face contributes little to the underestimation of time with direct gaze. In fact, Experiment 2 showed that only the eye region is sufficient to produce the bias. Experiments 5a and 5b reduced the naturalistic eye stimuli to "eye-like" geometric figures, which preserved the strong contrast between the inner and outer part of the eye (Kobayashi & Kohshima, 2001). In Experiment 5a we presented small black squares inside larger white rectangles, which mimicked the black pupil inside the white sclera. In Experiment 5b, the contrast was reversed, which decreased the similarity to natural eyes but preserved the strong contrast between inner and outer part. We predict that if our effect depended on the face context or naturalistic stimuli, it should be absent or strongly reduced with the geometric stimuli. Conversely, if the effect remained present, only a faint reference to natural eyes (i.e., the strong contrast between inner and outer part) is sufficient to induce the underestimation of time.

Participants
Forty-two students fulfilling the same criteria as above participated. Sixteen participated in Experiment 5a and twenty-six in Experiment 5b. One dataset from Experiment 5a and three datasets from Experiment 5b were discarded, leaving fourteen datasets in Experiment 5a (4 males, mean age = 21.1 ± 1.26 years) and twenty-two in Experiment 5b (5 males, mean age = 22 ± 1.3 years).

Stimuli and procedure
A large gray rectangle (RGB: 109, 109, 109; size of 13.5° × 16°, horizontal × vertical) was shown in the center of the black screen. The eye-like stimuli were horizontally and vertically centered inside the gray rectangle. In Experiment 5a, the eye-like stimuli consisted of two black squares inside two larger white rectangles, which corresponds to the natural contrast. In Experiment 5b, the squares were white, and the surrounding rectangles were black, which is the opposite of the natural contrast (see Fig. 1). Because the background was always gray, the only difference between Experiment 5a and 5b was the switch of contrast inside the eye-like shape. The sizes of the squares and rectangles were 2.5° × 2.5° and 3.5° × 2.5°, respectively. The rectangles were 1.7° apart (edge-to-edge). Initially, the squares were offset by 0.4° to the left of the center of the rectangles. Then, they moved 0.2° to the left to create an eye-like gaze shift away from the participant or they moved 0.2° to the right to create an eye-like gaze shift toward the participant. The apparent motion of the squares was comparable to the apparent motion of the pupils in the preceding experiments (~0.2°). In half of the trials, the pictures were left-right flipped. After the same variable time interval as in the preceding experiments, the smaller squares moved back to their original position. We carefully avoided talking about faces or eyes before or during the experiment.

Discussion
Experiment 5a and 5b reduced natural eye stimuli to eye-like geometric shapes, but kept the strong contrast between the inner and outer part of the eyes (Kobayashi & Kohshima, 2001). The effect of direct gaze on time perception persisted, showing that naturalistic stimuli are not necessary to induce the effect. Even geometric shapes with inverted contrast produced the underestimation of time, showing that stimuli with only a slight resemblance to natural eyes are sufficient. These results are consistent with the multiple renditions of the eyes in the digital age. In many of these depictions, only two points are used to indicate the eyes. Alternatively, the presence of a gray background might have produced a contour similar to a face, which may have primed the perception of eyes (Bentin, Sagiv, Mecklinger, Friederici, & von Cramon, 2002). Nevertheless, stimuli with reversed contrast in Experiment 5b produced the same effect, which is surprising because reversed-contrast faces are known to degrade many aspects of gaze-related processing in static presentations. Nonetheless, our renditions of the eyes always included the two major parts of the eyes, the pupil and the sclera, which may be sufficient given the highly developed ability of humans to perceive the "eyeness" of a stimulus, even in rather ambiguous situations. Consistently, a small dot surrounded by a circle triggers neurons sensitive to the eyes, while the constituent circle and dot fail to do so (Perrett, Rolls, & Caan, 1982).
Further, inverted contrast (i.e., white inner part surrounded by black outer part) is admittedly rare, but it does exist in at least one species, the ruffed lemur (Kobayashi & Kohshima, 1997). Despite the negative contrast, ruffed lemurs can follow the gaze direction of their conspecifics (Ruiz, Gomez, Roeder, & Byrne, 2009), suggesting that the eye-like shape is more important than contrast polarity. Further, Ruiz et al. suggest that inverted contrast might facilitate the accurate discrimination of gaze direction at night. Overall, the underestimation of time may originate from the movement of a highly contrasted object with key features of the human eye (pupil and sclera), regardless of contrast polarity. Nevertheless, an alternative explanation could be that the misestimation of time is an artifact arising from motion and is not related to the perception of gaze. In order to rule out this explanation, we conducted further experiments.

Experiment 6a: "Pupil-like" disks
To find the boundary conditions of the effect of direct gaze on time perception, we reduced the schematic gaze stimuli even further. In Experiment 6a, we used black disks similar to pupils, but without any contour to indicate the sclera. While there was motion toward and away from the center, the resemblance to a face was reduced by using a horizontal line as a context for the apparent motion.

Participants
Twenty-five students fulfilling the same criteria as above participated. Two datasets were discarded because of poor fit, leaving twenty-two participants in the final analysis (4 male, mean age = 22.1 ± 3.47 years).

Stimuli and procedure
The same gray background rectangle was presented as in Experiment 5 and the line was at the same vertical position as the eye-like rectangles in Experiment 5. The line was white and of the same length as the eye region in Experiment 2 (8°, line width 0.3°). Two black disks (0.7° diameter, 4.2° apart) were presented on the line (at 1.1° and 5.3° from the left edge of the line) to produce a "pupil-like" stimulus. Then, the disks were moved by 0.2° toward the left or right. In half of the trials, the disks were left-right flipped. We carefully avoided talking about faces or eyes before or during the experiment.

Results
The PSE and the slope values in the toward condition were not normally distributed, p < .008, but nonparametric tests yielded similar results. The PSEs did not differ between the disks moving toward or away from the center (1263 vs. 1267 ms), t(21) = 0.48, p = .63, d_z = 0.06, BF01 = 4.34. Both PSEs were larger than the average duration of 1207 ms, ts(21) > 3.09, ps < 0.001, d_z > 0.92, BF01 > 8.18, suggesting that time was underestimated. The difference in slope values between disks toward and away (147 vs. 151 ms) was not significant, t(21) = 0.43, p = .67, d_z = 0.09, BF01 = 4.12.

Discussion
The motion of two disks without a context suggesting the sclera was insufficient to bias time perception. However, PSEs in both conditions were longer than the average duration, indicating that time was generally underestimated. The reasons for this bias are unclear.

Experiment 6b
Finally, as a complement to Experiment 6a, we used a grid stimulus (Conty, Gimmig, Belletier, George, & Huguet, 2010; von Grunau & Anston, 1995) to remove localized objects while maintaining global motion toward and away from the center.

Participants
Twenty-four students fulfilling the same criteria as above participated. The datasets from two participants were discarded due to poor fit, leaving twenty-two participants for final analysis (four male, mean age = 20.5 ± 2.24 years).

Stimuli and procedure
The same gray background rectangle as in Experiment 5 was used and the grid stimulus was shown at the same vertical position inside the rectangle. The grid measured 11° × 3.2° (horizontal × vertical). The grating was composed of 4 vertical black bars (~1.2° width), which alternated with 5 gray bars (~1.2° width, RGB: 122,122,122). Initially, the center of the grating (a black bar) was 0.4° to the left of the center of the screen. The grating was then shifted by 0.2° toward or away from the screen center. After a variable duration, the grating was shifted back to the initial position. In half of the trials, the grating was left-right flipped.

Discussion
Experiment 6b isolated apparent motion toward and away from the center without any localized objects by means of a square-wave grating. Similar to Experiment 6a, there was no effect of dynamic change on time perception. Thus, apparent motion toward the center is insufficient to produce the direct gaze effect (Conty et al., 2010;von Grunau & Anston, 1995). Taken together, apparent motion of abstract eye-like stimuli may induce changes in time perception (see Experiment 5), but when the similarity to the eyes is strongly reduced (Experiment 6a) or absent (Experiment 6b), apparent motion toward or away from the center is insufficient to generate the bias in time perception.

Comparison between experiments
We assessed differences in age and social anxiety across experiments by one-way ANOVA. Participants did not differ in age, F(7, 66.5) = 0.73, p = .64, or social anxiety, F(7, 66.9) = 0.72, p = .65. Further, we sought to provide statistical evidence for the disappearance of the temporal bias with non-social abstract stimuli (Experiments 6a and 6b) compared to social realistic eye shift (Experiments 1 and 2) by running a mixed-factor 2 (motion: toward, away) × 4 (experiment: 1, 2, 6a, 6b) ANOVA. There was an interaction of gaze direction and Experiment, F(3, 84) = 3.30, p = .002, ηp² = 0.166, showing that the difference between gaze directed toward and away from the observer was abolished in Experiments 6a and 6b. The same analysis was conducted on slope values, but no effects were significant, ps > 0.53. Finally, the goodness of fit of the psychometric functions, as reflected in the coefficient of determination (R²), did not differ between experiments and was not modulated by experimental condition, ps > 0.6.

General discussion
Direct gaze is considered a highly important social stimulus that is preferentially processed by the human brain. However, whether effects of direct gaze on cognitive processing are due to attention or arousal remains under discussion. To disentangle contributions of attention and arousal, we investigated how time perception, a critical ability for many cognitive behaviors (Walsh, 2003), was impacted by direct gaze. Using a temporal bisection task with short durations (< 3 s), we established that shifts of gaze toward the observer resulted in the underestimation of time relative to shifts of gaze away from the observer. In all experiments with a difference between direct and averted gaze, subjective time with direct gaze was underestimated relative to the average time interval.
The underestimation suggests that direct gaze captures attention. According to models of time perception, attention is critical in monitoring the number of time pulses (Gibbon, 1977;Gibbon et al., 1984;Treisman, 1963) and attentional capture by direct gaze would induce lapses in the count of the time pulses, which decreases the count of pulses and leads to an underestimation of time. In contrast, the underestimation of time is not consistent with the alternative account that direct gaze increases arousal because increased arousal is thought to increase the rate of pulse generation. With more pulses for the same interval, an overestimation of time is predicted, but we found the opposite.
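The opposite predictions of the two accounts can be illustrated with a minimal sketch of a pacemaker-accumulator clock. The pulse rates, the attended fraction, and the function names are hypothetical values chosen for illustration; they are not parameters estimated in this study or taken from the cited models.

```python
def expected_pulse_count(duration_ms, rate_hz=50.0, attended_fraction=1.0):
    """Expected number of pulses that reach the accumulator.

    rate_hz: pacemaker rate (assumed to rise with arousal).
    attended_fraction: share of the interval during which pulses are
    counted (assumed to drop when attention is captured elsewhere).
    """
    return rate_hz * (duration_ms / 1000.0) * attended_fraction

def judged_duration_ms(duration_ms, rate_hz=50.0, attended_fraction=1.0,
                       baseline_rate_hz=50.0):
    """Read the accumulated count back as time at the baseline rate."""
    pulses = expected_pulse_count(duration_ms, rate_hz, attended_fraction)
    return 1000.0 * pulses / baseline_rate_hz

# Same physical interval under three hypothetical states.
baseline = judged_duration_ms(1200)
attention_capture = judged_duration_ms(1200, attended_fraction=0.9)  # lapses
arousal = judged_duration_ms(1200, rate_hz=55.0)                     # faster pacemaker
```

Under these assumptions, lapses in counting shorten the judged duration relative to baseline, whereas a faster pacemaker lengthens it, which is the contrast that the bisection task exploits.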
What might be the reason for the temporal distortion with dynamic eyes? The underestimation of time with direct gaze may confer an adaptive advantage because it may prolong social interactions, which is beneficial for social species like humans. In fact, shifts of the eyes might be related to the tendency of approach versus avoidance in social contexts. Using a similar temporal bisection task, an underestimation of time has also been observed with stimuli involving a high motivation to approach (e.g., a treat) compared to stimuli involving a low motivation to approach (e.g., a flower) (Gable & Poole, 2012). Thus, direct gaze may be considered an approach signal, which is consistent with the suggestion that it leads to prolonged social interaction.
Further, we demonstrated that even abstract but eye-like stimuli are sufficient to elicit an underestimation of time, which implies that a minimal social context is sufficient. Possibly, the unique shape of the eyes and the high contrast between their constituent parts (Kobayashi & Kohshima, 2001) facilitate the association of relatively abstract stimuli with eye gaze. This may explain why minimally eye-like stimuli had the same effect as pictures of real eyes, a pattern that mirrors findings from single-cell recordings (Perrett et al., 1982). A way to test for the role of associations in long-term memory between abstract stimuli and the eyes would be to prime observers to perceive equally abstract stimuli either as eyes or as some unrelated object.
While favoring the attentional account, our study does not rule out that eye contact impacts the arousal system in other contexts (for a review, see Hietanen, 2018). In related research, however, the impact of attention was larger than the impact of arousal. For instance, it has been shown that eye contact of 1500 ms elicits bodily self-awareness in human adults, but the contribution of arousal was small because skin conductance did not differ between direct and averted gaze despite behavioral facilitation with direct gaze (Baltazar et al., 2014). We argue that in the context of short gaze interactions, which are more realistic in everyday life, gaze contact might trigger attentional capture more easily than increases in arousal, but more research is needed to understand the impact on the arousal system. For instance, studies measuring skin conductance are missing. Overall, however, the current study is more in line with models describing the effect of direct gaze and eye contact on cognitive and not affective processing (Conty et al., 2016;Senju & Johnson, 2009b), with a strong contribution of task demands (Burra, Mares, & Senju, 2019;Riechelmann, Gamer, Bockler, & Huestegge, 2021).
In a broader perspective, it may be interesting to understand how time perception is affected by gaze in clinical populations showing abnormal social interaction. For instance, it may be that the underestimation of time with direct gaze is absent in individuals who exhibit dysfunctional social attention, such as in autism (for a review, see Senju & Johnson, 2009a), schizophrenia (e.g., Gordon et al., 1992;Loughland, Williams, & Gordon, 2002) or social anxiety (e.g., Moukheiber et al., 2010;Schneier, Pomplun, et al., 2011;Schneier, Rodebaugh, et al., 2011). In these clinical populations, it may be interesting to compare dynamic stimuli that vary in their resemblance to real eyes to estimate the impact of social factors.
Further, recent evidence revealed a similar distortion of time induced by gaze, but on a different time scale. Jarick, Laidlaw, Nasiopoulos, and Kingstone (2016) explored the prospective estimation of 1-min intervals with a real person looking toward or away from them or having closed eyes. Overall, the perceived duration of the social interaction was underestimated, and differences emerged between direct gaze relative to averted gaze and closed eyes. Consistent with internal clock models of time perception, the authors attributed the modulation of time perception by direct gaze to attentional capture by real social interaction.
However, the use of relatively extended social interaction might be an important limitation of the study by Jarick et al. (2016). In fact, emotion-related arousal effects have been shown to operate on short durations in the range of milliseconds to seconds while attentional effects may affect broader time ranges (Lake, LaBar, & Meck, 2016;Mella et al., 2019;Mella, Conty, & Pouthas, 2011;Noulhiane, Mella, Samson, Ragot, & Pouthas, 2007). Therefore, the current study provides a more adequate test between the attentional capture and arousal accounts. Moreover, relative to the duration of direct gaze in more natural situations (Argyle, 1981;Mobbs, 1968), the presentation time in Jarick et al. (2016) was unusually long. In general, the preferred duration of gaze contact is between 3 and 5 s (Argyle & Cook, 1976;Argyle & Ingham, 1972;Binetti et al., 2016). As a result, participants might not have maintained eye contact for the presentation time of 1 min, but we cannot know for sure because eye movements were not monitored. Additionally, participants probably adopted a segmentation strategy where the long time intervals were divided into smaller intervals. That is, participants in Jarick et al. (2016) may have segmented temporal information or may have used strategies such as foot tapping, imaging, repetitive movements, or counting seconds (Grondin & Killeen, 2009;Grondin, Ouellet, & Roussel, 2004;Hinton & Rao, 2004). In contrast, we used time intervals close to what is common in natural social interaction, where sensory-based or automatic processing of direct gaze is isolated.
While the pacemaker-accumulator model allows for a suitable interpretation of the data, we have to admit that the model is purely theoretical. In fact, the idea of a clock-like counter, despite having reasonable face validity, has found little physiological support. Nonetheless, it is easy to find links between time perception and attention in the neurophysiological literature. Time perception is thought to be controlled by the dopaminergic system (Soares, Atallah, & Paton, 2016), including the ventral striatum, the prefrontal cortex as well as the amygdala (Ross & Peselow, 2009). Interestingly, the dopaminergic system is also critical in attentional processing (Nieoullon, 2002), especially when stimuli of high value are involved. For instance, attentional capture by a stimulus with high priority increases activity of the dopaminergic system (Anderson et al., 2017; for a review, see Todd & Manaligod, 2018). Similarly, direct gaze, which is a stimulus of high social value, is processed in key regions of the dopaminergic system like the amygdala (Burra et al., 2013;Calder et al., 2002;George et al., 2001;Kätsyri et al., 2020;Kawashima et al., 1999) or the ventral striatum (Kampe, Frith, Dolan, & Frith, 2001). Further, physiological studies suggest that direct gaze increases dopaminergic activity due to its high social value, resulting in approach motivation (Bromberg-Martin, Matsumoto, & Hikosaka, 2010) or in increased attention (Nieoullon, 2002). Together, this activity may produce an underestimation of time. Thus, the overlap of brain circuits might explain how direct gaze and attention interact. However, further neurophysiological studies are required to investigate the proposed link.
Overall, we do not question that both attention and arousal affect time perception. However, our study demonstrates that when our eyes meet the eyes of another person, or even eye-like stimuli, the attentional system is more impacted than the arousal system. We perceive episodes of direct gaze contact as shorter than episodes of averted gaze. Possibly, the underestimation of time prolongs social interactions, which may be beneficial for a social species like humans. Critically, we show that only dynamic shifts of gaze have an effect, whereas time perception was not distorted with static pictures. While the difference between dynamic and static gaze suggests that it is important to investigate more naturalistic stimuli, our results also suggest that the nature of the stimuli may be rather abstract as long as some minimal resemblance to natural eyes is maintained.