Introduction

The human pupil reacts to changes in luminance but also reflects cognitive processing. Pupillometry allows us to detect changes in cognitive states or processes such as arousal, interest, cognitive load, attention, and surprise (Laeng, Sirois, & Gredebäck, 2012). A very tight relationship between pupil dilation and the activity of the locus coeruleus (LC) has been observed (Koss, 1986; Samuels & Szabadi, 2008). The LC is a subcortical structure that plays an important role in stressful or high-attention situations, and it is the only source of norepinephrine in the brain, a neurotransmitter that stimulates the iris dilator muscle (Sterpenich et al., 2006).

Pupil dilation is an involuntary response and is relatively easy to measure and interpret. Therefore, it is an excellent methodological option for studying preverbal participants such as infants. It has been used to assess classical violations of expectation tasks (Jackson & Sirois, 2009; Sirois & Jackson, 2012), individual differences in face processing (Gredebäck, Eriksson, Schmitow, Laeng, & Sternberg, 2012), and perception of irrational events by infants (Gredebäck & Melinder, 2010, 2011). Task-elicited changes in pupil diameter have also been studied in adults (see Beatty, 1982, for a review) with, for example, eye saccade tasks (Evens & Ludwig, 2010), Stroop tasks (Brown et al., 1999; Laeng, Orbo, Holmlund, & Miozzo, 2011; Siegle, Steinhauer, & Thase, 2004), and dual tasks (Karatekin, 2004; Karatekin, Couperus, & Marcus, 2004).

Among the systems used in psychological research, Tobii (Tobii Technology AB, 2010) and EyeLink eyetrackers (SR Research Ltd., 2005–2008) are relatively common. These eyetrackers support binocular tracking and use infrared illumination. One or two cameras capture images of the eyes. Data acquisition rate generally varies from 60 to 1000 Hz. Image-processing algorithms are used to determine gaze position and pupil size. Two different setups can be used for illumination: bright or dark pupil. Bright pupil setups use illuminators placed close to the camera, whereas dark pupil is obtained when the illuminator is placed further away. Current Tobii systems automatically proceed with both methods during calibration and choose the one that provides optimal accuracy for recording. The system can adaptively change the method during recording to improve trackability. Since environmental light, age, and ethnicity affect the pupil’s discriminability, a system using both bright and dark methods flexibly is advantageous. Conversely, a system using only dark pupils, such as EyeLink, could provide better tracking stability.

Another difference across eyetrackers is the allowable head movement. Some use either a remote tracking box (i.e., eyes can be successfully tracked within a specific 3-D virtual “box” some distance from the eyetracker) or glasses that allow head movement. Others need a chin and/or forehead resting device to somewhat immobilize the head at a fixed, restricted tracking position. Such systems do not cope well with head movement. The former give more freedom of movement, whereas the latter are more accurate. Systems also differ in screen size, sampling rate, and portability. Embedded eyetrackers can be combined with a monitor arm in order to be easily adapted for infancy research (i.e., by moving the eyetracker rather than the baby). On the other hand, external box tracking allows the presentation of stimuli on a wider screen or in real 3-D space, which could be interesting for experiments including virtual reality, for example. Depending on the population, testing duration, or the need for a natural environment, one system may be preferable (Morimoto & Mimica, 2005).

Advances in eye-tracking technology increase opportunities for future studies (Aslin, 2007), allowing us to measure anticipation (Bakker, Kochukhova, & von Hofsten, 2011; Falck-Ytter, Gredebäck, & von Hofsten, 2006; Gredebäck & Melinder, 2010; von Hofsten, Uhlig, Adell, & Kochukhova, 2009), the microstructure of gaze (Yu, Yurovsky, & Xu, 2012) such as saccades (Lavergne, Vergilino-Perez, Collins, & Doré-Mazars, 2010; Lavergne, Vergilino-Perez, Collins, Orriols, & Doré-Mazars, 2008), and complex visual behaviors. However, these advantages come with technical challenges, like the normalization of calibration, missing values, coordination/integration with other data acquisition environments, and data interpretation (Oakes, 2012).

For example, Morgante, Zolfaghari, and Johnson (2012) measured the temporal and spatial accuracy of the Tobii T60XL eyetracker. They report temporal errors of up to 54 ms and spatial errors of up to 1.27°. Wyatt (2010), using ISCAN EC-101, assessed the position of the center of the pupil as varying up to some tenths of a millimeter during dilation and constriction of the pupil. This change also varies in size and direction between participants and even between a participant’s eyes. The error in measuring the eye’s position can vary up to 1.22° on the horizontal axis and 0.85° on the vertical axis. A study of sentence reading (Gagl, Hawelka, & Hutzler, 2011) found a systematic influence of horizontal gaze position on pupil dilation estimation by a video-based eyetracker (EyeLink 1000). Such an uncorrected error could significantly alter findings and their interpretation.

The present study systematically examined the relationship between gaze point and pupil diameter with a low-level cognitive task of constant luminance. Participants were asked to visually track an object as it moved on a monitor in a regular, circular, predictable pattern. In such conditions, task demands were very low, and the task could be achieved at constant luminance. There should be no systematic changes in pupil diameter as a function of gaze if eyetrackers provide a reliable, position-independent estimate of pupil size.

Method

Participants

A sample of 44 adults (20 male, 24 female; between 20 and 55 years of age) participated in this study. All had normal, uncorrected vision. Twenty-three participants were recruited at Université du Québec à Trois-Rivières (Canada), and 21 participants were recruited at Paris Descartes University (France). One participant was excluded from analyses due to misunderstanding task instructions.

Apparatus

For Canadian participants, gaze and pupil data were collected using two Tobii eyetrackers: an X120 model (Tobii Technology, Stockholm, Sweden) positioned beneath a 60 × 34 cm presentation monitor (1,920 × 1,080 pixel resolution; 60-Hz refresh rate) and a T120 model (Tobii Technology, Stockholm, Sweden) equipped with an integrated 34 × 28 cm screen (1,280 × 1,024 pixel resolution; 60-Hz refresh rate). Twelve participants viewed the experiment on the X120 first; 11 were tested on the T120 first. Eyetrackers were located in soundproofed cubicles; experimental equipment was operated from outside the cubicles.

For French participants, data were collected using an EyeLink 1000 (SR-Research, Kanata, Canada). This device can sample at 2000 Hz and allows a 1000-Hz binocular sampling rate. A chin and forehead rest was used to stabilize the participants’ heads. Participants were seated 60 cm away from a 21-in. (53.34-cm) CRT screen (200-Hz refresh rate). Pupil diameter was assessed in a centroid pupil-tracking mode with a binocular setup (25-mm lens, 1000 Hz).

Trackers were used as instructed by the manufacturers. Tobii systems offer a head movement box of 30 × 22 cm at 60 cm of distance, while the EyeLink 1000 Desktop Mount requires placing the chin and forehead on a structure limiting head movement to 25 × 25 mm (compensated by the fact that movement is minimized with the use of the chin and forehead rest).

Events

The stimulus was a 30-s movie of a blue circle over a gray background moving counterclockwise at 12 rotations per minute in an elliptical trajectory filling the whole screen. The movie thus showed six complete rotations of the blue circle. Different sizes of the movie were created to accommodate the two different screen resolutions used. The elliptical path of the ball for the Tobii X120 setup covered 22° of horizontal visual angle and 14° of vertical visual angle, whereas the path for the Tobii T120 and EyeLink setups was 13° of horizontal visual angle and 10° of vertical visual angle. Figure 1 shows the first frame of the video for each of the two screen resolutions. Luminance was 60.8 cd/m² for Tobii X120 and 38.2 cd/m² for Tobii T120. Luminance for the EyeLink system was 26.5 cd/m². Luminance remained constant throughout the movie.

Fig. 1
Screen captures for the wide (Tobii X120) and narrow (Tobii T120 and EyeLink) screens used in this study, shown using the same scale to show relative differences

Procedure

Participants sat in front of the screen. The distance to the eyetracker (between 60 and 70 cm) and the eyes’ elevation were verified. After a 5-point calibration procedure, on-screen instructions (5,000-ms duration) told participants to look at the screen and follow the trajectory of a ball, trying not to anticipate it. They then watched a 5,000-ms gray screen and, finally, the 30,000-ms movie. The gray screen before the movie helped avoid luminance-based pupil dilation or constriction at movie onset.

Eyetrackers recorded the position of the eyes on the x- and y-axis and pupil diameter. This task was made as simple as possible and, thus, minimized cognitive load (which might otherwise reflect on pupil diameter). It also allowed repeatedly assessing pupil diameter over a wide range of x and y positions in a minimum time.

Results

The sampling rate for Tobii eyetrackers was set at 60 Hz; for the EyeLink system, it was 1000 Hz. Missing values for pupil diameter (e.g., eye blinks) were recorded as −1 on Tobii systems and as 0 on the EyeLink eyetracker. Missing values were problematic and were interpolated before running the analyses, using a method described in Jackson and Sirois (2009). Pupil data from each eye were regressed on each other to fix missing samples when they applied to only one eye. A low-pass filter (15 Hz) was applied to the data, once forward and once backward (to prevent phase shift), in order to remove jitter, which can be substantial immediately before or after missing samples.
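The forward-backward filtering step can be illustrated with a minimal sketch. The filter design is not specified in the text, so a first-order exponential smoother is assumed here as a stand-in for the 15-Hz low-pass filter; the function names are hypothetical.

```python
import math

def smooth_once(samples, alpha):
    """One pass of a first-order low-pass filter over a non-empty list."""
    out = []
    prev = samples[0]  # start from the first sample to avoid an onset jump
    for value in samples:
        prev = prev + alpha * (value - prev)
        out.append(prev)
    return out

def zero_phase_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Filter forward, then backward, so the two phase lags cancel out."""
    # Smoothing factor of a first-order RC filter with the given cutoff.
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    forward = smooth_once(samples, alpha)
    backward = smooth_once(forward[::-1], alpha)
    return backward[::-1]
```

Running the same filter in both directions leaves the peak of a brief event at its original sample, which matters when pupil changes are later aligned with gaze position over time.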

Finally, simultaneous data gaps in both eyes were filled linearly between the start and end values around the gaps. Since pupils from both eyes are highly correlated (in our sample, Pearson r of .92, .96, and .94 for the EyeLink, T120, and X120 systems, respectively), mean pupil diameter was computed at each sample, for each participant. As a side effect, using the mean corrects for the odd minor sampling error affecting only one eye (e.g., when eye blinks or head turns make one pupil under- or overestimated, which is inevitable with camera-based pupillometry). Results reported herein were replicated when analyses were performed separately on left and right pupils. For conciseness, we report only on mean pupil analyses.
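As a rough sketch of the gap filling and averaging described above, the following assumes that missing samples carry a sentinel code (−1 for Tobii, 0 for EyeLink) and that gaps touching the start or end of a recording are left untouched. The function names are hypothetical.

```python
def fill_gaps_linear(samples, missing_code):
    """Replace each run of missing_code by a straight line between the
    valid samples on either side of the run."""
    filled = list(samples)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] != missing_code:
            i += 1
            continue
        start = i                      # first missing index of the run
        while i < n and filled[i] == missing_code:
            i += 1
        end = i                        # first valid index after the run, or n
        if start == 0 or end == n:
            continue                   # no valid sample on one side: skip
        left, right = filled[start - 1], filled[end]
        span = end - (start - 1)
        for k in range(start, end):
            filled[k] = left + (right - left) * (k - (start - 1)) / span
    return filled

def mean_pupil(left_eye, right_eye):
    """Sample-by-sample mean of the two eyes."""
    return [(l + r) / 2.0 for l, r in zip(left_eye, right_eye)]
```

For example, `fill_gaps_linear([3.0, -1, -1, 4.5], -1)` replaces the two missing samples with values on the line from 3.0 to 4.5.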

Although the task used was merely a convenient way to get participants to provide gaze points across a range of horizontal and vertical positions, we assessed their performance by computing the proportion of gaze points that were on (or near) the tracked object over the length of the video. An area of interest (AOI) box with a buffer width and height of 2.20° of visual angle (oVA) beyond the small ball (1.47° oVA) was used. Because of the large width of the screen for the X120 eyetracker, this buffer was increased to a width of 120 pixels. The percentages of valid gaze points that were on the AOI for all participants were 89.99, 90.44, and 86.77 for the T120, X120, and EyeLink eyetrackers, respectively. Overall, participants tracked the object well.
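The tracking score can be sketched as follows, assuming gaze and target positions are expressed in the same units and that a missing gaze sample is stored as `None`. The half-width and half-height of the AOI box are left as parameters, since the exact box construction depends on how the buffer is applied around the ball. The function name is hypothetical.

```python
def proportion_on_target(gaze, target, half_width, half_height):
    """Share of valid gaze samples that fall inside a box centred on the
    target. gaze and target are equal-length lists of (x, y) tuples."""
    hits = 0
    valid = 0
    for g, t in zip(gaze, target):
        if g is None:          # missing sample: not counted as valid
            continue
        valid += 1
        if abs(g[0] - t[0]) <= half_width and abs(g[1] - t[1]) <= half_height:
            hits += 1
    return hits / valid if valid else float("nan")
```

Multiplying the returned proportion by 100 gives a percentage comparable in form to the values reported above.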

Mean pupil diameter for the whole movie was used as an idiosyncratic baseline. The individual baseline was subtracted from corresponding participant data. Since the task does not elicit task-induced changes in the pupil, baseline can be based on the entire length of the task and should be particularly accurate relative to a pretask baseline. For studies that examine task-evoked pupillary responses, a pretrial baseline would be essential, since a within-trial baseline would, at best, mask and, at worst, bias pupil measurement. Data reported and analyzed are relative changes in pupil diameter from individual baseline.
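A minimal sketch of this baseline correction for one participant, assuming the pupil trace has already been cleaned and averaged across eyes (the function name is hypothetical):

```python
from statistics import fmean

def subtract_baseline(pupil):
    """Express each sample as a change from the participant's own mean
    pupil diameter over the whole recording."""
    baseline = fmean(pupil)
    return [p - baseline for p in pupil]
```

By construction, the corrected trace of each participant averages to zero, so any remaining structure reflects change within the recording rather than differences in overall pupil size.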

Figures 2 and 3 show mean pupil diameter as a function of time for each eye-tracking system. Sinusoidal curves represent the position of the ball along the x-axis (Fig. 2) and y-axis (Fig. 3) at corresponding times.

Fig. 2
Changes in pupil diameter (black) and horizontal position of target (blue) as a function of time for a Tobii T120, b Tobii X120, and c EyeLink 1000

Fig. 3
Changes in pupil diameter (black) and vertical position of target (blue) as a function of time for a Tobii T120, b Tobii X120, and c EyeLink 1000

Pupil diameter exhibited sinusoidal variation even though the task was of low-level constant arousal and constant luminance. Measured pupil size varied as a function of gaze position (which varied following cosine and sine functions along x and y, respectively). This applied to all three eyetrackers. Pupil diameter seemed to be relatively more affected by horizontal position of gaze for Tobii systems and vertical position for EyeLink equipment. A temporal drift was noticed with the EyeLink system. The pupil diameter increased during the course of recording.
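The cosine and sine dependence of target position on time can be made concrete with a short sketch. It assumes that the target starts at the rightmost point of the ellipse and that y increases upward, neither of which is stated in the text; the function name is hypothetical.

```python
import math

ROTATIONS_PER_MINUTE = 12
FREQ_HZ = ROTATIONS_PER_MINUTE / 60.0   # 0.2 Hz, one rotation every 5 s

def target_position(t_s, half_width, half_height):
    """Position of the target on the ellipse at time t_s (seconds),
    relative to the screen centre."""
    angle = 2.0 * math.pi * FREQ_HZ * t_s
    return half_width * math.cos(angle), half_height * math.sin(angle)
```

With the X120 path (22° by 14°), `target_position(1.25, 11.0, 7.0)` is approximately (0, 7): a quarter rotation after onset, the target sits at the top of the ellipse under these assumptions.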

Figure 4 combines pupil diameter with gaze position on the x- and y-axis. This figure clearly illustrates the spatial organization of measured pupil diameter.

Fig. 4
Changes in pupil diameter as a function of horizontal and vertical gaze for a Tobii T120, b Tobii X120, and c EyeLink 1000. Pupil diameter is color coded, where the red end of the scale means overestimation and the blue end means underestimation

Functional data analysis (FDA) was used in order to test the significance of estimation error over time (Ramsay & Silverman, 1997; see Jackson & Sirois, 2009, for use with pupillometry). FDA transforms raw data into functions (B-splines) that fit data by joining together smaller cubic segments (i.e., y′ = a + bx + cx² + dx³ + e), with the constraint that the end of a segment has the same curvature as the beginning of the next segment (which creates a long smooth curve to fit the data). The analysis was performed on the parameters of these functional curves. The B-splines we used to fit the pupil data had 34 free parameters. The analysis is a single-sample t test examining when changes in pupil diameter significantly deviate from baseline. The resulting t test is also a functional curve, which can be plotted over time. A functional t test expresses t over time, rather than providing a discrete value for t, enabling one to assess when (rather than merely whether) changes are significant. Figure 5 shows functional t tests of change in pupil diameter for each eyetracker. Significant differences (when the t function was beyond the critical value) were primarily linked to horizontal gaze in Tobii systems and vertical gaze for the EyeLink system, when the timing of differences in Fig. 5 was compared with pupil changes in Figs. 2 and 3.
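The idea of a t statistic that varies over time can be illustrated with a simplified sketch. It computes a one-sample t value at every time sample across participants. This is not the B-spline procedure used here, only a pointwise approximation of the same logic, and the function name is hypothetical.

```python
import math
from statistics import fmean, stdev

def pointwise_t(curves):
    """curves: one list of baseline-corrected samples per participant,
    all of the same length. Returns t at each sample for the null
    hypothesis that the mean change from baseline is zero."""
    n = len(curves)
    t_values = []
    for samples in zip(*curves):       # all participants at one time point
        standard_error = stdev(samples) / math.sqrt(n)
        if standard_error > 0:
            t_values.append(fmean(samples) / standard_error)
        else:
            t_values.append(float("nan"))  # no variability across participants
    return t_values
```

Each value would then be compared with the critical t for n − 1 degrees of freedom to find the time ranges where measured pupil diameter departs from baseline.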

Fig. 5
Functional t test of change in pupil diameter against baseline as a function of time for a Tobii T120, b Tobii X120, and c EyeLink 1000. Critical values are represented by horizontal lines. Portions of the curves beyond these lines imply a significant change from baseline

We carried out linear regression to further quantify the relationship between point of gaze and pupil diameter. Data samples where the gaze points were deemed to be within the screen area were selected for regressing pupil diameter on horizontal and vertical point of gaze. Pupil diameter was estimated using the function

$$ {P^{\prime }}={b_0}+{b_1}X + {b_2}Y+e $$
(1)

where P’ is the estimated diameter, X and Y are the horizontal and vertical points of gaze, and e is the residual error. The results of regression analyses for all eyetrackers are summarized in Table 1. The regression model explains between 9.90 % and 20.20 % of variance in pupil size. Data and regression lines for each eyetracker for both horizontal and vertical points of gaze are illustrated in Fig. 6.
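A least-squares fit of Eq. 1 can be sketched with the standard library alone. The function names are hypothetical, and the code assumes that the horizontal and vertical gaze values are not perfectly collinear (otherwise the determinant below is zero).

```python
def fit_plane(xs, ys, ps):
    """Ordinary least squares for P = b0 + b1*X + b2*Y.
    Returns the coefficients (b0, b1, b2)."""
    n = len(ps)
    mx, my, mp = sum(xs) / n, sum(ys) / n, sum(ps) / n
    # Centred sums of squares and cross-products.
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxp = sum((x - mx) * (p - mp) for x, p in zip(xs, ps))
    syp = sum((y - my) * (p - mp) for y, p in zip(ys, ps))
    # Solve the 2 x 2 normal equations for the two slopes.
    det = sxx * syy - sxy ** 2
    b1 = (sxp * syy - syp * sxy) / det
    b2 = (syp * sxx - sxp * sxy) / det
    b0 = mp - b1 * mx - b2 * my
    return b0, b1, b2

def r_squared(xs, ys, ps, coefs):
    """Proportion of variance in ps explained by the fitted plane."""
    b0, b1, b2 = coefs
    mp = sum(ps) / len(ps)
    ss_res = sum((p - (b0 + b1 * x + b2 * y)) ** 2
                 for x, y, p in zip(xs, ys, ps))
    ss_tot = sum((p - mp) ** 2 for p in ps)
    return 1.0 - ss_res / ss_tot
```

Multiplying the value returned by `r_squared` by 100 gives the percentage of explained variance, the quantity summarized for each eyetracker.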

Table 1 Results from the regression analysis of pupil diameter as a function of gaze

Fig. 6
Pupil diameter change from baseline as a function of horizontal (left column) and vertical (right column) point of gaze for Tobii T120, Tobii X120, and EyeLink 1000 eyetrackers (from top to bottom rows, respectively)

Estimates of pupil diameter can be corrected using the coefficients derived from the regression equation with

$$ {P_c}=P-{b_0}-{b_1}X - {b_2}Y $$
(2)

where P_c is the corrected pupil diameter, P is the estimated diameter, X and Y are the gaze coordinates along the horizontal and vertical axes, and b_0, b_1, and b_2 are the regression coefficients. Using Eq. 2, we were able to remove position artifacts from the data: after correction, pupil diameter showed homoscedasticity across values of X and Y, with a slope of 0. The remaining variation, e, is intra- and interindividual variation.
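Applying Eq. 2 to a recording is a one-line operation per sample. The sketch below assumes the coefficients were obtained beforehand from a regression on tracking-task data. The function name is hypothetical, and any coefficient values passed to it in an example are illustrative, not the values reported in Table 1.

```python
def correct_pupil(pupil, gaze_x, gaze_y, b0, b1, b2):
    """Remove the gaze-dependent part of measured pupil diameter (Eq. 2).
    pupil, gaze_x, and gaze_y are equal-length lists of samples."""
    return [p - b0 - b1 * x - b2 * y
            for p, x, y in zip(pupil, gaze_x, gaze_y)]
```

If a trace contained nothing but the position artifact, the corrected values would all be zero. In real data, what remains is the residual variation e.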

Discussion

This study examined the reliability of eyetrackers for pupil diameter estimation. Three systems were tested with a simple object-tracking task. Results show that pupil size can be significantly over- and underestimated depending on gaze position.

Eyetrackers estimate pupil diameter from a 2-D image. When eyes gaze straight into the camera, the pupil would appear relatively circular. But the more eyes look away from the camera (horizontally and/or vertically), the more the pupil appears squashed (in width and/or height, respectively) on the 2-D image. While this is inevitable given the technology used, eyetracker algorithms also compute gaze position and would be expected to correct for such distortions. Our results suggest that for these systems, at minimum, there is not enough correction applied.
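The size of this geometric effect can be estimated with an idealized sketch. It treats the pupil as a flat circular disc and ignores corneal refraction and the offset between camera and screen, so it is an approximation only and not a description of how any eyetracker computes pupil size. The function name is hypothetical.

```python
import math

def apparent_axis_ratio(gaze_angle_deg):
    """Ratio of foreshortened to true pupil diameter along the direction
    of rotation, for a flat disc viewed gaze_angle_deg off the camera axis."""
    return math.cos(math.radians(gaze_angle_deg))
```

Under these assumptions, a gaze angle of 11° (half of the 22° horizontal path of the X120 setup) gives a ratio of about 0.98, that is, an apparent shrinkage of roughly 2 % along that axis.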

Data from Tobii eyetrackers, as illustrated in Fig. 4, showed shifts of the gaze position over iterations of the tracked object, whereas data from the EyeLink system appear to be more accurate. This may be linked to the use of a chin and forehead rest with the EyeLink setup and warrants testing such a setup with Tobii trackers for comparison.

EyeLink results displayed a temporal drift. The pupil diameter seems to increase linearly with time over the course of recording. This could be a pupillary light reflex artifact due to small changes in procedure. Unlike Tobii participants, EyeLink participants were already in the experimental box when the experiment was explained and the consent form was signed. The light was then lowered just before starting the experiment. The pupils may have needed more time to adapt, relative to testing conditions for Tobii systems.

The aim of this study was to assess whether different eyetrackers would introduce errors as a function of gaze in pupil measurement. It would have been interesting, especially given our findings, to compare the different systems with each other. However, to do so properly, the systems should be assessed in the same conditions with the same participants, which was not feasible in this study. It remains that each system created errors, and these should be corrected, whether the error is more or less substantial than that found in another system.

These results have practical implications. Experimental designs should counterbalance the position of their stimuli if it is relevant. Although this would be good methodological practice in the first place, it is especially important if recording equipment shows spatial artifacts. But some research will face inevitable problems from such systematic measurement errors. For example, tasks that measure pupil diameter while participants read face a particular problem. Our results suggest that participants tested on a Tobii system, with a language such as English that follows a left-to-right, top-to-bottom sequence, would be most “aroused” when they begin to read a page of text and least “aroused” when they reach the end, even if the text has no arousing effect. However, the same participants would be least “aroused” at the start and most “aroused” at the end if they read the same text but were recorded by the EyeLink system.

These estimation errors should be taken into account and corrected in further studies. Regression equations such as those presented above should help researchers to correct their data before running statistical analyses. A priori, and unless manufacturers provide evidence that their systems compensate for point of gaze in pupil size estimation, labs that acquire or use eyetrackers should run a simple task such as the one presented in this article to assess the reliability of their equipment and, as appropriate, derive correction coefficients. Where appropriate, this correction can be computed for individual participants by running the tracking task prior to or after the other experiments being carried out. In some cases (e.g., infancy research), it may not be possible to run a tracking task to derive regression coefficients (because participants provide a minute data collection opportunity). We suggest that in such cases, researchers use a reference sample of adults to derive general correction coefficients instead.