A Novel Method for Estimating Free Space 3D Point-of-Regard Using Pupillary Reflex and Line-of-Sight Convergence Points

In this paper, a novel 3D gaze estimation method for a wearable gaze tracking device is proposed. This method is based on the pupillary accommodation reflex of human vision. Firstly, a 3D gaze measurement model is built. By uniting the line-of-sight convergence point and the size of the pupil, this model can be used to measure the 3D Point-of-Regard in free space. Secondly, a gaze tracking device is described. By using four cameras and semi-transparent mirrors, the gaze tracking device can accurately extract the spatial coordinates of the pupil and eye corner of the human eye from images. Thirdly, a simple calibration process of the measuring system is proposed. This method can be sketched as follows: (1) each eye is imaged by a pair of binocular stereo cameras, and the setting of semi-transparent mirrors can support a better field of view; (2) the spatial coordinates of the pupil center and the inner corner of the eye in the images of the stereo cameras are extracted, and the pupil size is calculated with the features of the gaze estimation method; (3) the pupil size and the line-of-sight convergence point when watching the calibration target at different distances are computed, and the parameters of the gaze estimation model are determined. Fourthly, an algorithm for searching the line-of-sight convergence point is proposed, and the 3D Point-of-Regard is estimated by using the obtained line-of-sight measurement model. Three groups of experiments were conducted to prove the effectiveness of the proposed method. This approach enables people to obtain the spatial coordinates of the Point-of-Regard in free space, which has great potential in the application of wearable devices.


Introduction
Human vision is the most important sense of human beings. Our eye movement also contains a great deal of visual attention and emotional information. Nowadays, many eye tracking methods have been developed to analyze the behavior of human eyes. These methods enable computers to understand human visual attention. Based on the gaze information, the human-computer interaction function can then be realized.
Existing state-of-the-art methods can be effective in a specific application environment, but it is still relatively difficult to extract the depth perception of the observer from eye movement information. In other words, estimating the three-dimensional coordinates of human gaze points in free space by using only eye features remains a challenge. Therefore, a novel method for estimating ... convolution and boundary ellipse fitting to quickly locate the iris center, which ensured low-resolution gaze tracking with sufficient accuracy. Additionally, the method described in [22] also extracted pupil and iris features and used the Maximum a Posteriori (MAP) framework to ensure tracking accuracy. This work could also ensure high accuracy in low-resolution images.
Next, we briefly discuss the different structures of gaze tracking systems and their respective advantages and disadvantages. Remote camera-based methods usually consider head pose estimation and gaze estimation at the same time; the two can be carried out independently, and recent methods achieve good accuracy. Head-mounted camera-based methods can obtain clear eye images, but they rely on extra image sensors or pose measurement tools to estimate the motion of the head. In addition, these approaches can be freely installed. Head-mounted gaze trackers are usually equipped with at least one eye camera to track eye movements [23] and a near-infrared light source. The near-infrared light source not only provides reflection spots as the main feature, but also improves the contrast of eye images. New devices proposed in recent years usually use multiple cameras [24], including at least one scene camera. Near-eye gaze trackers also have advantages in capturing clear eye images [25].
The methods using RGB-D cameras are able to effectively reconstruct facial features to achieve gaze estimation [26][27][28]. Currently, consumer-grade RGB-D cameras can reconstruct head posture changes well and ensure the resolution of the eye region images is sufficient at the same time.
The method based on a depth camera still has potential, provided that the object to be measured is completely unconstrained. However, to achieve low cost and easy deployment, a gaze tracking device based on visible light cameras is still the most suitable implementation.
Our investigation of gaze tracking techniques shows that existing techniques can extract the direction of the line-of-sight very accurately. However, because most of them build a model relating eye information to an observed object at a fixed position, it is difficult for them to obtain accurate 3D gaze information. Therefore, recent 3D gaze tracking focuses on how to obtain the depth information of the PoR. Takemura et al. [23] combined the observed object image and the pupil vector, estimated the 3D coordinates of the PoR by Delaunay triangle approximation, and estimated the pose of the head using the scene camera and an attitude sensor. The method used the corner feature points extracted by the scene camera to match the intersection point of the line-of-sight and the scene, and from this estimated the coordinates of the Point-of-Regard. However, this method relied on the feature points extracted from the images captured by the scene camera.
Li, Zhang and Webb [29] proposed a method that allows the gaze point to control the movement of a robot arm to a specified position, together with a method to calculate the spatial line-of-sight intersection point. This method has higher positioning accuracy in a smaller spatial range and does not limit the movement of the head. Fusing pupil features and multi-camera view geometry is another way to calculate the coordinates of the Point-of-Regard [30]. The calibration process of this method is simple and effective, and its advantage is that it does not need to assume a specific complex model of the eyes. Ferhat and Vilariño [31] reviewed the existing state-of-the-art visible light camera-based gaze tracking approaches. They concluded that the use of new features might be the main direction for improving the performance of gaze trackers. The studies analyzed above all attempt to solve the spatial coordinates of the PoR, but problems remain, such as the need for specific scenes or markers.

Methods
To achieve 3D PoR estimation while a user looks naturally at an object, this paper comprises two major parts: (1) a customized gaze estimation system together with the approach to eye feature extraction and spatial localization, described in Sections 2, 3.1 and 3.2; and (2) the system calibration process and 3D PoR estimation, described in Section 3.3. The line-of-sight estimation uses a simplified geometric model to solve for the parameters of the spatial line on which the line-of-sight lies. Besides the basic model of gaze estimation, we mainly describe a system calibration process in Section 3.3 to obtain the parameters of the line-of-sight geometric model. The calibration process mainly aims to search for the line-of-sight convergence point and relate it to the corresponding pupil size. After the fitting equation of the line-of-sight convergence point is obtained, the 3D PoR estimate is computed as the nearest point of the two spatial lines of the binocular lines-of-sight.
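The last step above, taking the nearest point of the two binocular lines-of-sight, can be illustrated with a minimal sketch. It assumes each line-of-sight is given as a point (for example, a convergence point) and a direction vector, and it returns the midpoint of the shortest segment joining the two lines. The function name and the sample coordinates are hypothetical and are not taken from the paper.

```python
def closest_point_between_lines(p1, d1, p2, d2, eps=1e-12):
    """Midpoint of the shortest segment joining the 3D lines p1 + t*d1 and p2 + s*d2.

    Hypothetical helper: the paper only states that the nearest point of the
    two lines-of-sight is used, not how it is computed.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("lines are (nearly) parallel; no unique nearest point")
    # Parameters that minimise the squared distance between the two lines.
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    q1 = [p + t * v for p, v in zip(p1, d1)]
    q2 = [p + s * v for p, v in zip(p2, d2)]
    return [(x + y) / 2.0 for x, y in zip(q1, q2)]


# Illustrative values only (mm): two eyes 60 mm apart fixating a point 500 mm ahead.
por = closest_point_between_lines([-30.0, 0.0, 0.0], [30.0, 0.0, 500.0],
                                  [30.0, 0.0, 0.0], [-30.0, 0.0, 500.0])
```

Using the midpoint is a design choice for the case in which measurement noise makes the two lines skew rather than intersecting; if the lines do intersect, the midpoint coincides with the intersection.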

Customized Gaze Estimation System
According to previous research results, gaze points can be obtained directly from the eye image. A gaze tracking system based on visible light cameras needs cameras and light sources with different structures to suit different methods. Usually, the design of a head-mounted gaze tracking system needs to meet three conditions: (1) users should have a large field of view; (2) the cameras should avoid or minimize the influence of environmental illumination; (3) the head-mounted device should be as light as possible.
Therefore, the gaze tracking device proposed in this paper is designed around these three conditions, resulting in a more compact line-of-sight tracking device based on low-cost components.
The structure of the gaze estimation system in this paper is based on multiple view geometry. This gaze estimation method in 3D space provides users with a wide field of view. Since the mirror is placed at 45 degrees, the cameras capture images from a position equivalent to directly in front of the eyes. Therefore, the system is better suited to extracting eye features than some common gaze tracking systems (e.g., Tobii Glasses and Dikablis Glasses). Since the gaze estimation method relies on clear pupil images, we also use a near-infrared light source to obtain dark pupil images. The light source is made into a triangular light-emitting panel, and two such panels are placed between the two cameras. The gaze tracker of the left eye is illustrated in Figure 2. This design enables the light source to illuminate the eye area evenly, while we keep the brightness of the light source as low as possible to reduce the stimulation of the eyes. The gaze estimation model is formed by four cameras divided into two groups, and each group focuses on one of the eyes. We designed a special semi-transparent mirror. Its reflective surface facing the eye has high reflectivity in the near-infrared band, while its transmittance in the visible light band is higher than 70%. This enhances the illumination effect of the near-infrared light source while reducing the interference of ambient light with the camera imaging. The semi-transparent mirror is installed in front of the eye cameras. This setup effectively reduces the impact of ambient light noise on gaze estimation. The gaze estimation system with its coordinate definition is illustrated in Figure 3. The customized system uses two pairs of cameras to obtain the spatial coordinates of the features of the left and right eyes.
Accurate extraction and spatial localization of the eye features allow the center of the pupil and the inner eye corner to serve as the basis for solving the spatial coordinates of the 3D PoR.

We fix the relative position of the cameras in the system with the calibration target. After the gaze estimator is installed, the stereo cameras are calibrated by a common camera calibration method; what is special is that the calibration is based on the virtual images of the four cameras, which we call virtual cameras in this paper. This does not change the properties of the imaging process, but it is important because it determines the definition of the world coordinate system. The world coordinate system coincides with the camera coordinate system of the leftmost camera, as shown in Figure 2. The advantage is that the relationship between the real camera and the reflector does not need to be calibrated in a difficult way, and the world coordinate system based on the virtual cameras makes the obtained data more intuitive.
The device is divided into left and right modules. A single module measures 50 mm × 35 mm × 40 mm, and the distance between the two modules can be adjusted according to the pupil distance of different persons. The cameras are placed vertically, and the baseline of each pair of cameras is 22 mm. Each camera has a focal length of 3.6 mm and a resolution of 640 × 580 pixels. The whole gaze tracking system proposed in this paper is shown in Figure 4.
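To illustrate why a camera viewing the eye through a 45-degree mirror behaves like a virtual camera placed in front of the eye, the following sketch reflects a camera position across a mirror plane. The coordinate frame, the mirror normal and the camera position are hypothetical values chosen for illustration; the paper calibrates the virtual cameras directly and does not need this computation.

```python
import math


def reflect_point(point, plane_point, plane_normal):
    """Mirror image of a 3D point across the plane through plane_point with the given normal."""
    length = math.sqrt(sum(c * c for c in plane_normal))
    n = [c / length for c in plane_normal]
    # Signed distance from the point to the plane along the unit normal.
    dist = sum((p - q) * c for p, q, c in zip(point, plane_point, n))
    return [p - 2.0 * dist * c for p, c in zip(point, n)]


# Hypothetical layout (mm): the eye looks along the z axis, the mirror passes
# through the origin tilted by 45 degrees, and the real camera sits 40 mm above it.
mirror_point = [0.0, 0.0, 0.0]
mirror_normal = [0.0, 1.0, -1.0]
real_camera = [0.0, 40.0, 0.0]
virtual_camera = reflect_point(real_camera, mirror_point, mirror_normal)
```

In this layout the virtual camera lands on the z axis, that is, on the optical path directly in front of the eye, which matches the qualitative description in the text.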

Eye Features for Gaze Estimation
In this paper, we choose the pupil and the inner eye corner to estimate the gaze direction. These two features are stable and distinct in the image. Furthermore, we assume that the center of the pupil represents the center of the field of view and that there is a line-of-sight convergence point in the eyeball; if the spatial positions of the pupil center and the line-of-sight convergence point can be determined, the spatial line-of-sight can be calculated easily.
According to [32], under fixed light intensity, the pupil becomes smaller as the observed object moves from far to near. Therefore, this paper assumes that the illumination condition of the scene is stable, so that the change of pupil size reflects the distance of the object observed by the eye. According to [33], the depth perception of human eyes can affect the three-dimensional scene vision in the brain. Establishing a line-of-sight measurement method that captures the depth perception of the eyes is the main work of this paper. The approach adapts to most use environments; the only restriction is that the environmental illumination must be fixed. The results of feature extraction and spatial coordinate solving are shown in Figure 5. In applications, the method cannot be used when the user observes a luminous object, such as a screen. The pupil contracts naturally when the eyes observe luminous objects, so the relationship between pupil size and observation distance becomes invalid. However, the method can still be applied when the positions of the head and the screen are fixed.

According to [34], nearly all state-of-the-art pupil detection and pupil edge extraction approaches are based on edge detection and ellipse fitting because of their efficiency and accuracy. This paper proposes an approach for accurately extracting the pupil edge. The method uses adaptive gray-gradient edge extraction to obtain candidate pupil edge points and then applies RANSAC-based ellipse fitting to these points to obtain the pupil edge in the image. It is designed to handle bright spots and eyelid occlusion in an eye-tracking device. First, a horizontal and vertical projection operation is used to locate the eye area in the image. We can usually obtain clear pupil images, but when the eyeball moves quickly or is occluded by the eyelid, the images are not good enough, and image segmentation with an adaptive threshold method such as the Otsu method is not effective. Our analysis indicates that the images are blurred either because the eye turns far away from the light source or because the eyeball is moving rapidly when the frame is captured.
For an image I(x, y) of size m × n, we first calculate the image gradient amplitude map. Then, the gradient amplitude near the center and the four corners of the image is set to 0, and the gradient amplitude map is divided into 3 × 3 blocks. In each block, the gradient values above the block average are extracted. Finally, the points with a gradient value greater than 0 are traversed, and the gradient values near the bright spot are set to 0, which yields G(x, y).
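The block-wise selection of strong gradients can be sketched as follows. This is a simplified illustration under several assumptions: the gradient amplitude is computed with central differences (the paper's exact gradient operator is not given here), "3 × 3 blocks" is read as a 3 × 3 grid of blocks covering the image, and the suppression of the image center, the corners and the bright spot is omitted. All function names are hypothetical.

```python
import math


def gradient_amplitude(img):
    """Central-difference gradient magnitude; border pixels are left at 0."""
    rows, cols = len(img), len(img[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            grad[y][x] = math.hypot(gx, gy)
    return grad


def blockwise_threshold(grad, blocks=3):
    """Keep only gradient values above the mean of their block (blocks x blocks grid)."""
    rows, cols = len(grad), len(grad[0])
    out = [[0.0] * cols for _ in range(rows)]
    for by in range(blocks):
        y0, y1 = by * rows // blocks, (by + 1) * rows // blocks
        for bx in range(blocks):
            x0, x1 = bx * cols // blocks, (bx + 1) * cols // blocks
            cells = [(y, x) for y in range(y0, y1) for x in range(x0, x1)]
            if not cells:
                continue
            mean = sum(grad[y][x] for y, x in cells) / len(cells)
            for y, x in cells:
                if grad[y][x] > mean:
                    out[y][x] = grad[y][x]
    return out


# Synthetic 9 x 9 test image: bright background with a dark 3 x 3 "pupil".
image = [[200.0] * 9 for _ in range(9)]
for yy in range(3, 6):
    for xx in range(3, 6):
        image[yy][xx] = 20.0
edges = blockwise_threshold(gradient_amplitude(image))
```

Thresholding against a per-block mean rather than a single global value is what makes the selection adaptive to uneven illumination across the eye region.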
The resulting pupil edge point map GG(x, y) is then calculated from G(x, y). After obtaining GG(x, y), we use a two-step ellipse fitting to obtain the pupil ellipse. We first apply RANSAC-based ellipse fitting and then compare the distances between the fitting points and their nearest points on the fitted ellipse. In the fitting parameter equation, Equation (2), (x, y) is the image coordinate of a fitting point and (X0, Y0) is the image coordinate of the center of the ellipse; α is the rotation angle of the major axis relative to the transverse axis of the image, θ is the angular parameter of the fitting point on the ellipse, and A and B are the major axis and the minor axis of the ellipse, respectively.
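The distance comparison between fitting points and the fitted ellipse can be sketched as below. The sketch assumes the standard rotated-ellipse parametrization x = X0 + A cos θ cos α − B sin θ sin α, y = Y0 + A cos θ sin α + B sin θ cos α, which is consistent with the symbols defined above but is not copied from Equation (2). It treats A and B as semi-axes, approximates the nearest point on the ellipse by dense sampling of θ, and takes the rejection threshold as a parameter. All names and numbers are hypothetical.

```python
import math


def ellipse_point(x0, y0, a, b, alpha, theta):
    """Point on a rotated ellipse (assumed standard parametrization, a and b as semi-axes)."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return (x0 + a * ct * ca - b * st * sa, y0 + a * ct * sa + b * st * ca)


def distance_to_ellipse(pt, x0, y0, a, b, alpha, samples=720):
    """Approximate distance from pt to the ellipse by sampling theta uniformly."""
    return min(
        math.dist(pt, ellipse_point(x0, y0, a, b, alpha, 2.0 * math.pi * k / samples))
        for k in range(samples)
    )


def keep_inliers(points, x0, y0, a, b, alpha, max_dist):
    """Discard candidate edge points farther than max_dist from the fitted ellipse."""
    return [p for p in points if distance_to_ellipse(p, x0, y0, a, b, alpha) <= max_dist]


# Illustrative values only (pixels): an axis-aligned ellipse and four candidate points.
candidates = [(40.0, 0.0), (0.0, 31.0), (0.0, 0.0), (60.0, 0.0)]
inliers = keep_inliers(candidates, 0.0, 0.0, 40.0, 30.0, 0.0, max_dist=40.0 / 8.0)
```

The surviving points would then be passed to the second, least-squares fitting step. The threshold of one eighth of the major axis follows the rule stated in the text; whether that length refers to the full axis or the semi-axis is an assumption of this sketch.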
A fitting point is set to 0 when its distance to the closest point on the ellipse is greater than 1/8 of the length of the major axis. After these steps, the final pupil ellipse is obtained by least-squares ellipse fitting, and the fitting parameter equation of this step is the same as Equation (2). As shown in Figure 6, the method can obtain an accurate pupil edge.
The spatial position and size of the pupil are obtained by matching pupil edge points under the epipolar constraint, and the matched points are used to obtain the spatial circle of the pupil by least-squares 3D circle fitting. The left and right images after epipolar rectification are shown in Figure 7a,b, and the solved spatial coordinates of the pupil edge are shown in Figure 7c.
In this paper, we adopt an inner eye corner extraction method based on multi-scale Harris corner detection [35]. As mentioned above, the area of the eye is obtained by a horizontal and vertical projection operation, and the inner eye corner region can then be easily located by threshold segmentation. Multi-scale Harris corner detection and a weight distribution over the different scales are used to find the exact inner eye corner point. After the position of the inner eye corner is obtained in each image, the spatial coordinates of the corner points of the left and right eyes can be solved by stereo matching.
In a random experiment, we collected 80 groups of images to test the stability of the inner eye corner feature. The distance between the left and right inner corners is taken as the benchmark, and the changes in this distance are compared. The experimental result is shown in Figure 8, and the standard deviation is less than 0.10 mm. This experiment shows that our method obtains stable inner eye corners, which indicates that the inner eye corner is suitable as the reference feature for the estimation of the Point-of-Regard.
Sensors 2018, 18, x FOR PEER REVIEW 10 of 22
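The stability check on the inner eye corners amounts to computing the distance between the left and right corner coordinates in every frame and examining its spread. A minimal sketch follows, assuming the sample standard deviation is used (the paper does not state which estimator it applies) and using made-up coordinates rather than the experimental data.

```python
import math
import statistics


def corner_distance_stats(left_corners, right_corners):
    """Mean and sample standard deviation of the left-right inner-corner distance."""
    distances = [math.dist(l, r) for l, r in zip(left_corners, right_corners)]
    return statistics.mean(distances), statistics.stdev(distances)


# Hypothetical 3D corner coordinates (mm) from three frames, not measured data.
left = [(-16.00, 0.0, 0.0), (-16.05, 0.0, 0.0), (-15.95, 0.0, 0.0)]
right = [(16.0, 0.0, 0.0), (16.0, 0.0, 0.0), (16.0, 0.0, 0.0)]
mean_mm, std_mm = corner_distance_stats(left, right)
```

Because the inter-corner distance is a rigid anatomical length, any variation in it across frames reflects extraction and triangulation noise rather than eye movement, which is why it serves as a benchmark.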

Estimation of the 3D Point-of-Regard and the System Calibration
The location of the line-of-sight convergence point is determined by the location of the pupil center and the shape of the calibration object. This section proposes a 3D Point-of-Regard estimation model and the corresponding calibration process. Using a freely placed calibration target, the spatial coordinates of the line-of-sight convergence points can be estimated.
As described in Section 3.2, the pupil size and the center of the pupil are calculated after stereo rectification. Based on the 3D Point-of-Regard estimation model, we present a calibration process that finds a pair of line-of-sight convergence points for the two eyes; the spatial coordinates of the eye features extracted during calibration are the key features when searching for these convergence points. During calibration, the user observes a specific target at different distances several times. After the calibration process, the spatial coordinates of the line-of-sight convergence points also need to be integrated into the same coordinate system. For this purpose, a spatial coordinate alignment approach based on the inner eye corner vector is proposed. Finally, a polynomial fitting equation is used to determine the relation between the line-of-sight convergence points and the pupil size.
In Section 3.3.1, we describe the entire gaze estimation model in detail and analyze the search method used when calibrating the line-of-sight convergence point. In Section 3.3.2, we present the spatial coordinate alignment approach based on the inner eye corner vector, together with the coordinate transformation process. The result of one calibration process is shown in Figure 9.
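The polynomial relation between pupil size and the line-of-sight convergence point can be illustrated with a small least-squares fit. The sketch below assumes that one coordinate of the convergence point is modeled as a quadratic function of pupil diameter; the degree, the variable names and the sample values are all hypothetical, since the text above does not specify them.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest order first) via the normal equations."""
    n = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; need more distinct sample points")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (aug[i][n] - tail) / aug[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate a polynomial given coefficients ordered lowest order first."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Hypothetical calibration samples: pupil diameter (mm) against one coordinate
# of the convergence point (mm), generated from a known quadratic for illustration.
pupil_mm = [3.0, 3.5, 4.0, 4.5, 5.0]
coord_mm = [1.0 + 2.0 * d + 0.5 * d * d for d in pupil_mm]
fit = polyfit(pupil_mm, coord_mm, degree=2)
predicted = polyval(fit, 4.2)
```

At run time, the measured pupil size would be fed through such a fitted relation to predict the convergence point, from which the line-of-sight of each eye is constructed.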

Estimation of the 3D Point-of-Regard and the System Calibration
The location of the line-of-sight convergence point is determined by the location of the pupil center and the shape of the calibration object. This section proposes a 3D Point-of-Regard estimation model and the corresponding calibration process. Using a freely placed calibration target, the spatial coordinates of the line-of-sight convergence points can be estimated.
As described in Section 3.2, the pupil size and the pupil center have been calculated using stereo rectification. Based on the 3D Point-of-Regard estimation model, we present a calibration process that finds a pair of line-of-sight convergence points, one for each eye; the spatial coordinates of the eye features extracted during calibration are the key inputs when searching for these points. During calibration, the user observes a specific target several times at different distances. After calibration, the spatial coordinates of the line-of-sight convergence points must be brought into a common coordinate system; for this purpose, a spatial coordinate alignment approach based on the inner eye corner vector is proposed. Finally, polynomial fitting is used to determine the relation between the line-of-sight convergence points and the pupil size.
Section 3.3.1 describes the entire gaze estimation model in detail and analyzes the search method used when calibrating the line-of-sight convergence point. Section 3.3.2 presents the spatial coordinate alignment approach based on the inner eye corner vector, together with the coordinate transformation process. The result of one calibration process is shown in Figure 9.

Calibration Process
We first describe the calibration board. We use a custom gray calibration board with four white crosses, one in each corner, and the distance between adjacent crosses is 150 mm. The user has to look at each of the four crosses in turn. Over the whole calibration process the board is placed at five positions, so the camera system captures eye images 20 times. The five positions may be any five different distances between 40 cm and 120 cm, and the board should roughly face the gaze tracking system. The relative position of the head and the system must stay fixed during the experiment, and the four crosses need to fully attract the user's visual attention.
The studies in [36,37] showed that the comfortable horizontal view range is about 35 degrees, while the comfortable vertical view range is about 20 degrees. The work in this article therefore limits the view range to the comfortable range of the eyes. Because of the particularity of the human visual attention mechanism and the error of the line-of-sight measurement technique, the two measured lines of sight generally do not intersect, so the spatial PoR of the two eyes cannot be obtained directly. In this paper, we assume that the nearest points of the two lines of sight can be used to approximate the spatial PoR. The method therefore takes as the PoR the midpoint of the segment that connects the pair of nearest points of the two binocular lines of sight. The spatial relationship of these points is shown in Figure 10.
Figure 10. Illustration of the line-of-sight convergence point method and the search area. $P_{li}$ and $P_{ri}$ are the centers of the left and right pupils, $P_L$ and $P_R$ are the line-of-sight convergence points, and $P_{1i}$ and $P_{2i}$ are the nearest points of the two lines passing through $P_L$ and $P_{li}$ and through $P_R$ and $P_{ri}$. In our method, $P_L$ and $P_R$ are searched within restricted areas, while $P_{1i}$ and $P_{2i}$ are known.
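The midpoint construction can be illustrated numerically. The following is a minimal sketch, assuming NumPy and the standard closed-form solution for the nearest points of two non-parallel 3D lines; the function names `nearest_points` and `por_midpoint` and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
import numpy as np

def nearest_points(p_a, d_a, p_b, d_b, eps=1e-12):
    """Nearest points of the lines p_a + s*d_a and p_b + t*d_b (assumed not parallel)."""
    p_a, d_a, p_b, d_b = (np.asarray(v, dtype=float) for v in (p_a, d_a, p_b, d_b))
    n = np.cross(d_a, d_b)            # direction of the common perpendicular
    denom = float(np.dot(n, n))
    if denom < eps:
        raise ValueError("lines are (nearly) parallel")
    r = p_b - p_a
    s = float(np.dot(np.cross(r, d_b), n)) / denom
    t = float(np.dot(np.cross(r, d_a), n)) / denom
    return p_a + s * d_a, p_b + t * d_b

def por_midpoint(conv_left, pupil_left, conv_right, pupil_right):
    """Midpoint of the nearest points of the two lines of sight, plus their gap."""
    conv_left = np.asarray(conv_left, dtype=float)
    conv_right = np.asarray(conv_right, dtype=float)
    # Each line of sight runs from the convergence point through the pupil center.
    d_left = np.asarray(pupil_left, dtype=float) - conv_left
    d_right = np.asarray(pupil_right, dtype=float) - conv_right
    p1, p2 = nearest_points(conv_left, d_left, conv_right, d_right)
    return 0.5 * (p1 + p2), float(np.linalg.norm(p1 - p2))
```

For example, with made-up convergence points 64 mm apart and pupil centers placed 10 mm along the rays toward a target at (0, 0, 500), the two lines intersect, so `por_midpoint` returns the target itself and a gap close to zero. With real measurements the gap is non-zero, which is why the midpoint is used.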
The pupil centers $P_{li}$ and $P_{ri}$, $i = 1, 2, 3, 4$, are known. Assume that each eyeball contains a line-of-sight convergence point, $P_L$ for the left eye and $P_R$ for the right eye. The goal is to calculate the nearest points $P_{1i}$ and $P_{2i}$ of the line through $P_{li}$ and $P_L$ and the line through $P_{ri}$ and $P_R$. Let $P_L = (X_L, Y_L, Z_L)^T$, $P_R = (X_R, Y_R, Z_R)^T$, $P_{li} = (X_{li}, Y_{li}, Z_{li})^T$, $P_{ri} = (X_{ri}, Y_{ri}, Z_{ri})^T$, $P_{1i} = (X_{1i}, Y_{1i}, Z_{1i})^T$ and $P_{2i} = (X_{2i}, Y_{2i}, Z_{2i})^T$.

The direction vectors of the two lines are $P_{li} - P_L$ and $P_{ri} - P_R$. The common perpendicular of the two lines, which connects $P_{1i}$ and $P_{2i}$, is orthogonal to both direction vectors. According to the property of the nearest points of two spatial lines, two planes are constructed, each spanned by this common normal vector and one of the two direction vectors, and the normal vectors of these two planes are computed. Because $P_{1i}$ and $P_{2i}$ are the feet of the common perpendicular on the two lines, the vectors from the origin to the nearest points can be expressed with these plane normals, which gives $P_{1i}$ and $P_{2i}$. The estimated PoR, $P_{ei}$, is the midpoint of the segment connecting $P_{1i}$ and $P_{2i}$.

Before entering the calibration step, we set up conditions to judge whether candidate values of $P_L$ and $P_R$ are acceptable calibration results. The shape of the calibration target is known, so suitable values of $P_L$ and $P_R$ are those for which the distances between the calculated PoRs match the distance between the calibration crosses, $L = 150$ mm. Among these suitable values, the distance between $P_{1i}$ and $P_{2i}$ is minimized to obtain the best $P_L$ and $P_R$.

Estimating the initial values of $P_L$ and $P_R$ and setting the initial search area are basic problems. Following the research on the structure and imaging mechanism of the eyeball in [32], we set the initial values of $P_L$ and $P_R$ at a position 10 mm away from the center of the four pupil center points.
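As a sketch of the plane-based construction described above, the nearest points can be written in the following standard form. The symbols $u_i$, $w_i$, $n_i$, $m_{1i}$, $m_{2i}$ and $d_i$ are introduced here only for illustration and are assumptions; they are not the paper's own notation, and the formulas assume that the two lines are not parallel.

```latex
\begin{align}
  u_i &= P_{li} - P_L, \qquad w_i = P_{ri} - P_R
      && \text{(direction vectors of the two lines)} \\
  n_i &= u_i \times w_i
      && \text{(direction of the common perpendicular)} \\
  m_{1i} &= u_i \times n_i, \qquad m_{2i} = w_i \times n_i
      && \text{(normals of the two planes)} \\
  P_{1i} &= P_L + \frac{(P_R - P_L)\cdot m_{2i}}{u_i \cdot m_{2i}}\, u_i,
  \qquad
  P_{2i} = P_R + \frac{(P_L - P_R)\cdot m_{1i}}{w_i \cdot m_{1i}}\, w_i \\
  P_{ei} &= \tfrac{1}{2}\left(P_{1i} + P_{2i}\right),
  \qquad
  d_i = \lVert P_{1i} - P_{2i} \rVert
\end{align}
```

Here $P_{1i}$ is the intersection of the left line with the plane that contains the right line and $n_i$, and $P_{2i}$ is obtained symmetrically. The denominators equal $\pm\lVert n_i \rVert^2$, so they vanish only when the two lines are parallel.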
The initial search area is a 4 mm × 4 mm × 4 mm cube centered on the initial coordinates of $P_L$ and $P_R$. Namely, $A_{init1} = \{(x, y, z) : |x - x_{cl}| \le 2,\ |y - y_{cl}| \le 2,\ |z - z_{cl}| \le 2\}$ and $A_{init2} = \{(x, y, z) : |x - x_{cr}| \le 2,\ |y - y_{cr}| \le 2,\ |z - z_{cr}| \le 2\}$ (in mm), where $(x_{cl}, y_{cl}, z_{cl}) = \left(\frac{1}{4}\sum_{i=1}^{4} X_{li},\ \frac{1}{4}\sum_{i=1}^{4} Y_{li},\ \frac{1}{4}\sum_{i=1}^{4} Z_{li} + 10\right)$ and $(x_{cr}, y_{cr}, z_{cr}) = \left(\frac{1}{4}\sum_{i=1}^{4} X_{ri},\ \frac{1}{4}\sum_{i=1}^{4} Y_{ri},\ \frac{1}{4}\sum_{i=1}^{4} Z_{ri} + 10\right)$.

According to the equation for $P_{ei}$, the objective is not a convex function, so the global optimum cannot be found by convex optimization. This paper therefore searches for the line-of-sight convergence points several times with successively smaller step lengths. A large step length is used first to find the best $P_L$ and $P_R$ values at the current step. A smaller step length and a smaller search area are then used to find new best values. Repeating this process several times yields a more accurate pair of line-of-sight convergence points. The step length determines the subdivision of the search area. For example, when the search range is a 4 mm × 4 mm × 4 mm cube and the search step length is 1 mm, the number of search points is $(4 + 1)^3 = 125$. The initial step length is set to 0.2 mm. Each search step length is half of the previous one, and the width of the search area is also half of the previous one. When the search step length falls below 0.0005 mm, the search process ends.
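The arithmetic of this schedule can be checked with a few lines of Python. This is only a sketch: the helper names `grid_points` and `step_schedule` are hypothetical, and it assumes that a search is run for every step length that is still at least 0.0005 mm.

```python
def grid_points(width_mm, step_mm):
    """Number of grid points in a cube of the given width sampled at the given step."""
    per_axis = round(width_mm / step_mm) + 1   # both end points are included
    return per_axis ** 3

def step_schedule(initial_step_mm=0.2, stop_mm=0.0005):
    """Step lengths used by the search: halve until the step drops below stop_mm."""
    steps = []
    st = initial_step_mm
    while st >= stop_mm:
        steps.append(st)
        st /= 2.0
    return steps
```

`grid_points(4, 1)` reproduces the 125 points of the example in the text. Because the width and the step are halved together, the number of points per search stays constant, so the cost of the whole search grows only with the number of halvings.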
In addition to checking whether the distances between the PoRs match the positional relationship between the calibration crosses, a more accurate line-of-sight convergence point can be obtained by also evaluating the nearest distance between the two lines of sight, that is, the distance between $P_{1i}$ and $P_{2i}$. The two criteria need to be unified, so this paper proposes a judgment function that merges them. In this function, $n$ is the number of search points in the current search area and $\varepsilon$ is a sensitivity coefficient; in this article $\varepsilon = 10$. The calibration result of one step is shown in Figure 11.

To summarize the algorithm:

1. Set the initial values of $P_L$ and $P_R$, the initial search areas $A_{init1}$ and $A_{init2}$, and the search step length $st$.
2. Calculate the judgment value of each search point according to Equation (14) and take the best $P_L$ and $P_R$ values in the current search area as the center points of the next search.
3. Redefine the current search areas $A_{cur1}$ and $A_{cur2}$ and the search step length $st$: the boundary width of $A_{cur1}$ and $A_{cur2}$ is half of that in the previous search, and $st$ is also half of its previous value.
4. Repeat step (2) to obtain the best $P_L$ and $P_R$ values in the current search area and use them as the centers of the next search areas.
5. Repeat step (3). If $st < 0.0005$ mm, the calculation is finished and the current $P_L$ and $P_R$ values are the final results. If not, repeat step (2).
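The steps above can be sketched as a generic coarse-to-fine grid search. This is a minimal illustration under stated assumptions, not the paper's implementation: it searches a single 3D point, the judgment function of Equation (14) is replaced by a caller-supplied `cost` (lower is better), and the name `coarse_to_fine_search` and its parameters are hypothetical. How the two search areas for $P_L$ and $P_R$ are coupled is not reproduced here.

```python
import itertools
import numpy as np

def coarse_to_fine_search(cost, center, half_width=2.0, step=0.2, min_step=0.0005):
    """Grid-search a cube around `center`, then halve the cube and the step and repeat."""
    center = np.asarray(center, dtype=float)
    while step >= min_step:
        n = int(round(half_width / step))
        offsets = np.arange(-n, n + 1) * step        # grid offsets along one axis
        best, best_cost = center, cost(center)
        for dx, dy, dz in itertools.product(offsets, repeat=3):
            candidate = center + np.array([dx, dy, dz])
            c = cost(candidate)
            if c < best_cost:
                best, best_cost = candidate, c
        center = best          # the best point becomes the center of the next search
        half_width /= 2.0      # halve the search area ...
        step /= 2.0            # ... and the step length
    return center
```

Because each level only looks inside the current cube, the result depends on the initial center being close enough to the optimum. This is consistent with the text's use of an anatomically motivated initial value, since the objective is not convex.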

Coordinate Alignment and Line-of-Sight Convergence Point Fitting Method
After obtaining several pairs of line-of-sight convergence points and comparing them with the pupil size, we observe the following rule: the larger the pupil, the closer the line-of-sight convergence point is to the pupil; the smaller the pupil, the farther the line-of-sight convergence point is from the pupil. In addition, with multiple pairs of line-of-sight convergence points and their corresponding pupil sizes, the spatial coordinates of the convergence points can be related to the inner eye corners. Before that, however, an eye coordinate system based on the spatial coordinates of the inner eye corners must be established, so that the relationship between the feature points can be described at any time. Figure 12 shows the distribution of the calibrated line-of-sight convergence points after alignment in the eye coordinate system.
Before analyzing the relationship between the spatial coordinates of the line-of-sight convergence point and the pupil size, we first determine the relationship between the line-of-sight convergence point and the inner eye corner. As a stable facial feature, the inner eye corner can be used as a reference for the facial pose. However, a single pair of points cannot constrain all degrees of freedom of the facial pose; in particular, it places no restriction on head pitch. The device described in this paper restricts the pitch motion of the head when in use, which greatly reduces the influence of changes in head pitch. Thus, the line-of-sight convergence point can be used as a reference for the estimation of the PoR.

In this paper, the eye coordinate system is established with the right inner eye corner as the origin, as shown in Figure 13. Its axis directions and scale factor are consistent with those of the camera coordinate system. The left and right inner eye corners are connected to form the inner eye corner vector. This vector serves as a reference for describing the coordinates of the line-of-sight convergence points, so that the calibration results can be reused. The direction and length of the vectors from the origin of the eye coordinate system to $P_L$ and $P_R$ are used to describe the spatial coordinates of the line-of-sight convergence points.

In the eye coordinate system, $v_0$ is the inner eye corner vector taken as the initial reference for aligning the other inner eye corner vectors, and its right corner coordinate is the initial right corner. $v_r$ is the vector from the origin of the eye coordinate system to the right-eye line-of-sight convergence point, and $v_l$ is the vector from the origin to the left-eye line-of-sight convergence point. $R_l$ and $R_r$ are the rotation matrices from $v_0$ to $v_l$ and from $v_0$ to $v_r$, respectively. The lengths of $v_r$ and $v_l$, $\|v_r\|$ and $\|v_l\|$, are also calculated. When an inner eye corner vector $v_i$ has a different position and direction from $v_0$, we first transform $v_i$ from the camera coordinate system to the eye coordinate system defined by $v_0$. Then, the right corner coordinate is aligned with the initial right corner coordinate by a translation vector $T'$.
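One common way to obtain a rotation matrix that takes one vector direction to another is Rodrigues' formula. The sketch below assumes NumPy; the function names `rotation_between` and `realign` are hypothetical, and the paper's exact construction of its rotation matrices is not reproduced here. The minimal rotation between two vectors leaves any roll about the vector itself undetermined, which matches the remark above that a single pair of points cannot fix every degree of freedom.

```python
import numpy as np

def rotation_between(a, b, eps=1e-12):
    """Minimal rotation matrix R such that R @ (a/|a|) equals b/|b|."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.linalg.norm(v) < eps:
        if c > 0:
            return np.eye(3)                      # same direction: no rotation
        # Opposite direction: rotate by 180 degrees about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < eps:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis = axis / np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    k = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])            # cross-product matrix of v
    return np.eye(3) + k + (k @ k) / (1.0 + c)

def realign(v_conv, v0, vi, right_corner_i):
    """Carry a calibrated convergence-point vector from the v0 reference to the vi reference."""
    r = rotation_between(v0, vi)
    return np.asarray(right_corner_i, dtype=float) + r @ np.asarray(v_conv, dtype=float)
```

Here `realign` assumes that the convergence-point vector rotates rigidly with the inner eye corner vector and is then attached to the new right corner position. That is one plausible reading of the alignment step, stated as an assumption.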
Then the rotation matrix $R'$ between $v_0$ and $v_i$ can be calculated. With $R'$, the line-of-sight convergence point vectors in the eye coordinate system in which $v_i$ is located are obtained, and from them $P_L$ and $P_R$ can be calculated.

After determining the spatial position of the line-of-sight convergence point with respect to the inner eye corners, we use polynomial fitting based on the least squares method to determine the relationship between the line-of-sight convergence point and the pupil size. Experiments show that a polynomial whose highest-order term is quadratic is effective. The obtained line-of-sight convergence points and inner eye corners are converted to the initial inner eye corner vector reference, and the line-of-sight convergence points are found to be distributed approximately along a straight line perpendicular to the inner eye corner vector. We therefore treat the X, Y, and Z coordinates of the line-of-sight convergence point separately. Because the Z coordinate is sensitive to the pupil size S, we first establish a fitting equation between Z and S. The equation of a spatial straight line is then used to fit the relationship between X and Z and between Y and Z, which gives the equations for X and Y. Together, these form the fitted parametric equation.

The calibration process finally obtains the relationship between the pupil size and the spatial coordinates of the line-of-sight convergence point within the initial inner eye corner vector reference. When in use, the spatial coordinates of the line-of-sight convergence point within the initial inner eye corner vector reference are obtained by inputting the pupil size into the parametric equation, and the output coordinates are then converted to the new spatial coordinates within the actual inner eye corner vector reference.
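The fitting step can be sketched with NumPy's least-squares polynomial fit. This is an illustration under stated assumptions: Z is modeled as a quadratic in the pupil size S, and X and Y are modeled as linear in Z, which is one way to parameterize a spatial straight line that is not perpendicular to the Z axis. The function names and the coefficient layout are hypothetical and are not the paper's parameterization.

```python
import numpy as np

def fit_convergence_model(pupil_sizes, points):
    """Fit Z = f(S) (quadratic) and X = g(Z), Y = h(Z) (linear) by least squares."""
    s = np.asarray(pupil_sizes, dtype=float)
    p = np.asarray(points, dtype=float)           # shape (N, 3): columns X, Y, Z
    z_coef = np.polyfit(s, p[:, 2], 2)            # quadratic in the pupil size
    x_coef = np.polyfit(p[:, 2], p[:, 0], 1)      # straight line: X against Z
    y_coef = np.polyfit(p[:, 2], p[:, 1], 1)      # straight line: Y against Z
    return z_coef, x_coef, y_coef

def predict_convergence_point(model, pupil_size):
    """Evaluate the fitted parametric model for one pupil size."""
    z_coef, x_coef, y_coef = model
    z = np.polyval(z_coef, pupil_size)
    return np.array([np.polyval(x_coef, z), np.polyval(y_coef, z), z])
```

With five calibration distances there are five samples per eye, which is enough to determine the three quadratic coefficients by least squares. The predicted point is expressed in the initial inner eye corner vector reference and still has to be converted to the actual reference as described above.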

Experiments
The pupil size changes with the state of human consciousness, so the coordinates of the calculated Point-of-Regard (PoR) are widely scattered. The experiments in this paper therefore use the average of the PoR coordinates over a period of time to represent the measured PoR. In addition to reporting the average error of the measurement results, this paper also compares the shape of the observed object with the positions of the PoR in the test. The difference between the estimated 3D PoR and the spatial coordinates of the gaze point in the real scene is used to evaluate the effectiveness of the proposed method.
The experiments were conducted under a fixed indoor illumination condition. About 0.5 s of images was captured at each fixation. All experimental results are expressed in the camera coordinate system of the system.
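The averaging and error measures described here are simple to state in code. The following sketch assumes NumPy and that each fixation yields an array of per-frame PoR estimates; the names `mean_por` and `por_errors` are hypothetical.

```python
import numpy as np

def mean_por(samples):
    """Average the per-frame PoR estimates of one fixation (array of shape (N, 3))."""
    return np.asarray(samples, dtype=float).mean(axis=0)

def por_errors(samples, truth):
    """Per-axis absolute error and Euclidean error of the averaged PoR."""
    diff = mean_por(samples) - np.asarray(truth, dtype=float)
    return np.abs(diff), float(np.linalg.norm(diff))
```

Averaging first and measuring the error afterwards reduces the effect of frame-to-frame pupil fluctuations on the reported error. It does not remove a systematic offset, for example one caused by a lighting change after calibration.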

Intuitive 3D Point-of-Regard Estimating Experiments
In this part, a person completes a calibration process in the indoor environment and then takes part in two groups of experiments, watching different objects. Firstly, an actual object was used to test the result: boxes of different shapes were placed at different positions, and the person looked at the six visible corners in the field of view while the measurements were taken. The results are shown in Figure 14. The figure shows that this method responds fairly accurately to the stereoscopic perception of human vision.
Secondly, the calibration board was used to test the extraction of depth perception: the board was placed at different positions 1 m away from the system, a test point was selected for the person to watch, and the obtained PoR was compared with the position of the test point. This experiment further supports the relationship between the depth perception of the human visual system and the pupil size under fixed illumination conditions. The results are shown in Figure 15.

Influences of Different People or in Different Illumination Condition
As a general method, the 3D PoR estimation model needs to be able to measure the PoR of different people. Another aspect of the method that needs improvement is the influence of the pupillary light reflex on the calculated convergence point when the environmental illumination changes. This section describes the two corresponding verification experiments together. The tests were carried out at night; the indoor lighting consisted of several sets of daylight lamps, all of which could be controlled by switches. We invited nine testers, and each of them completed the calibration process once under a fixed illumination condition, with the calibration board fixed at a position 1 m away. The testers looked at the four test points on the calibration board, the indoor illumination was then adjusted several times, and the estimated 3D PoRs were compared.
The experiment can be divided into two parts:

1. The illumination condition of this experiment was the same as that of the calibration process. We invited nine testers to look at the calibration board one at a time. Each tester took part in one group of experiments, and each group contained eight repeated experiments. The interval between experiments was 30 s. According to the distribution of the solved 3D PoR in each group, the average Cartesian error of every tester's 3D PoR is listed in Table 1.
2. We let a tester look at the calibration board under different lighting conditions and observed the distribution of the 3D PoR. During the experiment, the indoor illumination was changed from normal to dark in four steps. After adjusting to the dark, we used a flashlight to illuminate the calibration board and then carried out one more experiment to observe the result. The intuitive experimental result is shown in Figure 16.

Figure 16. Experimental result when the tester was watching the calibration board in five different illumination conditions.
These two experiments show that the 3D PoR distribution errors of different testers are similar and that the PoRs are concentrated in the space near the actual observed points. In the experiments under different lighting conditions, the 3D PoR results are close to each other when the lighting conditions are close to those during calibration. However, when the light intensity decreases, the test results become unpredictable. This instability may reflect the fact that the depth perception of human vision is inaccurate in dark places. When all the daylight lamps are turned off and the board is lit only by a flashlight, the estimated points reproduce the shape of the calibration board, but the solved spatial distribution of the 3D PoR is far from the actual position of the calibration board. Similarly, the method in this paper can be considered not applicable when looking at luminous objects such as monitors.

Error of the Method in Different Distances
Finally, we test the accuracy of the spatial gaze point coordinates at different distances. For this, the necessary experimental environment is constructed. The test object is the calibration board used in the calibration process. We fix a calibrated camera as a test camera below the gaze tracking system so that the tester can first look at a calibration board placed at a closer distance. At the same time, the test camera is used to solve the spatial coordinates of the calibration board, and the gaze tracking system and the test camera are then associated. In this way, we obtain the spatial coordinates of the calibration board in the camera coordinate system of the gaze tracking system.
In the experiment, a target was fixed at seven different distances. At each distance, the tester carried out 15 repeated experiments, and we then calculated the average errors of the PoR coordinates. We also compared our results with the 3D PoR estimation results in [30]; the experimental results were obtained under indoor lighting conditions. The experimental results are shown in Table 2.

Table 2. Average errors in centimeters measured as the difference between the estimated PoR by the

According to the experimental results shown in Table 2, the 3D PoR estimation of our method is not as accurate as that of the compared method. The errors in X and Y can be kept at a low level, while the error in Z is much larger. However, given the relatively simple gaze estimation model and the fast calibration process, the actual performance of the proposed method is still in line with expectations.

Discussion
We employed the test camera to capture scene images and to check the difference between the estimated 3D PoR and the ground truth. According to the experimental results, our method can accurately estimate the spatial coordinates of the object of interest in an indoor environment. As we predicted, changes in lighting conditions influence the 3D PoR estimation. This is because the lighting condition during calibration differs from that during use, which is a limitation of this method. However, the method still shows strong adaptability when used indoors. In addition, the accuracy of the method should be improved, but it may be difficult to improve it further while keeping the simplified calibration process. This is the focus of future research.

Conclusions
We propose a novel approach to estimate the 3D Point-of-Regard in free space, which can be used with a dedicated head-mounted multi-camera gaze tracking device. Based on the pupillary accommodation reflex of the human visual system, this method can capture the depth perception of human vision. It takes as an auxiliary assumption that, during a fixation, the fixation point lies at the center of the view range of both eyes, and it associates the line-of-sight convergence point with the pupil size to form a model for measuring the 3D PoR. In addition, this paper proposes a gaze-tracking device that combines four cameras and semi-transparent mirrors and uses a novel near-infrared lighting source to enhance the contrast of pupil images. The pupil center and the inner eye corner are the basic features of the line-of-sight measurement. Through a simple calibration process, the line-of-sight convergence point is searched for and correlated with the corresponding pupil size. The parameters of the whole line-of-sight measurement model are very simple, and the midpoint of the nearest points of the spatial lines of sight of the left and right eyes is taken as the measured PoR. The method proposed in this paper was examined through several experiments, which show that it can be used to measure the spatial coordinates of the PoR and that the proposed gaze-tracking device is a novel design for wearable devices. Experimental results show that the gaze point measurement model proposed in this paper has sufficient measurement accuracy, while the model and calibration process are relatively simple, and that it can reflect the depth perception information of human vision under fixed lighting conditions.