Estimation of Gaze Detection Accuracy Using the Calibration Information-Based Fuzzy System

Gaze tracking is a camera-vision-based technology for identifying the location where a user is looking. In general, a calibration process is applied at the initial stage of most gaze tracking systems. This process is necessary to compensate for differences in the eyeball and cornea sizes of users, as well as the angle kappa, and to find the relationship between the user's eye and screen coordinates. It is based on the positions of the user's pupil and corneal specular reflection obtained while the user is looking at several predetermined positions on a screen. In previous studies, user calibration was performed using various types of markers and marker display methods. However, studies on estimating the accuracy of gaze detection from the results obtained during the calibration process have yet to be carried out. Therefore, we propose a method for estimating the accuracy of a final gaze tracking system with a near-infrared (NIR) camera by using a fuzzy system based on the user calibration information. Here, the accuracy of the final gaze tracking system means the gaze detection accuracy during the testing stage of the gaze tracking system. Experiments were performed using a total of four types of markers and three types of marker display methods. The results show that the proposed method correctly estimated the accuracy of the gaze tracking regardless of the marker and marker display types applied.


Introduction
Studies have been actively carried out to accurately calculate a user's gaze location based on the movement of their face or eyes [1,2]. In addition, studies are being carried out to apply a user's gaze location as a more natural and convenient input device replacing a keyboard and mouse, which are commonly used input devices [3][4][5][6][7]. Because it is intuitive, a user's gaze location can easily be used as input information without a separate training process. In particular, commonly used input devices such as a keyboard, mouse, or remote control are difficult for disabled users who cannot use their hands freely to operate properly. Even when such devices can be used, they cannot be conveniently manipulated or precisely controlled.
In general, gaze tracking is a technology for identifying the position on a screen that the user is looking at. In most gaze tracking systems, a user's gaze position is calculated based on the relative positions of the center of the user's pupil and the corneal specular reflection. For an accurate gaze tracking system, it is therefore very important to accurately find the pupil center and corneal specular reflection. Along with this, a user calibration process is necessary for accurate gaze tracking. In general, a calibration process is applied at the initial stage of most gaze tracking systems. This process is necessary to compensate for differences in the eyeball and cornea sizes of users, as well as the angle kappa, and to find the relationship between the user's eye and screen coordinates.
The main points of the proposed method are as follows.
- The Euclidean distance is calculated between the user's calculated gaze and the reference positions obtained during the user calibration. Then, the mean and standard deviation of the Euclidean distances are extracted as the first and second features, respectively. In addition, the change in gaze position for each frame, which is obtained during the user calibration, is extracted as the third feature.
- By developing a fuzzy system that takes these three features as inputs and the accuracy of the final gaze tracking system as an output, the accuracy of the final gaze tracking system is estimated based on the results obtained during the user calibration.
- The validity of the proposed fuzzy-based estimation system is verified experimentally using various types of markers such as static and dynamic markers, and various types of marker displays such as sequential, random, and guiding displays.
The advantages and disadvantages of this study compared to conventional studies are shown in Table 1.

Table 1. Advantages and disadvantages of this study compared to conventional studies.

Category: Without user calibration
  Method: Requiring two stereo cameras and five light sources [8]
  Advantages: User convenience increases because there is no calibration process
  Disadvantages: Because of the two cameras and multiple illuminators, the system complexity increases, as do the price and system size

Category: Calibration using static markers
  Methods: Calibration by selecting five or nine points [9]; nine calibration points [10][11][12][13][14]; one-point calibration [15]; four-point calibration [16,17]
  Advantages: As the number of calibration points increases, the gaze tracking accuracy improves
  Disadvantages: Decrease in user convenience; gaze tracking accuracy is affected by how accurately the calibration procedure is performed

Category: Calibration using dynamic markers and display
  Method: Calibration by following a moving target with the eyes [18]
  Advantages: Prevents an inaccurate calibration because the user's concentration is high compared to the use of static markers
  Disadvantages: When the movement line of the calibration marker is so complex that it cannot be predicted by the user, the calibration accuracy decreases; when the movements are complex, the user convenience decreases

Category: Using the estimation model of gaze tracking accuracy (proposed method)
  Method: Gaze tracking accuracy can be estimated using a fuzzy system based on the user calibration information from various calibration methods using static and dynamic markers and displays
  Advantages: The performance of the gaze tracking system can be quantitatively predicted for various calibration markers and display methods
  Disadvantages: The design of an additional fuzzy rule table and membership functions is necessary

The remainder of this paper is organized as follows. In Section 2, the calibration-based estimation method of gaze tracking is described. In Section 3, the experimental results with various calibration methods and an analysis of these methods are provided. In Section 4, some concluding remarks and the direction for a follow-up study are given.

Figure 1 shows a flowchart of the user calibration-information based gaze tracking estimation system proposed in this study. We refer to the previous research [19] for detecting the centers of the pupil and the corneal specular reflection.

Pupil Center Region Detection Process
The rough position of the user's corneal specular reflection is detected based on the bright pixel values in the image. Based on this reflection, the eye region of the user is detected, as shown in Figure 1. In Figure 2g, the edge of the pupil is detected by applying Canny edge detection to the image in Figure 2f. Applying a convex hull method [20][21][22] to the image in Figure 2g then yields the image in Figure 2h, in which the part of the pupil damaged by the reflection occurring on the pupil is compensated. The image in Figure 2i is a binarized version of the image in Figure 2c centered on the reflection. Because the pupil boundary and the corneal specular reflection boundary touch each other, a concave portion would otherwise be detected during the pupil boundary detection, giving a shape different from the real pupil boundary. To resolve this, the part of Figure 2h that overlaps with Figure 2i is removed from the image in Figure 2h, so that the pupil boundary without the touching part can be detected, as shown in Figure 2j. Figure 2k shows the detection of the final pupil boundary by applying an ellipse-fitting algorithm to the image in Figure 2j, and Figure 2l shows the final resulting image. Based on the detected position of the pupil center, the search area for detecting the corneal specular reflection is defined. Then, the center of the corneal specular reflection is located by binarizing this area and calculating the geometric center of the resulting region.
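The last step above (binarization followed by a geometric center) can be illustrated with a minimal sketch. It is not the implementation of [19]; the function name `reflection_center`, the list-of-lists image layout, and the threshold of 200 are assumptions made only for this example.

```python
def reflection_center(gray, search_box, threshold=200):
    """Geometric center (x, y) of the bright pixels inside a search box.

    gray       : 2-D list of pixel intensities, indexed as gray[y][x]
    search_box : (x0, y0, x1, y1), with x1 and y1 exclusive
    threshold  : hypothetical binarization threshold (not from the paper)
    """
    x0, y0, x1, y1 = search_box
    sum_x = sum_y = count = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            if gray[y][x] >= threshold:  # binarization: keep bright pixels only
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # no reflection candidate inside the search area
    return (sum_x / count, sum_y / count)
```

Restricting the scan to a box around the detected pupil center keeps other bright regions of the frame, such as reflections on the skin or on glasses, out of the average.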

Calculating the User's Gaze Position
The gaze position of the user is calculated using the detected pupil center and reflection center coordinates. To compensate for the differences in each person's eyeball and cornea size, as well as the angle kappa, and to find the relationship between the user's eye and screen coordinates, the calibration process is conducted before the gaze position is calculated. The gaze position is calculated using several geometric transform matrices obtained through the calibration process [19]. When nine reference points are gazed at during the calibration, the nine pupil center positions obtained as shown in Figure 3 allow the relationship between the screen and pupil center to be expressed as in Figure 4. As Figure 4 illustrates, when the user is looking at the nine calibration points, the pupil image plane can be divided into four subregions by using the positions of the pupil center. Each of the four pupil subregions is mapped to one of four monitor subregions, which are delimited by the nine calibration points of the monitor image plane. A geometric transform matrix corresponding to each subregion is obtained and used [19,23]. For example, pupil subregion 3 is mapped into monitor subregion 3 using matrix 3, as shown in Figure 4.

Figure 2. [...] (e) application of a morphological operation to the image of (d); (f) application of component labeling to the image of (e); (g) application of Canny edge detection to the image of (f); (h) application of a convex hull to the image of (g); (i) image in which the reflection is binarized from the image of (c); (j) image in which the overlapped parts of (h,i) are removed from the image of (h); (k) application of an ellipse-fitting algorithm to the image of (j); and (l) the final detection result of the pupil center region.
When calculating the user's gaze position, if the user's pupil center lies in pupil subregion 2 (P2P3P6P5), the gaze position is calculated using matrix 2 among the four geometric transforms. Here, the gaze position is calculated through Equations (1) and (2) [19]. Figure 5 shows an example of the correspondence between four pupil centers and four monitor corners when a user is gazing at the four corners of a monitor.
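[Equations (1) and (2): geometric transform of [19], expressed in terms of the four pupil-center coordinates (P_x0, P_y0), (P_x1, P_y1), (P_x2, P_y2), (P_x3, P_y3) and their products P_x0 P_y0, P_x1 P_y1, P_x2 P_y2, P_x3 P_y3.]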
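The terms that survive from Equations (1) and (2), namely the pupil coordinates and their products, are consistent with a bilinear mapping from a pupil subregion to a monitor subregion. The sketch below assumes that form: each monitor coordinate is modeled as a*px + b*py + c*px*py + d, and the four coefficients per axis are solved from four point correspondences. The coefficient names, the function names, and the solver are assumptions for illustration and are not taken from [19].

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate calibration points")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_transform(pupil_pts, monitor_pts):
    """Fit the assumed bilinear map from 4 pupil centers to 4 monitor points."""
    A = [[px, py, px * py, 1.0] for px, py in pupil_pts]
    coeff_x = solve_linear(A, [mx for mx, _ in monitor_pts])
    coeff_y = solve_linear(A, [my for _, my in monitor_pts])
    return coeff_x, coeff_y


def map_gaze(transform, pupil_center):
    """Map a pupil center to a monitor position with the fitted coefficients."""
    px, py = pupil_center
    basis = (px, py, px * py, 1.0)
    coeff_x, coeff_y = transform
    gaze_x = sum(c * v for c, v in zip(coeff_x, basis))
    gaze_y = sum(c * v for c, v in zip(coeff_y, basis))
    return gaze_x, gaze_y
```

With nine calibration points, one such transform would be fitted for each of the four subregions, and the transform of the subregion that contains the current pupil center would be applied.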

Three Features for the Inputs of Fuzzy System, and Fuzzy Membership Function with Rule Table
In this study, a method for estimating a user's gaze tracking accuracy is proposed using a fuzzy system based on information obtained during the calibration process. As shown in Figure 6, the gaze tracking accuracy of the user is estimated based on the fuzzy algorithm by using three feature values of the user, which can be obtained during the calibration process. Among the eye features of the user that can be obtained during the calibration process, the following three feature values were utilized. In Figure 7, the Euclidean distances are calculated between the reference points that have to be looked at during the calibration and the positions that the user is actually gazing at. Feature 1 (F1) is the mean value of these Euclidean distances, Feature 2 (F2) is their standard deviation, and Feature 3 (F3) is the change in gaze position between frames during the calibration.

With respect to the use of the fuzzy algorithm, the design of the membership function is a very important factor. Figure 8 shows the input/output membership function configuration of the feature values. As shown in Figure 8, a linear or triangular membership function is used; these are the most common choices because the calculation speed is fast and the complexity of the code is relatively low [24][25][26]. Features 1-3 described above take different values over different ranges. Furthermore, depending on the user's calibration result, the ranges of Features 1-3 change irregularly. Therefore, each feature value was normalized to the range of 0 to 1, as shown in Figure 8. The fuzzy membership functions can be divided into two types. As shown in Figure 8a, an input membership function having low (L) and high (H) values was designed. In addition, as shown in Figure 8b, an output membership function having low (L), middle (M), and high (H) values was designed.
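The three features and their normalization can be sketched as follows. This is only an illustration under stated assumptions: F3 is taken here as the mean frame-to-frame displacement of the calculated gaze position, the standard deviation is the population form, and the normalization bounds `lo` and `hi` are hypothetical parameters, since the paper does not state them in this section. `math.dist` requires Python 3.8 or later.

```python
import math


def calibration_features(gaze_pts, ref_pts):
    """Return (F1, F2, F3) from per-frame gaze and reference positions.

    gaze_pts[i] is the gaze position calculated in frame i and ref_pts[i]
    is the reference point the user was asked to look at in that frame.
    """
    dists = [math.dist(g, r) for g, r in zip(gaze_pts, ref_pts)]
    f1 = sum(dists) / len(dists)                    # mean Euclidean distance
    f2 = math.sqrt(sum((d - f1) ** 2 for d in dists) / len(dists))  # std. dev.
    moves = [math.dist(a, b) for a, b in zip(gaze_pts, gaze_pts[1:])]
    f3 = sum(moves) / len(moves) if moves else 0.0  # assumed: mean gaze change
    return f1, f2, f3


def min_max_normalize(value, lo, hi):
    """Scale a feature to [0, 1]; lo and hi are hypothetical feature bounds."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))
```

Clamping to [0, 1] keeps an unusually poor calibration from producing inputs outside the domain of the membership functions.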
Furthermore, a fuzzy rule table was designed (Table 2).

In general, an output value of the fuzzy system is obtained through the output fuzzy membership function and the fuzzy rules. For one input value (Fi), the input membership function in Figure 8a yields two output values, one for L and one for H. Because there are three input feature values, combining these outputs gives a total of eight (2 × 2 × 2) combinations.
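The inference step can be sketched as below. Two parts are assumptions rather than the paper's design: the input membership functions are taken as L(x) = 1 - x and H(x) = x on the normalized range, and the rule table is a hypothetical stand-in (the contents of Table 2 are not reproduced in this text). The Min and Max rules then combine the three membership degrees of each L/H combination into one inference value.

```python
from itertools import product


def input_memberships(x):
    """Assumed linear input membership functions on [0, 1]."""
    x = min(1.0, max(0.0, x))
    return {"L": 1.0 - x, "H": x}


def rule_output(labels):
    """HYPOTHETICAL rule table: the output label depends on the number of H inputs."""
    highs = labels.count("H")
    if highs == 0:
        return "L"
    if highs == 3:
        return "H"
    return "M"


def inference_values(features, rule="min"):
    """Return eight (inference value, output label) pairs for (F1, F2, F3)."""
    combine = min if rule == "min" else max
    memberships = [input_memberships(f) for f in features]
    results = []
    for labels in product("LH", repeat=3):  # LLL, LLH, ..., HHH
        degrees = [m[label] for m, label in zip(memberships, labels)]
        results.append((combine(degrees), rule_output(labels)))
    return results
```

For example, `inference_values((0.2, 0.6, 0.9), rule="min")` evaluates the LLL combination from the degrees 0.8, 0.4, and 0.1 and keeps the smallest of them as its inference value.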
An example is shown in Figure 9. In Figure 10a, several output values can be obtained using one inference value. For example, when the inference value is 0.71 (M), the output values O2, O3, and O4 can be obtained. From these output values, one final output value is determined using a defuzzification method [28,29]. This output value is used as the accuracy estimation score of the gaze tracking system.

Figure 11 shows the experimental environment used in this study. For the experiment, a desktop computer containing a 3.5-GHz Intel® Core™ i7-3770K CPU and 8 GB of RAM was used. The monitor resolution was 1280 × 1024. For the camera, a 700-nm long pass filter (Kodak Wratten Filter No. 89B, Rochester, NY, USA [30]) and a zoom lens were mounted on a webcam (Logitech C600, Lausanne, Switzerland [31]) having a universal serial bus (USB) interface. In addition, an 850-nm NIR illuminator with an 8 × 8 array was used. The camera was positioned right below the center of the monitor, and the illuminator was positioned below the camera. The distance between the monitor and user was roughly 75 cm. Our program was implemented in C++ using the Microsoft Foundation Class library, DirectX 9.0 SDK, and OpenCV in a Microsoft Visual Studio C++ 2010 development environment. The calibration was carried out according to each marker type and display method described in Section 3.1. The gaze position accuracy of the user was measured for each calibration. The experiment was repeated four times for each test subject and each of a total of seven calibration methods. A total of 15 test subjects participated.
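Returning to the defuzzification step described before the experimental setup, the center of gravity (COG) method can be sketched as follows. The triangular output membership functions (L peaking at 0, M at 0.5, H at 1) are an assumed reading of Figure 8b, and the numerical sampling with `steps` points is a choice made for this example only.

```python
def output_membership(label, y):
    """Assumed triangular output membership functions on [0, 1]."""
    if label == "L":
        return max(0.0, 1.0 - 2.0 * y)
    if label == "H":
        return max(0.0, 2.0 * y - 1.0)
    return max(0.0, 1.0 - abs(2.0 * y - 1.0))  # "M"


def cog_defuzzify(inference_values, steps=1000):
    """Center of gravity of the clipped, max-aggregated output functions.

    inference_values: iterable of (inference value, output label) pairs.
    """
    pairs = list(inference_values)
    num = den = 0.0
    for i in range(steps + 1):
        y = i / steps
        # Clip each output function at its inference value, then aggregate.
        mu = max((min(iv, output_membership(label, y)) for iv, label in pairs),
                 default=0.0)
        num += y * mu
        den += mu
    if den == 0.0:
        raise ValueError("all inference values are zero")
    return num / den
```

A single inference value on M yields a score of 0.5 by symmetry, whereas a dominant L value pulls the score toward 0.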

Various Calibration Marker Types and Display Methods
In this paper, various calibration marker types and display methods are proposed to check the validity of the user-calibration based gaze tracking estimation model. As shown in Figure 1, the calibration marker types are classified into two types: static marker and dynamic marker types.
Among the commonly used calibration markers, Figure 12a shows the static markers in the shape of a circle and a cross. Figure 12b shows a dynamic marker with a dynamically changing shape; in the actual system, its color gradually becomes darker as the marker shrinks from a large circle to a small circle. In addition, using each marker, the calibration was performed with different display methods. The marker display methods include a commonly used sequential display method, shown in Figure 13a, and a random display method, shown in Figure 13b. They also include a method that dynamically guides the user to the next display position of the calibration marker, referred to as the guiding display method, shown in Figure 14.
In Figure 13, the red dot is the calibration point that the user has to gaze at, and in the next step, a point for which the calibration has finished is shown in white.
In Figure 14, a dynamic marker is used, and the marker shape changes step by step, as shown in Figure 15. At first, the marker is displayed as shown in the leftmost image of Figure 15, and afterward, it changes as shown in the images to the right. As shown in the second image from the left in Figure 15, the marker is divided into four points and disappears. Then, as shown in the third image from the left in Figure 15, it blinks as a white dot at the moment of calibration. Afterward, as shown in the rightmost image of Figure 15, it changes to an orange dot, and while moving to the next calibration point, it grows to the marker size shown in the leftmost image of Figure 15. As the marker shape changes in this manner, the calibration is performed at a total of nine positions, in the sequence shown in Figure 14.

Calibration Results and Gaze Detection Accuracy
In this section, we show the user calibration results and gaze tracking accuracy. In Figures 16-18, the blue and green dots in (a,c) mark the gaze coordinates of both eyes during the user calibration. The red circles represent the regions that have to be looked at. As shown in (a,c) of Figures 16-18, when the calibration gaze coordinates are concentrated in the regions that have to be looked at during the calibration, the gaze tracking error of the users is small, as shown in (b,d). Based on this, it can be seen that the accuracy levels during the user calibration and gaze tracking are related. At the same time, this confirms that a gaze tracking estimation model can be built using the proposed three features. The first and second features are, respectively, the mean and standard deviation of the Euclidean distances between the user's calculated gaze and the reference positions obtained during the user calibration. The third feature is the amount of gaze movement between the previous and current image frames.

Estimation Analysis of Fuzzy Algorithm for User Gaze Tracking Accuracy Using Calibration Information
In this study, a method for estimating the user's gaze tracking accuracy was proposed by applying a fuzzy algorithm to three feature values obtained during the user calibration. Table 3 shows the gaze tracking accuracy estimation results according to the fuzzy system's Min and Max rules and various defuzzification methods. In the results shown in Table 3, the estimation value of the fuzzy system obtained when the Min rule and COG were used had the highest correlation with the size of the real gaze error. In other words, when the estimation of the fuzzy system was small, the real gaze error was small, and when the estimation of the fuzzy system was large, the real gaze error was large.

To find the correlation between the real gaze error and each method of defuzzification with the Min or Max rule, the correlation, gradient, and R^2 values were calculated, as shown in Table 4. Correlation values range from −1 to 1, where the sign indicates a negative or positive correlation in the data, and a value of zero indicates that there is no correlation between the two sets of data. As shown in Table 4 and Figure 19, when the Min rule and COG were used, the correlation value between the real gaze error data and the fuzzy system's output estimation value was the highest.
The distribution of two-dimensional data can usually be summarized by a linear fitted line, which is characterized by its gradient and R^2. Here, R^2 indicates how well the data distribution is described by the fitted line; its value increases as the data are distributed closer to the line [32]. As shown in Table 4, when the COG of the Min rule was used, the gradient was closest to 1 and the value of R^2 was the largest. Based on these results, it was found that the COG method with the Min rule is the best choice for estimating a user's gaze error.

The following is a statistical analysis of the differences between correlation values according to the correlation value ranking and each method of defuzzification using the Min or Max rule.
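The three quantities reported in Table 4 can be computed as in the sketch below, assuming an ordinary least-squares line fitted to paired samples. Which variable the paper places on which axis is not stated here, so `x` and `y` are generic names; for a simple linear fit, R^2 equals the squared Pearson correlation.

```python
import math


def linear_fit_stats(x, y):
    """Pearson correlation, least-squares gradient and intercept, and R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    correlation = sxy / math.sqrt(sxx * syy)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return {
        "correlation": correlation,
        "gradient": gradient,
        "intercept": intercept,
        "r2": correlation ** 2,  # valid for simple linear regression
    }
```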
As shown in Table 4 and Figure 19, the difference between the COG of the Min rule and the MeOM of the Max rule, which ranked first and second on the basis of the correlation values, respectively, was checked through a t-test [33], a statistical hypothesis test commonly used in statistical analysis. When the t-test was performed using the two independent samples, the COG of the Min rule and the MeOM of the Max rule, the calculated p-value was 0.0347, which is smaller than the significance level of 0.05 (95% confidence level). That is, the null hypothesis that there is no difference between the two correlation values is rejected, which indicates a significant difference in the correlation values between the two groups. This shows that, to estimate the gaze error of the user, it is appropriate to use the COG of the Min rule, which has the top-ranking correlation value among the defuzzification methods. As shown in Figure 20b-i, in all other pairs of adjacent rankings, i.e., from No. 2 and 3 through No. 9 and 10, the p-value was larger than 0.05, which does not indicate a statistically significant difference.

Whereas the p-value of a t-test is used to decide whether the null hypothesis is rejected, the size of the difference between the two groups can be expressed by the effect size, which can be defined as Cohen's d [34]. Cohen's d is considered small at around 0.2-0.3, medium at around 0.5, and large at 0.8 or higher. For example, in Table 5, Cohen's d for the COG of the Min rule and the MeOM of the Max rule is 1.2425, which corresponds to a large effect size. Both the p-value and Cohen's d therefore indicate that the difference is large between the COG of the Min rule and the MeOM of the Max rule, which have the No. 1 and 2 ranked correlation values, respectively. From this, it was found to be most appropriate to use the COG method of the Min rule when estimating the gaze position error of the user.
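For reference, the two statistics used above can be sketched for two independent samples. The sketch assumes the pooled-standard-deviation form of Cohen's d and the equal-variance Student's t statistic; the paper does not state which variants were used. The p-value is omitted because it requires the t-distribution, which a statistics package such as SciPy (`scipy.stats.ttest_ind`) provides.

```python
import math
import statistics


def pooled_std(a, b):
    """Pooled sample standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    return math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))


def cohens_d(a, b):
    """Effect size: difference of means in units of the pooled std. dev."""
    return (statistics.mean(a) - statistics.mean(b)) / pooled_std(a, b)


def t_statistic(a, b):
    """Equal-variance two-sample t statistic (p-value not computed here)."""
    na, nb = len(a), len(b)
    diff = statistics.mean(a) - statistics.mean(b)
    return diff / (pooled_std(a, b) * math.sqrt(1.0 / na + 1.0 / nb))
```

By the thresholds quoted above, an absolute value of `cohens_d` of 0.8 or higher would be read as a large effect.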

Conclusions
In this study, a fuzzy-based estimation model was proposed to estimate the accuracy of a gaze tracking system based on the user's calibration information. The accuracy of the gaze tracking system was estimated based on three types of feature values. The first and second features are, respectively, the mean and standard deviation of the Euclidean distances between the reference and user's gaze positions during the calibration. The third feature is the amount of movement in the user's gaze position between the previous and current images. Experiments on the accuracy of various gaze tracking methods were conducted using four types of markers and three types of marker display methods during calibration. Using the feature values and results obtained for each calibration method, the validity of the fuzzy-based estimation system was verified. From the experimental results, the correlation, gradient, and R^2 were calculated between the real gaze errors and each defuzzification method using the Min or Max rule, and these values supported the validity of the fuzzy-based gaze error estimation system. In addition, by performing a t-test and calculating Cohen's d between the defuzzification methods using the Min or Max rule, it was shown that the COG method of the Min rule was the most appropriate.
In the future, a follow-up study will be carried out on the estimation of gaze tracking accuracy based on a neuro-fuzzy system. In addition, the convenience of various calibration methods will be measured through user brainwaves, as well as through a subjective evaluation.