Contact-Free Cognitive Load Recognition Based on Eye Movement

Cognitive overload not only affects physical and mental health but also reduces work efficiency and safety. Hence, measuring cognitive load has been an important part of cognitive load theory. In this paper, we propose a method to identify the state of cognitive load from eye movement data in a noncontact manner. We designed a visual experiment to elicit high and low cognitive load states under two light intensity environments and recorded eye movement data throughout the process. Twelve salient features of the eye movement were selected by statistical testing. Algorithms for processing some of the features are proposed to increase the recognition rate. Finally, we used the support vector machine (SVM) to classify high and low cognitive load. The experimental results show that the method can achieve 90.25% accuracy in the light controlled condition.


Introduction
In the field of cognitive psychology, cognitive load refers to the amount of mental effort being used in working memory. Cognitive load is always related to a specific task, and people need to use the limited resources of working memory to complete that task. When the task's difficulty is kept within a certain range, the more complex the task is, the more cognitive resources are needed, and hence the higher the cognitive load is.
In a classroom, a teacher often needs to combine the students' cognitive structure and performance to gauge the students' cognitive load state, so as to adjust the teaching content and strategy. The method that the teacher uses is subjective and empirical. In a remote education system, because this cognitive load evaluation process between teachers and students is missing, the teaching quality may not be satisfactory. Therefore, finding a real-time and objective method of measuring cognitive load is particularly important. The application of cognitive load measurement is not limited to education; it can also be extended to vehicle driving [1][2][3], human-computer interaction [4,5], product design [6], and so forth.
There are three ways to measure cognitive load: subjective measures, behavioral measures, and physiological measures [7][8][9]. Subjective measures have the advantages of noninterference and simplicity, but they require participants to assess their load level by introspection, which may lead to bias. Behavioral measures are more direct. Single-task measures can directly reflect the participant's cognitive effort, but the indices are task-related. Multitask measures are more sensitive and have higher validity, but the secondary tasks easily interfere with the main task. Physiological measures assess cognitive load indirectly by recording participants' physiological reactions during a task. Although behavioral and subjective measures permit detailed inferences to be made concerning operators' mental workload [10], physiological measures provide results with a higher level of objectivity and immediacy. Overall, physiological measurement is more suitable for practical application.
Commonly used physiological signals for measuring cognitive load include heart rate variability (HRV) [11][12][13], electroencephalography (EEG) [10,13,14], galvanic skin response (GSR) [15,16], and the eye movement signal [17,18]. For HRV, EEG, and GSR, participants need to wear sensors, which may be inconvenient and uncomfortable. Eye movement measurement, however, allows participants to be measured without wearing any electronic equipment, which makes them more comfortable and relaxed.
In this paper, we establish an objective method for recognizing the state of cognitive load by analyzing eye movement data. We used an eye tracker as the acquisition equipment, selected features that are correlated with the state of cognitive load, adopted a new method to make the features less dependent on individual differences, and used the support vector machine to classify the high and low cognitive load states. The method achieves a high recognition rate under both the light controlled condition (90.25%) and the light change condition (82.95%).

Method
An eye tracker was used to collect data during the experiment. Eye movement data are easily affected by light [19], so the light intensity was controlled in the experiment. We also wanted to find a more robust identification method, so we set up two light intensity models: a light controlled model and a light change model.
The aim of the study was to collect a range of physiological data that would allow us to identify the subjects' cognitive load state. In the designed experiment, participants were asked to determine whether two line segments are parallel. To make participants concentrate more, we added a second person to the task. The two participants competed against each other, and the winner, the one with the better recognition accuracy, received better pay.

Apparatus.
The experiment room was insulated from noise and light, and the experimenter and participants were in separate operating areas so that they did not disturb each other.
The experimental equipment included three computers connected to each other through the network; two projectors; an eye tracker and a video camera; and a light meter for recording the amount of light reaching the participant's eyes.
The participants' testing area is shown in Figure 1(a). All the computers were placed in the experimenter's operating area. Computer C_A controlled projector P_A (3000 lm), which was used to provide indoor lighting. The light intensity had two models: (1) light controlled model: projector P_A projects a square onto the white wall in front of the participants, with the brightness of the projected square set to 67.1 lux; (2) light change model: projector P_A continuously projects a square onto the wall, and the brightness of the square varies between a maximum of 79.9 lux and a minimum of 2.4 lux with a period of 8 s.
Projector P_B was used to project the experiment material onto the center of the projection area of P_A (the experiment material area measures 53 cm × 40 cm), as shown in the upper panel of Figure 1(b). Computer C_B was used to run E-prime.
Eye movement data were obtained with an EyeLink 1000 Desktop (SR Research), an eye tracker that records data in a noncontact manner (its key features are listed in the following part). The sampling rate of the EyeLink 1000 was set to 500 Hz, and only monocular data were recorded.

EyeLink 1000 Desktop System's Key Features
Supports the Remote Camera Upgrade, allowing Head Free-to-Move tracking
Supports monocular and binocular recording
No electronics near the participant's head
Optimal camera-to-eye distance between 40 and 70 cm
32° × 25° tracking range
940 nm illuminator available for dark-adapted environments

We had only one eye tracker, and it had to be placed in front of the participant being recorded, so it could not record two people's data at the same time. To keep the participants from noticing this, a camera was placed in front of the other participant, disguised as a recording device. Two people had to take part in the experiment at the same time, so if either one quit, the data of both were deleted. In the end we collected data from 24 participants (three groups of data were deleted because participants dropped out of the experiment; very low judgment accuracy may indicate low concentration, so the data of such participants were not used either).

Experimental Design and Procedure
Upon arrival at the lab, the participants sat in a quiet room to read and sign informed consent forms and to fill in questionnaires (in order to rule out the effects of smoking, drugs, sleep, disease, etc.). Thereafter, they moved to the experiment room and adjusted to a comfortable sitting position. They then received instructions on the projection area and completed a training trial to reveal any potential issues (misunderstandings, equipment, operation, etc.).
Before the formal task sessions began, baseline physiological signals were recorded during a 30 s rest period (t0), in which the participants relaxed and looked at a fixation point. Subsequently, the two participants completed 8 trials, each consisting of ten judgments, as shown in Figure 2(a). Judgment accuracy was shown after each trial, and the participants then had a minute of rest.
The first two trials (t1, t2) were under the light controlled model (see Figure 2(a)), and the next two trials (t3, t4) were under the light change model. In t1, player 1 was required to judge and player 2 was required to watch. In t2, the tasks of the two players were switched. The operation modes of t3 and t4 were the same as those of t1 and t2.
These four trials were only half of the whole experimental process. Because the room had only one eye tracker, we recorded only the data of player 1 during them. The experimenter then told the participants that they had completed half of the experiment and that they needed to swap seats. The same four trials were repeated, and the data of player 2 were recorded. The process of the experiment is shown in Figure 2(b).
Analysis of the subjective reports of the 24 participants showed that the participants were in a high cognitive load state when they were doing judgment tasks and in a low cognitive load state when they were watching the other player's operation.

Data Acquisition and Analysis
The desktop system recorded the eye movement data, and three reports (sample report, fixation report, and trial report) were derived from the data with Data Viewer (SR Research). We used Matlab to calculate the features (mean, standard deviation, and coefficient of variation) of each report and then selected the features that differ significantly between the two cognitive load states in statistical analysis. Two problems arise during this process: (1) pupil size data are missing while the participants are blinking and (2) individual differences affect the selected features.
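The three summary statistics named above can be illustrated with a short sketch. This is not the authors' Matlab code; the function name `summary_features`, the sample data, and the use of the sample (n - 1) standard deviation are assumptions made for illustration.

```python
import statistics


def summary_features(values):
    """Return mean, standard deviation, and coefficient of variation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumed)
    cv = std / mean if mean != 0 else float("nan")
    return {"mean": mean, "std": std, "cv": cv}


# Hypothetical fixation durations (ms) from one trial, illustration only.
fixation_durations = [200.0, 220.0, 180.0, 240.0, 160.0]
features = summary_features(fixation_durations)
print(features)
```

The coefficient of variation divides the spread by the mean, so it stays comparable between measures with different units or scales.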
Preprocessing.
The eye tracker was set to use the pupil area (the number of pixels in the pupil image) to represent pupil size. However, the pupil size record is lost whenever a blink occurs. To solve this missing data problem, we identified blink onsets and offsets in the pixel data and then replaced the blink data by linear interpolation [20]. Because there are two peaks immediately before and after each blink (see Figure 3(a)), we removed these abnormal data by taking a point 30-60 samples before blink onset as the starting point and a point 30-60 samples after blink offset as the ending point of the interpolation. Finally, a moving average filter was used to smooth the data after linear interpolation (see Figure 3(c)). The preprocessing procedure is shown in Figure 3.
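A minimal sketch of the blink handling described above is given here, assuming that blink onsets and offsets are already known as sample indices. The function names, the fixed `margin`, and the moving average window length are illustrative assumptions; the text states only a 30-60 sample margin and does not give the filter length.

```python
def interpolate_blinks(pupil, blinks, margin=30):
    """Linearly interpolate pupil size across blinks.

    pupil  : list of pupil-area samples
    blinks : list of (onset, offset) sample indices, offset inclusive
    margin : samples skipped on each side of a blink to avoid the
             artifact peaks (assumed fixed here)
    """
    cleaned = list(pupil)
    last = len(cleaned) - 1
    for onset, offset in blinks:
        start = max(onset - margin, 0)
        end = min(offset + margin, last)
        span = end - start
        if span == 0:
            continue
        # Replace every sample strictly between the two anchor points.
        for i in range(start + 1, end):
            frac = (i - start) / span
            cleaned[i] = cleaned[start] + frac * (cleaned[end] - cleaned[start])
    return cleaned


def moving_average(signal, window=5):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(i - half, 0)
        hi = min(i + half + 1, len(signal))
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical ramp signal with a 5-sample blink recorded as zeros.
raw = [float(i) for i in range(25)]
for i in range(10, 15):
    raw[i] = 0.0
cleaned = interpolate_blinks(raw, [(10, 14)], margin=3)
smoothed = moving_average(cleaned, window=5)
```

The sketch does not handle a blink that touches the very start or end of the recording, where no valid anchor sample exists on one side.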

Feature Selection and Individual Differences Removal.
The mean, standard deviation, and coefficient of variation of each report were calculated as features, and each of these features for the 24 participants was evaluated by a paired-sample t-test. Under the light controlled condition, 10 features (Table 1) show a significant difference between the high and low cognitive load states. Under the light change condition, 9 features (Table 2) show a significant difference. Some of the features in Tables 1 and 2 have also been reported and verified in other studies. The blink interval is positively related to mental workload, and a longer blink interval reflects greater attention paid by subjects to a more difficult task [21,22]. The pupil diameter has been used to measure cognitive load [23]. The peak value of the saccade speed has been used to reflect mental workload [24].
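The paired-sample t-test used for this screening can be sketched as follows. The significance level is not stated in the text, so the two-tailed 0.05 level (critical value 2.069 for 23 degrees of freedom) is an assumption, as are the function names and the synthetic data.

```python
import math
import statistics

# Two-tailed critical t value for alpha = 0.05 and df = 23 (24 participants).
# The significance level is an assumption; the text does not state it.
T_CRIT_DF23 = 2.069


def paired_t_statistic(high, low):
    """Return the paired-sample t statistic for two equal-length lists."""
    diffs = [h - l for h, l in zip(high, low)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample std of the paired differences
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


def is_salient(high, low, t_crit=T_CRIT_DF23):
    """Keep a feature when |t| exceeds the critical value."""
    return abs(paired_t_statistic(high, low)) > t_crit


# Hypothetical blink frequencies (Hz) for 24 participants, illustration only.
low_load = [0.30 + 0.01 * i for i in range(24)]
high_load = [value - 0.05 + 0.002 * (i % 5) for i, value in enumerate(low_load)]
print(is_salient(high_load, low_load))
```

Because the test works on within-participant differences, a feature can pass even when the raw values of different participants overlap heavily.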
Some features of a single participant differed significantly between the high and low cognitive load conditions, but these differences were reduced when all 24 participants were considered together. Take the blink frequency as an example. For the majority of participants, the blink frequency in the high cognitive load state was lower than in the low cognitive load state (as shown in Figure 4). However, because of individual differences, some participants showed the opposite result (such as participant 23 in Figure 4), and one participant's blinking frequency under high cognitive load could be higher than that of another participant (such as participants 11 and 12 in Figure 4). Therefore, directly using the blinking frequency as a feature for classifying high and low cognitive load will not produce a high recognition rate.
To reduce this effect and make the features more useful for recognition, we used an individual difference removal method. We explain the method using the blinking frequency as an example.
We assume that the blink frequency is influenced by two factors: the cognitive load and the individual's blinking habit. The experimental environment was strictly controlled and every participant performed the same task, so we further assume that the influence of the cognitive load on the blinking frequency is basically the same for everyone. Differences in blinking frequency are therefore caused by personal blinking habits.
Bearing the above hypothesis in mind, we examined Figure 5, which shows the blink frequency of all participants in the relaxed (baseline) state and the high cognitive load state. Each data point represents a participant; the horizontal axis ($x$ axis) is the blinking frequency in the relaxed state, collected during the rest period, and the vertical axis ($y$ axis) is the blinking frequency in the high cognitive load state. These points exhibit a linear trend; the linear fitting result is $y = kx + b$ ($k = 0.343$, $b = 0.073$).
To obtain the blinking frequency triggered only by the cognitive task, we translate the line $y = kx + b$ so that it passes through a data point, such as participant 1 in Figure 5. The intercept $b_1$ of the new line $y_1 = kx_1 + b_1$ is assumed to be the blinking frequency triggered only by the cognitive task. This assumption rests on the fact that, at the intercept point $(0, b_1)$, the blinking frequency in the relaxed state is zero; in other words, there is no individual difference there, because everyone has zero baseline blinking frequency.
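The translation step can be sketched in a few lines, writing the fitted line as y = k*x + b (symbol names chosen here for illustration). The function names and the toy data are assumptions; the sketch only shows that shifting the fitted line through a participant's point gives the intercept y_i - k*x_i.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = k*x + b; returns (k, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    k = sxy / sxx
    return k, mean_y - k * mean_x


def remove_individual_difference(baseline, task, k):
    """Intercept of the slope-k line through each (baseline, task) point."""
    return [yi - k * xi for xi, yi in zip(baseline, task)]


# Hypothetical blink frequencies (Hz), illustration only.
baseline = [0.2, 0.4, 0.6, 0.8]    # relaxed state, one value per participant
task = [0.15, 0.20, 0.25, 0.30]    # high cognitive load state
k, b = fit_line(baseline, task)
adjusted = remove_individual_difference(baseline, task, k)
print(k, b, adjusted)
```

In this toy data every point lies exactly on the fitted line, so all adjusted values collapse onto the common intercept; with real data the spread that remains is what is not explained by the baseline.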
Figure 6 shows box plots of the blink frequency of all participants in the high and low cognitive load conditions. The left pair shows raw data; the middle and right pairs show data after removing individual differences through the subtractive method [20] and our proposed method (coefficient method), respectively. The features processed by the coefficient method are more concentrated, have no outliers, and show the least overlap between the two states. These results indicate that the proposed method can remove individual differences to a large extent.
We applied the coefficient method to 4 features, as shown in Table 3. These four features share a common characteristic: individual differences between participants affect the feature significantly, and each feature's values in the cognitive load state and the relaxed state are linearly correlated. The influence coefficient (the slope of the fitted line) of each feature is also given in Table 3.
For the pupil size feature, the coefficient is close to 1. In this case, the coefficient method and the subtractive method [20] are essentially the same, so the coefficient method can be regarded as an extension of the subtractive method.
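The relation between the two methods can be written out explicitly. The notation is assumed here: $x_i$ is participant $i$'s baseline value, $y_i$ the value in the cognitive load state, $k$ the influence coefficient, and $b_i$ the corrected feature; the subtractive method is taken to mean subtracting the baseline value from the task value.

```latex
b_i = y_i - k\,x_i ,
\qquad
k \to 1 \;\Longrightarrow\; b_i \to y_i - x_i .
```

For coefficients well below 1, subtracting the full baseline would overcorrect, which is one way to read the advantage of the coefficient method.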
To verify that removing individual differences improves the recognition performance of a single feature, the raw and processed features of the 24 participants were randomly divided into two groups, used as the training and test sets for the support vector machine (SVM) classifier, under both the light change and light controlled conditions. The proposed method produces high classification rates (distinguishing high from low cognitive load) in both conditions. The results of 100 repetitions are shown in Table 4.

Classification with SVM
SVM is a supervised learning method proposed by Cortes and Vapnik [25]. It is mainly used on small-sample data. The classifier minimizes the classification error while maximizing the geometric margin. Given labeled training data, the SVM outputs an optimal hyperplane that categorizes new test data. SVM has been widely used in the fields of human-computer interaction and affective computing. Jang et al. [26] used the SVM as the classifier to analyze eye movement data and achieved good results. We therefore used the SVM as the classifier, combined with the removal of individual differences, to classify the cognitive load state.
The SVM was implemented using the LIBSVM tools [27]. The SVM type was C-SVM (C-SVC in LIBSVM) with a radial basis function kernel; the -c value was 1 and the -g value was 0.07.
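To make the role of the -g parameter concrete, the radial basis function kernel in the form used by LIBSVM, K(u, v) = exp(-gamma * ||u - v||^2), can be evaluated directly. The sketch below is illustrative only: the option string is the standard LIBSVM command-line equivalent of the settings reported above, and the feature vectors are made up.

```python
import math

# Standard LIBSVM command-line options matching the reported settings:
# -s 0 selects C-SVC, -t 2 selects the RBF kernel.
LIBSVM_OPTIONS = "-s 0 -t 2 -c 1 -g 0.07"
GAMMA = 0.07


def rbf_kernel(u, v, gamma=GAMMA):
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


# Hypothetical 3-feature vectors for two samples, illustration only.
sample_a = [0.2, 1.0, 0.5]
sample_b = [0.4, 0.0, 0.5]
print(rbf_kernel(sample_a, sample_b))
```

A small gamma such as 0.07 makes the kernel decay slowly with distance, so samples that are moderately far apart in feature space still influence each other.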
The datasets, composed of the 24 participants' data, were used to train and test the classifiers. The details of the datasets are described below: (1) Each participant has 10 features under the light controlled condition and 9 features under the light change condition. We used the SVM to recognize the cognitive load states separately under each light condition.
(3) Under the light change condition, the dataset was handled in the same way as in (2). The training and test datasets each had the structure 24 × 9 (12 participants, 2 cognitive load states, and 9 features).

Result and Discussion
We fed a randomly selected training dataset and testing dataset to the SVM classifier and obtained one recognition rate. We repeated this procedure 100 times and used the average recognition rate as the final result. To show the performance improvement gained by removing individual differences, we used the recognition rate of the raw features for comparison.
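The repeated random split can be sketched as below. The `evaluate` callback is hypothetical: it stands in for training the SVM on one group and returning the test accuracy on the other, and the fixed seed is added only to make the sketch reproducible.

```python
import random


def split_participants(participant_ids, rng):
    """Randomly split participants into equal-sized training and test groups."""
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def mean_recognition_rate(participant_ids, evaluate, repeats=100, seed=0):
    """Average evaluate(train_ids, test_ids) over repeated random splits."""
    rng = random.Random(seed)
    rates = []
    for _ in range(repeats):
        train_ids, test_ids = split_participants(participant_ids, rng)
        rates.append(evaluate(train_ids, test_ids))
    return sum(rates) / len(rates)


def dummy_evaluate(train_ids, test_ids):
    """Placeholder evaluator (assumption): always reports 50% accuracy."""
    return 0.5


print(mean_recognition_rate(range(24), dummy_evaluate))
```

Splitting by participant rather than by sample keeps both cognitive load states of one person in the same group, which matches the 12-participant structure of the training and test sets.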
We investigated which feature combination gives the highest recognition rate. The recognition rates as a function of the number of features are shown in Figure 7. It can be seen from Figure 7 that combinations of 5 or 6 features achieve the highest recognition rate (except in the raw-features, light change condition, where 6 features give the next-best result, only 0.77% below the best). The recognition rate is lower if more or fewer features are used. This may be because fewer features cannot carry enough useful information, while more features introduce more interference.
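The text does not say how the feature combinations were searched. One simple possibility, assumed here purely for illustration, is exhaustive enumeration, which is feasible because 10 features give only 1023 non-empty subsets. The `score` callback is hypothetical and stands in for the averaged recognition rate of a subset.

```python
from itertools import combinations


def best_feature_subsets(feature_names, score):
    """For each subset size, return the best-scoring subset under `score`."""
    best = {}
    for size in range(1, len(feature_names) + 1):
        best[size] = max(combinations(feature_names, size), key=score)
    return best


# Toy additive score (assumption): each feature has a fixed usefulness.
usefulness = {"a": 3, "b": 1, "c": -2}


def toy_score(subset):
    return sum(usefulness[name] for name in subset)


best = best_feature_subsets(["a", "b", "c"], toy_score)
print(best)
```

Plotting the best score per subset size yields a curve of the kind described above, with a peak at an intermediate number of features.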
Using the best feature combination, the recognition rates in the four conditions are given in Table 5. The classification rate reaches 90.25% in the light controlled condition when individual differences are removed. With the raw features, the recognition rate is 85.17%. These results support the idea that removing individual differences improves the recognition rate.
In practice, if eye movement data can be collected in the relaxed state, the recognition rate can be improved by removing individual differences. In most cases, however, data in the relaxed state cannot be obtained. Even then, the feature combination that we propose still achieves an acceptable recognition rate.
Comparing the two experimental light environments, we find that the light controlled condition has the higher recognition rate regardless of the number of features and of whether the features have been processed by the coefficient method. The coefficient method does not even work well in the light change condition. This is caused by the change of light intensity: light intensity significantly affects the eye movement features, and the cognitive load information carried by the features is masked by its effects, which leads to the lower recognition rate. We therefore suggest maintaining constant ambient light when this contact-free method is used.

Conclusion
In this paper, we proposed a contact-free method to recognize a person's cognitive load state based on eye movement signals.
We designed an experiment to trigger high and low cognitive load. The illumination environment was set to light controlled and light change conditions.
We proposed a coefficient method to remove the effect of individual differences. The experimental results show that this method can improve the final recognition rate.
The feature combinations used for recognition were also investigated; we found that 5 or 6 features achieve the highest recognition rate. Under the light controlled condition, the recognition rate reaches 90.25%.
The light change condition affects the recognition rate dramatically. Variation of the light intensity leads to variation of eye movement, which masks the features resulting from cognitive load. We thus suggest keeping the light intensity as constant as possible to obtain better recognition results when this method is used in practice.

Figure 1: Experimental environment. (a) The layout of the participants' room. (b) The projections of P_A and P_B.

Figure 2: The process of the experiment. (a) A complete process of one trial. (b) Full process of the experiment, including 8 task-trials and two rest-trials.

Figure 3: (a) The raw pupil size data. (b) The pupil size data after linear interpolation. (c) The pupil size data after interpolation and smoothing.

Figure 5: Blink frequency in the high load and low load states is linearly correlated.

Figure 7: The recognition rates for different numbers of features. Legend: remove baseline (light control), raw features (light control), remove baseline (light change), raw features (light change).

Table 1: The salient features in the light controlled condition.

Table 2: The salient features in the light change condition.

Table 3: The influence coefficients of four features.

Table 5: The recognition rates and features used.