Detection of Depression and Its High-Risk Group using Scanpath Comparison Based on Semantic Information

Background: Depression is a burdensome, recurring mental health disorder with high prevalence. Traditional detection of depression relies on structured interviews and questionnaires, which are labor-intensive and time-consuming. The results are also affected by subjective factors such as the subject's honesty and the psychologist's experience, and they lack objective, quantitative metrics. Methods: To address these problems, we develop a convenient and objective system that detects depression and its high-risk group from eye-tracking data. In this system, subjects answer a self-rating scale while eye-tracking technology records their scanpaths as a series of gaze points and saccades. The similarity of the scanpaths is then compared and quantified under the guidance of semantic information. Finally, according to the similarity scores of their scanpaths, the subjects are classified into three groups: normal, high-risk, and depression. Results: The classification accuracy based on each item of the self-rating scale is 86.79% on average, while the detection accuracy is 95.63%. Conclusion: The experimental results show that (1) there are obvious eye movement differences among normal people, the high-risk group of depression, and depression patients while answering the questionnaire, and (2) the early screening system for depression provides a novel and efficient solution for detecting depression and its high-risk group by integrating traditional scale assessment with a quantitative scanpath comparison algorithm.


Background
Depression is a burdensome, recurring mental health disorder with high prevalence worldwide. It can cause severe symptoms that impair a person's ability to carry out activities of daily living. Early detection, intervention, and appropriate treatment of depression can promote remission, prevent relapse, and reduce the emotional and financial burden caused by the disease [1]. However, the traditional approaches for detecting depression rely on structured interviews and questionnaires, which are labor-intensive and time-consuming. Moreover, the diagnostic results of these traditional methods usually depend on the psychologist's experience and the subject's honesty. That is, traditional methods are susceptible to subjective factors. An objective indicator is therefore required to enhance the performance of traditional psychological assessment in detecting depression. Eye-tracking technology has been used in psychology to measure cognitive processes since the 1970s [2]. However, it was not widely used for research purposes until recently, when the reduced cost of the equipment and user-friendly analysis tools made it more readily available to researchers [3]. Because visual stimuli and attentional mechanisms are closely related, eye movement data can be used to understand a subject's mental load and cognitive state [4]. The scanpath preserves eye movement information well in both space and time. It reflects not only where and for how long the subjects have looked but also the sequence in which they processed information. Compared with fixations, the scanpath is more suitable for quantifying the dynamics of eye movements in visual behaviour [5].
As early as 1971, Spitz et al. proposed the Scanpath Theory, which defines the scanpath as a specific sequence of eye movements that a person performs in a specific mode, moving from feature to feature according to a certain rule, whether in observation mode or recognition mode [6]. In 1997, Brandt et al. further quantified the scanpath into a series of gaze points and saccades with a time-sequential relationship and showed that repeated viewing of the original images does not significantly change the repeatability of the scanpath [7].
The scanpath provides a dynamic trace of the direction of attention. It has the potential to reduce the impact of dishonesty on the assessment results. In the past decade, several methods for scanpath comparison have been successfully applied in cognitive studies of visual information processing, such as scene perception [8], reading [9], and visual search [10]. Scanpath comparison can be used to understand differences in eye movements between correct and incorrect solvers of physics problems [11]. Novices and experts can also be distinguished by comparing their scanpaths [12] [13]. In addition, scanpath similarity comparison might clinically help provide evidence for diagnosing children with mild or moderate autism spectrum disorder (ASD) [14].
Inspired by these previous works, we aim to study the potential visual cognitive differences among subjects by analyzing their scanpaths during psychological assessment. In this study, we develop a system that combines the traditional self-rating scale with quantitative scanpath comparison to detect depression and its high-risk group. Converting a subjective analysis of psychological status into an objective scanpath pattern recognition task is of considerable significance, because it could make psychological assessment more objective and convenient.

Methods
In this study, we develop a system to enhance depression detection performance based on the psychological assessment using a self-rating scale. The self-rating high-risk of depression scale is used as a stimulus to obtain the subjects' eye-tracking data, and then the scanpath comparison algorithms are used to extract and quantify the differences of the subject's visual response to the stimulus.
Self-rating high-risk of depression scale
For psychological assessment, there are many diagnostic scales for depression detection, including the Center for Epidemiologic Studies Depression Scale (CES-D) [15], the Beck Depression Inventory (BDI) [16], and the Self-rating Depression Scale (SDS) [17]. These scales aim to assess whether a subject meets the morbidity standards of depression and to evaluate the degree of depression. They cannot be used to detect the high-risk group of depression, that is, people who are more prone to depression under stress or pressure [18]. However, detecting the high-risk group is important and valuable for the early screening of depression. Therefore, our psychologists and clinicians developed a professional self-rating scale named the High-risk Depression Screening Scale (S-hr-DS) for early screening of depression, especially for detecting the high-risk group before depression develops. S-hr-DS is built by dividing and recombining the state and trait items of several classical scales. The selected items used in S-hr-DS, which are translated into Chinese, have been tested and validated over three years. Table 1 lists some example items of S-hr-DS. A previous study found that S-hr-DS has excellent reliability for screening depression (the Cronbach's alpha coefficients of the State Scale and the Trait Scale are 0.956 and 0.962, respectively).

Scanpath comparison algorithms
After obtaining the subjects' eye-tracking data in psychological assessment, we use scanpath comparison algorithms to recognize subjects suffering from depression. First, the areas of interest (AOIs) must be defined for scanpath comparison. In previous work, the grid method and the percentiles method are commonly used to define AOIs. The gridded-AOI approach divides the stimulus into regular bins, as shown in Fig. 1(a). In contrast, the percentiles-AOI approach divides the stimulus into bins of different sizes that contain the same number of gaze points, as shown in Fig. 1(b). Both the gridded-AOI and percentiles-AOI approaches are very convenient and preserve the sequential order, shape, and length of the scanpath. However, these approaches destroy the integrity of the stimulus by simple separation, and they ignore the semantic structure of the stimulus. For example, they can divide a word into two different AOIs, or separate an expression with integrated semantic information into several AOIs. These two approaches overemphasize spatial position and ignore the essential semantic information.
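As a rough illustration of the two baseline schemes, the sketch below labels gaze points with a regular grid and with equal-count bins along one axis. The function names, the 1280×1024 screen size, and the choice to compute percentile bins along the horizontal axis only are assumptions made for this example, not details taken from the study.

```python
import bisect


def grid_label(x, y, width, height, cols, rows):
    """Gridded AOI: index of the regular cell that contains the gaze point."""
    col = min(int(x / width * cols), cols - 1)   # clamp points on the right edge
    row = min(int(y / height * rows), rows - 1)  # clamp points on the bottom edge
    return row * cols + col


def percentile_edges(values, bins):
    """Percentile AOI: cut points so that each bin holds about the same number of points."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // bins] for k in range(1, bins)]


def percentile_label(value, edges):
    """Index of the equal-count bin that contains the value."""
    return bisect.bisect_right(edges, value)


# Hypothetical gaze x-coordinates (pixels) on an assumed 1280x1024 screen.
xs = [100, 150, 220, 400, 640, 700, 900, 1200]
edges = percentile_edges(xs, bins=4)
labels = [percentile_label(x, edges) for x in xs]
cell = grid_label(640, 512, 1280, 1024, cols=6, rows=4)
```

Note that the grid cells have equal size regardless of where subjects look, while the percentile bins are narrow where gaze points are dense and wide where they are sparse. Neither scheme knows where word boundaries lie.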
The semantic information can affect subjects' attention allocation independently, and the semantic information can override low-level features when guiding attention [23], even if it is task-irrelevant [24].
Hence, we propose the scanpath comparison algorithm based on semantic information. Then, we can extract the differences among the scanpaths of varied subjects with semantic information guidance.
Besides the gridded-AOI and percentiles-AOI, we define the semantic-AOI on each item of S-hr-DS according to Chinese grammar. We scan the Chinese sentence from left to right and adopt a dictionary-based Chinese word segmentation method to segment words [25]. When we find a word in the dictionary, we identify this word as an AOI. When we find a compound word (e.g. "Sichuan University"), we take the longest matched word as an AOI. When we encounter an unknown string of Chinese characters, we split it into single characters and define each single character as an AOI. In this way, we can adaptively extract the AOIs based on semantic information, as shown in Fig. 1(c). The AOIs are marked by letters, and the gaze points located in the same AOI are relabeled with the letter of this AOI. The scanpath can then be represented as a string that preserves spatiotemporal information on how a subject views a stimulus relative to the AOIs [26].
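The segmentation rule described above corresponds to forward maximum matching, which can be sketched as follows. The toy dictionary, the example sentence, and the simplification that each gaze point is already mapped to a character index are assumptions for illustration. The study's actual dictionary and AOI geometry are not given here.

```python
def segment_forward_max(sentence, dictionary):
    """Scan left to right and take the longest dictionary word at each position."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(sentence):
        match = sentence[i]  # fallback: a single character becomes its own AOI
        for length in range(min(max_len, len(sentence) - i), 1, -1):
            candidate = sentence[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


def scanpath_string(gaze_char_indices, tokens):
    """Relabel each gaze point with the letter of the AOI (token) it falls in.

    Uses letters A, B, C, ... in reading order, so this toy version only
    handles items with at most 26 AOIs.
    """
    letters = []
    for k, token in enumerate(tokens):
        letters.extend(chr(ord("A") + k) * len(token))
    return "".join(letters[i] for i in gaze_char_indices)


# Hypothetical dictionary and sentence ("Do I like Sichuan University?").
dictionary = {"四川", "大学", "四川大学", "喜欢"}
tokens = segment_forward_max("我喜欢四川大学吗", dictionary)
# tokens == ["我", "喜欢", "四川大学", "吗"]
path = scanpath_string([0, 1, 2, 3, 5, 7, 4], tokens)
# path == "ABBCCDC"
```

Because the longest match wins, "四川大学" stays one AOI instead of being split into "四川" and "大学", which is the behaviour the compound-word rule asks for.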
Next, we select the Needleman-Wunsch algorithm [27] and the SubsMatch method [28] for comparing the strings. The Needleman-Wunsch algorithm is a global string alignment approach that is commonly used to analyze DNA sequences in bioinformatics. A pair of strings is compared by maximizing the similarity score computed from a gap penalty and a substitution matrix that provides the score for every letter-pair substitution. The substitution matrix encodes the positional relationship between the character-encoded AOIs. The algorithm can temporally align the most similar parts of the two scanpaths according to the backtracking path of the maximum similarity score. The scores in the substitution matrix are inversely related to the Euclidean distance between AOIs [29]. In this study, we customize the distance between AOIs. An item is made up of a question area, an options area, and blank areas. The distance between AOIs in the question area is the Euclidean distance, and the "maximum value" denotes the maximum Euclidean distance found in this area. We then define the distance between an AOI in the question area and an AOI in the options area as the maximum value plus one unit, and the distance between an AOI in any text part (question area or options area) and an AOI in the blank space as the maximum value plus two units. In this way, we can align meaningful characters as much as possible and emphasize the differentiation of the answer options. SubsMatch is a scanpath comparison algorithm based on the frequency of repeated gaze patterns. We split the string representation into equal-length subsequences using a window of a particular size and count the number of occurrences of each subsequence. We then compare the frequency of each subsequence between the two scanpaths and compute the similarity between the scanpaths from the normalized sum of the differences between all subsequence frequencies.
Finally, the nearest neighbor classification algorithm is performed on the similarity matrix obtained by comparing all subjects' eye movement scanpaths on the same item. Each subject has a classification result on each item. The subject's classifications over all items of the self-rating high-risk of depression scale are counted and expressed in a radar chart. The final point of the radar chart is the detection result for the subject.
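The two string comparisons can be sketched as below. The layered distance follows the description above, but several details are assumptions made for this example: the linear mapping from distance to substitution score, the gap penalty of -1, the treatment of option-to-option and blank-to-blank pairs, the sliding window, and the exact SubsMatch normalization. All names are hypothetical.

```python
import math
from collections import Counter


def aoi_distance(p, q, d_max):
    """Customized distance between two AOIs given as (area, x, y)."""
    (area_p, xp, yp), (area_q, xq, yq) = p, q
    if area_p == area_q:
        # Same area: Euclidean distance (assumed 0 inside blank space).
        return 0.0 if area_p == "blank" else math.hypot(xp - xq, yp - yq)
    if "blank" in (area_p, area_q):
        return d_max + 2  # text AOI versus blank space
    return d_max + 1      # question AOI versus option AOI


def make_score(aois, d_max):
    """Substitution score: 1 for identical AOIs, decreasing linearly with distance."""
    def score(a, b):
        return 1.0 - 2.0 * aoi_distance(aois[a], aois[b], d_max) / (d_max + 2)
    return score


def needleman_wunsch(a, b, score, gap=-1.0):
    """Global alignment score of strings a and b by dynamic programming."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # substitute
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )
    return dp[n][m]


def subsmatch_similarity(a, b, window=2):
    """1 minus half the summed difference of relative subsequence frequencies."""
    def freqs(s):
        grams = [s[i:i + window] for i in range(len(s) - window + 1)]
        return {g: c / len(grams) for g, c in Counter(grams).items()}
    fa, fb = freqs(a), freqs(b)
    diff = sum(abs(fa.get(g, 0.0) - fb.get(g, 0.0)) for g in set(fa) | set(fb))
    return 1.0 - diff / 2.0


# Hypothetical AOIs: two question words, one option, and blank space.
aois = {"A": ("question", 0, 0), "B": ("question", 3, 4),
        "C": ("option", 0, 10), "_": ("blank", 0, 0)}
score = make_score(aois, d_max=5.0)
same = needleman_wunsch("ABC", "ABC", score)
drift = needleman_wunsch("ABC", "AB_", score)
```

In this toy setting a scanpath that ends on blank space instead of the option is penalised more than one that ends on a neighbouring question word, which mirrors the intent of the layered distance.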
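A minimal sketch of nearest neighbor classification on a precomputed similarity matrix is given below. It assumes a leave-one-out scheme in which each subject takes the label of the most similar other subject on the same item. The evaluation protocol is not spelled out in the text, so this is an assumption, and the matrix values are invented.

```python
def one_nn_labels(similarity, labels):
    """Predict each subject's label from the most similar *other* subject."""
    predicted = []
    for i, row in enumerate(similarity):
        others = [j for j in range(len(row)) if j != i]
        nearest = max(others, key=lambda j: row[j])
        predicted.append(labels[nearest])
    return predicted


# Hypothetical similarity matrix for four subjects on one item.
similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
labels = ["normal", "normal", "depression", "depression"]
predicted = one_nn_labels(similarity, labels)
```

Excluding the diagonal matters here: every scanpath is maximally similar to itself, so without the exclusion each subject would trivially recover their own label.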

Experiment
In the early depression screening system, we record the subjects' eye-tracking data by the Tobii T60 (60 Hz) Eye Tracker during the psychological assessment. The experiment aimed to validate the scanpath comparison methods in detecting cognitive symptoms of depression.

Participants
Sixty-one subjects (

Data Acquirement
We make 62 slices from the 62 items in the S-hr-DS as the stimuli in the early screening system for depression and insert a calibration slice before each of these slices, as shown in Fig. 2. Each subject sits at a distance of 60 cm from the screen and keeps their head as still as possible. After eye position calibration, the subject reads the content of the slice and clicks the "Yes" or "No" option with the mouse to answer the question. There is no time limit for answering each question; the subject clicks the left mouse button to enter the next slice. While the subject reads and answers the question, we simultaneously record their eye movement data with the Tobii T60 (60 Hz) Eye Tracker.

Figure 2 Outline of the experimental sessions. The subject needs to read the introduction and click the mouse to proceed; after a calibration, the subject needs to read and answer the question and click the selected option.

Data analysis
We exclude the eye-tracking data from the slices where the tracking ratio was below 80%. We replace isolated missing data points with mean values. Moreover, we exclude the participants whose eye-tracking data are missing for more than 10 slices. We also exclude the slices that are missing for more than 10 participants. Finally, we obtain the eye-tracking data of 61 participants on 59 slices.
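The exclusion rules can be sketched as a small filter over a table of tracking ratios. The order of the two exclusion steps (participants first, then slices) and the data layout are assumptions for this example, and the ratios are invented.

```python
def filter_recordings(ratio, min_ratio=0.8, max_missing=10):
    """Return indices of kept participants and kept slices.

    ratio[p][s] is the tracking ratio of participant p on slice s.
    A recording counts as missing when its ratio is below min_ratio.
    """
    valid = [[r >= min_ratio for r in row] for row in ratio]
    # Drop participants with too many missing slices (assumed to happen first).
    keep_p = [p for p, row in enumerate(valid) if row.count(False) <= max_missing]
    n_slices = len(ratio[0]) if ratio else 0
    # Drop slices that are missing for too many of the remaining participants.
    keep_s = [s for s in range(n_slices)
              if sum(not valid[p][s] for p in keep_p) <= max_missing]
    return keep_p, keep_s


# Hypothetical ratios for three participants on two slices, with a strict
# threshold of zero tolerated missing recordings to keep the example small.
ratio = [[0.95, 0.90], [0.50, 0.92], [0.88, 0.85]]
kept_participants, kept_slices = filter_recordings(ratio, max_missing=0)
```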
We evaluate the data characterization and scanpath comparison based on the obtained data. Firstly, we characterize the eye-tracking data by gridded-AOI, percentiles-AOI, and semantic information, respectively.
Secondly, we calculate the similarity of the scanpaths between two subjects viewing the identical slice with the Needleman-Wunsch algorithm and the SubsMatch algorithm. Hence, there are four calculation models for scanpath comparison: Needleman-Wunsch_Grid, Needleman-Wunsch_Semantic, SubsMatch_Percentiles, and SubsMatch_Semantic. Next, we run the 1-nearest neighbor classification algorithm on each calculation model's results to classify the subjects as normal, high-risk group of depression, or depression. Furthermore, we compare each model's classification results based on evaluation metrics including macroACC, macroP, macroR, and macroF1, which are calculated from the parameters listed in Table 2, as given in (1)-(4). These metrics respectively indicate the macro-averaged accuracy, precision, recall, and F1-score of the multi-class classifier.
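One common way to compute these four metrics from a confusion matrix is sketched below. It assumes the usual one-vs-rest definitions (per-class TP, FP, FN, TN averaged over the classes, with macroF1 taken as the mean of the per-class F1 scores), which may differ in detail from equations (1)-(4). The example matrix is invented.

```python
def macro_metrics(confusion):
    """Macro-averaged accuracy, precision, recall and F1.

    confusion[t][p] is the number of samples of true class t predicted as class p.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    acc = prec = rec = f1 = 0.0
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(k)) - tp
        tn = total - tp - fn - fp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        acc += (tp + tn) / total
        prec += p
        rec += r
        f1 += 2 * p * r / (p + r) if p + r else 0.0
    return {"macroACC": acc / k, "macroP": prec / k,
            "macroR": rec / k, "macroF1": f1 / k}


# Hypothetical 3-class confusion matrix (rows: normal, high-risk, depression).
example = macro_metrics([[15, 4, 1], [2, 17, 1], [0, 1, 20]])
```

Note that the one-vs-rest accuracy counts true negatives for every class, so macroACC is typically higher than the plain fraction of correctly classified samples.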

Results
We aim to determine whether the scanpath comparison algorithm effectively detects depression and its high-risk group. We evaluated the scanpath comparison algorithms under the same conditions. Since the numbers of subjects in the three categories are almost equal, the chance level is 33.33%. We run a multiple-pairwise Needleman-Wunsch alignment on the eye-tracking data for grid sizes from 4×4 to 12×12. The optimal grid size was 6×4 (width × height). In the SubsMatch_Percentiles model, we run the SubsMatch algorithm for alphabet sizes from 2 to 26 and window sizes from 2 to 26. The optimal parameter combination is an alphabet size of 6 and a window size of 2. The optimal window size was also 2 in the SubsMatch_Semantic model. Table 3 lists the macroACC, macroP, macroR, and macroF1 for comparing the performance of the scanpath comparison models. The classification results of the four models are all much higher than the chance level of 33.33%. It is therefore reasonable to use the scanpath comparison algorithm to detect depression and its high-risk group. For both the Needleman-Wunsch algorithm and SubsMatch, the scanpath comparison algorithms based on semantic information obtain better evaluation metrics. Figure 3 shows the detailed confusion matrices of the classification. For all four scanpath comparison models, depression has the highest classification accuracy, and almost all depression subjects are classified accurately. However, the classification accuracies of the high-risk group and of normal subjects differ considerably between models. The classification accuracy of the high-risk group of depression is 80.67% and 90.37% in the scanpath comparison models based on semantic information, but only 71.87% and 63.60% in the scanpath comparison models based on the grid or percentiles. Besides, the classification accuracy of the normal group is the lowest.
For distinguishing the high-risk group of depression from the normal group, the scanpath comparison model based on semantic information performs better than the model based on the grid or percentiles. With the Needleman-Wunsch algorithm, the difference between the classification results of the high-risk group's scanpaths based on semantic information and those of the grid-based scanpaths was 10.80%; with the SubsMatch algorithm, the difference between the two reached 26.77%. As the results show, the false-positive rate is high, so that depression patients are classified accurately. The high-risk group of depression is more likely to be classified as depression than as normal, and normal subjects are easily misclassified as high-risk. When answering the questionnaire, depression patients show distinct eye movement behaviors that differ from those of the normal and high-risk groups, and the early screening system for depression can identify these patients accurately. While it is difficult to distinguish between the normal and the high-risk group by the traditional method, the early screening system for depression can separate the high-risk group of depression from the normal group. This system can therefore benefit the early screening for depression.
We calculate the similarity scores of the scanpaths within and between the groups, as shown in Fig. 4. The similarity scores of the scanpaths within the depression group are the highest. In contrast, the difference between the normal and the high-risk group is slight. The distribution of similarity scores of the normal group is more concentrated and slightly higher than that of the high-risk group. We also reviewed the distribution of gaze points while subjects were answering the S-hr-DS. The distribution of gaze points on each slice is roughly the same as that shown in Fig. 5.
Detection performance of a whole scale
For each subject, we obtain the classification result on every item and then choose the classification result with the most occurrences as the final detection result. An example comparison of three subjects from the normal, high-risk, and depression groups, respectively, is shown in Fig. 6. Table 3 shows the performance of depression detection by the four models. Based on the whole scale, all four models demonstrate a significantly improved classification accuracy. In particular, the Needleman-Wunsch_Grid algorithm has the highest classification accuracy of 96.72% for the whole scale, while obtaining a relatively lower average classification accuracy of 79.34% for a single item. This result shows that the Needleman-Wunsch_Grid algorithm performs better on some items than on others. Therefore, the Needleman-Wunsch_Grid algorithm places high requirements on the composition of the scale when used in the early screening system for depression.
The SubsMatch_Semantic algorithm also performs well, with the second-highest accuracy for the whole scale. It is robust to disturbance in early screening.
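The whole-scale decision described above is a majority vote over the per-item classifications, which can be sketched as follows. The tie-breaking behaviour (the label that appears first wins) is an assumption, since ties are not discussed, and the per-item labels are invented.

```python
from collections import Counter


def vote_shares(item_labels):
    """Fraction of items assigned to each class (the values a radar chart would plot)."""
    counts = Counter(item_labels)
    return {label: n / len(item_labels) for label, n in counts.items()}


def whole_scale_label(item_labels):
    """Final detection result: the class with the most per-item votes."""
    return Counter(item_labels).most_common(1)[0][0]


# Hypothetical per-item results for one subject on five items.
items = ["normal", "high-risk", "high-risk", "depression", "high-risk"]
shares = vote_shares(items)
final = whole_scale_label(items)
```

Voting over many items lets errors on individual items cancel out, which is consistent with the whole-scale accuracy being higher than the average per-item accuracy.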

Discussion
In this study, we aim to detect depression and its high-risk group by combining the scanpath comparison algorithm with scale assessment. The experimental results show that it is feasible to carry out early screening of depression and its high-risk group through scanpath comparison. Scanpath comparison algorithms based on semantic information lead to higher classification accuracy and stronger robustness. However, these results are derived from a small dataset. In the future, we will collect a larger dataset to evaluate the proposed early depression screening system. With the same stimulus, there are obvious differences in eye movements among the normal group, the depression group, and the high-risk group. The emotional symptoms of depression patients are often pronounced, while their cognitive symptoms are not visible. However, our experiments suggest that patients already experience cognitive symptoms in the early stages of depression, and that these symptoms affect their attention mechanisms. The detection of depression and its high-risk group by scanpath comparison based on semantic information is more convenient and objective than traditional methods. It could quantify a subjective analysis of mental status as a data indicator, making it possible to carry out large-scale early screening of depression in a short time.
Besides, we find that normal subjects spend more time on keywords than depression patients and the high-risk group. Normal subjects rarely look back, and they answer questions casually. However, these distinct eye movement features do not occur on all items; they appear only on items with prominent keywords. In future work, we will therefore search for the semantic cognitive features of depression in the scanpath and optimize the algorithms for the early screening of depression.

Conclusion
We have established an early screening system for depression based on the questionnaire response and the comparison of eye scanpaths. The system's detection accuracy for normal people, high-risk groups of depression, and patients with depression is as high as 95.63%. The system converts a subjective psychological status analysis into an objective scanpath pattern recognition. It provides an efficient and quantifiable solution to detect depression and its high-risk group.

Declarations
Ethics approval and consent to participate
This study was approved by the Research Ethics Committee of Air Force Medical University.

Consent for publication
Not applicable.

Availability of data and materials
The datasets generated and analyzed during the current study are not publicly available because the authors do not have permission to share them. They are available from the corresponding author on reasonable request.

Figure 3
Confusion matrices of the scanpath comparison algorithms for normal people, high-risk people, and depression patients.

Figure 4
Distribution of similarity scores of the scanpath within and between groups. N: normal, H: high-risk group of depression, D: depression patient.

Figure 5
The gaze-point distributions of the subjects from different groups on the 20th item of S-hr-DS.

Figure 6
The detection results of 3 subjects randomly selected from the normal people, high-risk groups of depression, and depression patients.