See your mental state from your walk: Recognizing anxiety and depression through Kinect-recorded gait data

As the challenge of mental health problems such as anxiety and depression grows, more convenient, objective, real-time techniques for assessing mental state are needed. The Microsoft Kinect camera is a possible option for capturing human gait without contact, and gait could reflect the walker's mental state. We therefore propose a novel method for monitoring an individual's anxiety and depression based on Kinect-recorded gait patterns. In this study, after completing the 7-item Generalized Anxiety Disorder Scale (GAD-7) and the 9-item Patient Health Questionnaire (PHQ-9), 179 participants were asked to walk naturally along a footpath while being recorded by Kinect cameras. After preprocessing, Fast Fourier Transforms (FFT) were used to extract features from the Kinect-captured gait data, and several machine learning algorithms were used to train regression models that recognize anxiety and depression levels and classification models that detect cases with specific depressive symptoms. The predictive accuracy of the regression models reached a medium to large level: the correlation coefficient between predicted and questionnaire scores reached 0.51 for anxiety (by epsilon-Support Vector Regression, e-SVR) and 0.51 for depression (by Gaussian Processes, GP). The predictive accuracy was even higher, 0.74 for anxiety (by GP) and 0.64 for depression (by GP), when the models were trained and tested on the female sample. The classification models were also effective in detecting cases with some symptoms. These results demonstrate that an individual's questionnaire-measured anxiety/depression levels and some depressive symptoms can be recognized from Kinect-recorded gait data through machine learning. This approach shows the potential to develop non-intrusive, low-cost methods for monitoring individuals' mental health in real time.

Introduction
laboratory, with the use of Kinect researchers were able to distinguish different daily human activities [20], identify gait cycles on a treadmill [21] and record the trace of people's simple step movements [22]. This capability of monitoring body movements was soon used in clinical applications [23,24]. As Hondori and Khademi [25] reviewed, Kinect could bring certain benefits as part of a rehabilitation system for patients with stroke, Parkinson's disease, cerebral palsy and some other neurological disorders, and has the potential to be a reliable solution for telerehabilitation [26]. Furthermore, Li et al. [27] tried to detect induced emotions from Kinect-recorded gait data, showing that through Kinect, researchers could not only analyze the movement itself, but also possibly recognize some motion-reflected mental states.
To establish an anxiety/depression detection method based on natural gait using Kinect, we need to build computational models that can recognize anxiety/depression from Kinect-recorded gait data, rather than only find some gait features relevant to anxiety or depression. To reach this goal, we extract low-level features from body configurations described directly by 3D coordinate values, and construct computational models using machine learning methods to automatically recognize the levels of anxiety and depression. These data-driven low-level features cannot provide a high-level description of the gait patterns of anxiety or depression, such as walking speed, arm swing or head movements, but they may carry more complete information that the computational models can use to detect anxiety or depression. This approach has been shown to be feasible in the field of affective computing [28,29].
In the present study, we hypothesized that questionnaire-measured anxiety and depression levels could be recognized from an individual's natural gait, and that the computational model could be built through machine learning methods using Kinect-recorded data. We conducted an experiment to test this hypothesis.

Participants and apparatus
In this study, we recruited 179 graduate students (100 males, 79 females) with an average age of 24.2 years (SD = 1.5) from the University of Chinese Academy of Sciences. All participants reported no physical disease or injury affecting daily walking. The experimental environment was set up similarly to the one in Li et al.'s study [27], with a 6 m × 1 m footpath and two Kinect 2.0 cameras placed at the beginning and the end of the footpath.

Data collection procedures
After signing an institutionally approved informed consent form, each participant was first required to complete a series of questionnaires. Besides basic demographic information, the questionnaires included the 7-item Generalized Anxiety Disorder Scale (GAD-7) [30], which asks about states in the past two weeks to calculate an anxiety score, and the 9-item Patient Health Questionnaire-Depression (PHQ-9) [31], which asks about depressive symptoms in the past two weeks to calculate a depression score. These two questionnaires are widely used as screening tools for assessing and monitoring anxiety and depression severity and for assisting clinicians in making a diagnosis. Both have shown excellent internal reliability (.92 for GAD-7; .86-.89 for PHQ-9) and test-retest reliability (.83 for GAD-7; .84 for PHQ-9) [30,31]. The good sensitivity and specificity of GAD-7 for detecting anxiety disorders and of PHQ-9 for detecting depressive disorders have also been demonstrated by many previous studies, with the usual cutpoint of ≥10 for both scales [32]. All participants completed the GAD-7, and 167 valid samples with PHQ-9 scores were obtained (95 males, 72 females; mean age = 24.2, SD = 1.5).
Secondly, all participants were asked to walk back and forth on the footpath naturally, as they would in daily life, for two minutes while the Kinect cameras recorded continuously, to ensure an adequate amount of high-quality gait records. The protocol was approved by the Institutional Review Board of the Institute of Psychology, Chinese Academy of Sciences (approval number: H15010).

Data preprocessing
Denoising. The original record from Kinect consisted of the 3-dimensional coordinates of the 25 main body joints [27], sampled at 30 Hz. Sliding window Gaussian filtering [33] was first applied to the original data of each body joint to remove noise and smooth the records. With a window size of 5 and the convolution kernel c = [1, 4, 6, 4, 1]/16, the denoising process convolves In, the original time series recorded by Kinect, with c to produce Out, the smoothed time series. Fig 1 shows a segment of a single joint's X-axis data before and after denoising. The time series processed by the Gaussian filter (Fig 1A) is clearly smoother than the original data (Fig 1B).
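The smoothing step can be illustrated with a minimal sketch. It assumes that the 5-frame window is centred on the current frame and that only positions where the full window fits are kept; the text does not state the alignment or the boundary handling, so both are assumptions, and the function name is hypothetical.

```python
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]  # c = [1, 4, 6, 4, 1] / 16


def gaussian_smooth(series):
    """Smooth one joint-axis time series with the 5-point kernel.

    Assumption: the window is centred on the current frame and only
    positions where the whole window fits are returned, so the output
    is 4 samples shorter than the input.
    """
    half = len(KERNEL) // 2
    smoothed = []
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        smoothed.append(sum(w * x for w, x in zip(KERNEL, window)))
    return smoothed


# A single spike is spread over its neighbours by the kernel weights.
raw = [0.0, 0.0, 16.0, 0.0, 0.0, 0.0, 0.0]
print(gaussian_smooth(raw))
```

Because the kernel weights sum to 1, a constant signal passes through unchanged, while isolated jitter is attenuated.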
Coordinate system transformation. Using the Kinect-default 3D coordinates, with the camera position as the origin, may cause considerable error in gait pattern analysis, because different participants occupy different positions relative to the Kinect camera while walking. As a solution, in each frame (containing 25 main body joints) we replaced the coordinate system with one that takes the position of the SpineBase joint as the origin, and the data of the remaining 24 joints were used in the next steps.
Resampling. For a non-intrusive recording method, shortening the necessary recording time increases its practical value, so we selected a short time series from each participant during resampling. We divided the two-minute recording of each Kinect into front and back segments, based on whether the participant was facing the camera. Since joint tracking was more accurate while the participant was facing the camera, the back segments and the turning frames were dropped. Since walking is a periodic body movement, the final data should cover at least one cycle. Meanwhile, our feature extraction method, the Fast Fourier Transform, required a data length equal to a power of 2. We therefore chose 64 frames (about 2 s) as the length of the final data segments used in feature extraction. We first cut the front segments into several 64-frame segments, each of which was perfectly continuous without any invalid frame. Then we randomly selected one 64-frame segment from each participant as the final sample used in feature extraction.
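As a rough illustration of this transformation, the sketch below subtracts the SpineBase position from every other joint in one frame. The dictionary layout of a frame and the function name are hypothetical; only the choice of SpineBase as origin comes from the text.

```python
def recenter_frame(frame, origin_joint="SpineBase"):
    """Express every joint relative to the origin joint and drop the origin.

    `frame` maps a joint name to its (x, y, z) position in camera
    coordinates (a hypothetical layout chosen for this example).
    """
    ox, oy, oz = frame[origin_joint]
    return {
        name: (x - ox, y - oy, z - oz)
        for name, (x, y, z) in frame.items()
        if name != origin_joint
    }


frame = {
    "SpineBase": (0.5, 1.0, 3.0),
    "Head": (0.5, 1.75, 3.0),
    "HandLeft": (0.25, 1.0, 2.5),
}
print(recenter_frame(frame))
```

After recentring, two participants with the same posture yield the same coordinates regardless of where they stand on the footpath.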
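The segment selection can be sketched as follows. The per-frame validity flags, the non-overlapping cutting scheme and the function names are assumptions made for illustration; the text only states that each 64-frame segment is continuous, free of invalid frames, and that one is picked at random per participant.

```python
import random

SEGMENT_LENGTH = 64  # frames; a power of 2, roughly 2 s at 30 Hz


def continuous_segments(valid_flags, length=SEGMENT_LENGTH):
    """Return (start, end) index pairs of non-overlapping, fully valid segments.

    `valid_flags[i]` is True when frame i is a usable front-facing frame
    (hypothetical input format).
    """
    segments = []
    run_start = None
    # The trailing False is a sentinel that closes the last run.
    for i, ok in enumerate(list(valid_flags) + [False]):
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            for start in range(run_start, i - length + 1, length):
                segments.append((start, start + length))
            run_start = None
    return segments


def pick_segment(valid_flags, rng=random):
    """Randomly choose one valid segment, or None if none exists."""
    segments = continuous_segments(valid_flags)
    return rng.choice(segments) if segments else None


flags = [True] * 130 + [False] * 10 + [True] * 70
print(continuous_segments(flags))
print(pick_segment(flags, random.Random(0)))
```

Passing a seeded `random.Random` instance makes the selection reproducible, which is convenient when the same sample has to be rebuilt later.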

Feature extraction
For feature extraction, Fast Fourier Transforms (FFT) [34] were conducted, defined as Eq (2):

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-i 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1 \tag{2}$$

Here N refers to the length of the data segment, and x_n (x ∈ {X, Y, Z}) refers to the preprocessed gait data. The FFT converts the sampled function from its original domain (time domain) to the frequency domain. We calculated the amplitude of X_k for each axis (X, Y and Z) of each joint, obtaining 64 amplitude coefficients per axis as features. We then normalized these features using the Z-score function.
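To make the feature extraction concrete, the following sketch implements a textbook recursive radix-2 FFT, which also shows why the segment length has to be a power of 2, and then takes the amplitude of each coefficient. It is not the implementation used in the study. The Z-score helper assumes the population standard deviation and leaves open over which set of values the normalization is applied, since the text does not specify either.

```python
import cmath
import statistics


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_amplitudes(segment):
    """Amplitude |X_k| of every FFT coefficient of one joint-axis segment."""
    return [abs(c) for c in fft(segment)]


def z_score(values):
    """Standardise to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


segment = [float(n % 16) for n in range(64)]  # toy periodic signal, 64 frames
amplitudes = fft_amplitudes(segment)
print(len(amplitudes))  # one amplitude per frequency bin
```

For a real-valued input the amplitudes are symmetric around N/2, so the 64 coefficients per axis contain redundant pairs; the sketch keeps all of them, as the text describes.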

Feature selection
To minimize the error caused by redundant information and to improve predictive accuracy, we performed feature selection using the Pearson correlation, one of the most commonly used methods for this purpose [35,36]. Correlation coefficients were calculated between the anxiety/depression score and each feature (FFT amplitude) on each axis of each joint. Then, on each axis of each of the 24 joints, we selected the 5 features with the largest absolute correlation coefficients, generating a total of 360 selected features (5 × 3 × 24 = 360) for each participant.
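The selection rule can be sketched for a single joint axis as below. The column-wise data layout and the function names are hypothetical, and constant features are assigned a correlation of 0 as a convenience for this example.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no linear information
    return cov / (var_x * var_y) ** 0.5


def top_k_features(feature_columns, scores, k=5):
    """Indices of the k columns with the largest |r| against the scores.

    `feature_columns[j]` holds feature j (one FFT amplitude) for all
    participants; `scores` holds their questionnaire scores.
    """
    ranked = sorted(
        range(len(feature_columns)),
        key=lambda j: abs(pearson(feature_columns[j], scores)),
        reverse=True,
    )
    return sorted(ranked[:k])


scores = [1, 2, 3, 4, 5]
columns = [
    [1, 2, 3, 4, 5],
    [5, 4, 3, 2, 1],
    [1, 1, 1, 1, 1],
    [1, 3, 2, 5, 4],
    [2, 1, 2, 1, 2],
]
print(top_k_features(columns, scores, k=3))
```

Applying `top_k_features` with k = 5 to each of the 3 axes of the 24 joints gives the 5 × 3 × 24 = 360 features mentioned above.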

Model training
To predict the anxiety and depression scores, we trained models using five frequently used regression algorithms, i.e. Simple Linear Regression (SLR), Linear Regression (LR), epsilon-SVR (e-SVR), nu-SVR (n-SVR) and Gaussian Processes (GP), and applied 10-fold cross validation to test each model: we randomly selected 10% of the sample for testing, used the rest of the sample for training, and repeated this process ten times for each model. The Pearson correlation coefficient between the scores predicted by each model and the questionnaire scores was calculated as the predictive accuracy index of that model.
As each of the PHQ-9 items represents a unique symptom of depression in the DSM-IV criteria, the score of each item also helps to assess an individual's specific symptoms, in addition to the overall score [31]. For each item, the scale ranges from 0 (not at all) to 3 (nearly every day), and we divided our sample into a symptomatic group (scoring 1-3) and a non-symptomatic group (scoring 0). We then built classification models on each item to identify the cases with that symptom. We used the Simple Logistic (SL), K-Star (K-S) and C-SVC algorithms, and tested the models through 10-fold cross validation. Precision, recall and F-measure, which are commonly used to evaluate classification models in machine learning, were calculated as measures of predictive accuracy: precision is the fraction of cases with the symptom among the cases retrieved by the model; recall is the fraction of model-retrieved symptomatic cases among all cases with the symptom; and F-measure is the harmonic mean of precision and recall.
Model training and testing were conducted with WEKA 3.8, a collection of machine learning algorithms for data mining tasks.
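The evaluation scheme for the regression models can be sketched as follows. A single-predictor least-squares line stands in for the regression algorithms; it is a placeholder chosen to keep the example self-contained, not any of the WEKA implementations used in the study. The fold assignment, the seed and all function names are likewise assumptions.

```python
import random


def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def cross_val_predict(xs, ys, folds=10, seed=0):
    """Out-of-fold predictions: each sample is predicted by a model
    trained only on the other folds."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    predictions = [0.0] * len(xs)
    for f in range(folds):
        test_idx = order[f::folds]  # every folds-th shuffled sample
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        for i in test_idx:
            predictions[i] = slope * xs[i] + intercept
    return predictions


def pearson(a, b):
    """Pearson correlation between predicted and observed scores."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0


# Synthetic data: a noisy linear relation between one feature and a score.
rng = random.Random(1)
feature = [rng.uniform(0, 1) for _ in range(100)]
score = [10 * x + rng.gauss(0, 2) for x in feature]
predicted = cross_val_predict(feature, score)
print(round(pearson(predicted, score), 2))
```

The reported accuracy index is then the correlation between the pooled out-of-fold predictions and the questionnaire scores, so no sample is ever predicted by a model that saw it during training.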
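The three classification metrics defined in this section can be computed as in the sketch below. It reports the metrics for the symptomatic class only, which is an assumption about how the values were summarized; the function names and the toy labels are hypothetical.

```python
def to_symptomatic(item_score):
    """PHQ-9 item score 1-3 -> symptomatic (1); 0 -> non-symptomatic (0)."""
    return 1 if item_score >= 1 else 0


def precision_recall_f(actual, predicted):
    """Precision, recall and F-measure for the symptomatic class (label 1)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # correctly retrieved
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # missed cases
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


item_scores = [0, 1, 3, 0, 2, 1, 0, 0, 2, 1]
actual = [to_symptomatic(s) for s in item_scores]
predicted = [1, 1, 1, 0, 1, 0, 0, 1, 1, 1]  # toy model output
print(precision_recall_f(actual, predicted))
```

In this toy case the model retrieves 7 cases, 5 of them truly symptomatic, and misses 1 of the 6 symptomatic cases, so precision is 5/7 and recall is 5/6.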

Questionnaire scores of anxiety and depression
The distributions of questionnaire scores of anxiety and depression are shown in Figs 2 and 3. In our sample, males showed higher anxiety scores than females (p = .039, df = 177), while depression scores showed no significant difference between the two genders (p = .442, df = 165). The anxiety and depression scores of both genders were generally distributed in the relatively healthy region, while a few cases had anxious or depressive symptoms of varying severity. Since we also built models to recognize each symptom in PHQ-9, the sample distribution on each item of PHQ-9 is also presented in Table 1.

The recognition of anxiety levels by regression models
The predictive accuracies of the regression models on the GAD-7 score are presented in Table 2. The performance of different models varied considerably. When the models were trained and tested using the whole sample, the correlation coefficient between predicted and questionnaire scores reached 0.51 (by e-SVR). If building models separately on males and females, the

The recognition of depression levels by regression models
The predictive accuracies of the regression models on the PHQ-9 score are presented in Table 3. The performance of different models also varied. Using the whole sample, the correlation coefficient between predicted and questionnaire scores reached 0.51 (by GP). When the models were trained separately on data from each gender, the predictive accuracies also changed: for males it was 0.45 (by GP), and for females it was up to 0.64 (by GP).

The detection of cases with different depressive symptoms by classification models
In Table 4, the precision, recall and F-measure of the three classification models on each symptom are presented. For some symptoms, such as Item 1, Item 2 and Item 4, the predictive accuracies were relatively high. Recall on these items exceeded 0.9 while precision was around 0.7 or higher, which means that the models could find more than 90% of the cases with these symptoms, with about 30% or fewer false alarms among the retrieved cases. On some other symptoms, such as Item 3, Item 5, Item 6 and Item 7, our models also showed some effectiveness, especially the SL and C-SVC models, and their predictive accuracies varied across the whole sample, males and females. The predictive accuracies on Item 8 were low, and for Item 9, the symptomatic cases were too few to train the models.

Discussion
Our results supported the hypothesis that the questionnaire-measured severity of an individual's anxiety and depression could be recognized from natural gait, with predictive models built through machine learning. For both anxiety and depression, the correlation between predicted scores and self-reported questionnaire scores reached a medium to large level (0.43 to 0.74). Furthermore, the classification models were effective in detecting cases with several depressive symptoms. These results indicate two things. First, an individual's degree of anxiety and depression was indeed reflected in natural gait, which is consistent with previous studies revealing gait features relevant to anxiety and depression (e.g., [7,9,11,13]). Second, regardless of the extent to which this target information in gait can be visually inspected, it could be measured and used for recognition with the help of electronic devices. The effective predictive models in the current study were built through machine learning, based on low-level features extracted directly from the original 3D coordinates of the walker's main body joints. High-level feature descriptors of body movements in this field have often been based on subjective, qualitative evaluations [37], which restricts the integration of different features into one predictive model. The low-level features in our study (FFT amplitudes) may not provide an intuitive understanding of an individual's gait; however, they could cover the information about the target psychological aspects reflected in gait more comprehensively. Our results showed the validity of the computational model based on low-level features in recognizing questionnaire-measured severity of anxiety and depression, and showed the potential of this data-driven approach in the field of psychometrics.
The apparent differences in model effectiveness across depressive symptoms (Table 4) tell us more about how the predictive model can be used. Considering both precision and recall, our classification models performed well in screening for loss of interest or pleasure, feeling down or depressed, and low energy. For other symptoms, such as sleep problems, eating problems, feelings of failure and trouble concentrating, our models showed relatively lower effectiveness, and they showed no effect in recognizing moving or speaking too slowly or being fidgety. Although these results may be affected by the different distributions of item scores, they suggest that some depressive symptoms are reflected in gait more strongly than others. There are two scoring methods for PHQ-9 in clinical practice [38]: the cut-off based on summed-item scores, and the algorithm based on DSM-IV criteria. The algorithm requires a total of at least five symptoms rated as at least "more than half the days" (except for the suicidal ideation item), and also requires at least one of the first two symptoms of PHQ-9 (losing interest or pleasure; feeling down or depressed) to be rated as at least "more than half the days". As the summed-item method is more sensitive and has been dominant in depression screening [38], predicting the total PHQ-9 score has greater practical value. Meanwhile, the detection of certain items provides additional information about the subject's symptom presentation, but it cannot yet support the algorithm scoring method, because it is not valid for all items.
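The two PHQ-9 scoring methods can be contrasted in a short sketch. The cutpoint of 10 comes from the text; the handling of the suicidal ideation item (counted when rated at least 1) is an assumption about what "except the suicidal ideation item" means, and the function names are hypothetical.

```python
def summed_score_positive(items, cutoff=10):
    """Summed-item method: positive when the nine item scores total >= cutoff."""
    return sum(items) >= cutoff


def algorithm_positive(items):
    """Algorithm method as described in the text.

    A symptom counts when rated at least 2 ("more than half the days").
    Assumption: item 9 (suicidal ideation) counts when rated at least 1.
    """
    if len(items) != 9:
        raise ValueError("PHQ-9 has nine items")
    counted = sum(1 for score in items[:8] if score >= 2)
    counted += 1 if items[8] >= 1 else 0
    core = items[0] >= 2 or items[1] >= 2  # one of the first two symptoms
    return counted >= 5 and core


responses = [1, 1, 3, 3, 3, 3, 3, 0, 0]
print(summed_score_positive(responses), algorithm_positive(responses))
```

The example responses total 17 and pass the summed-item cut-off, yet fail the algorithm because neither of the first two symptoms reaches "more than half the days". This illustrates why the algorithm method depends on reliable item-level detection, which the classification models do not yet provide for every item.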
In our study, the predictive accuracies of the models trained with different machine learning algorithms also differed greatly, which may offer a clue about the relationship between the features we used and anxiety/depression. SLR and LR were linear regression models, while e-SVR, n-SVR and GP were nonlinear regression models. The outstanding performance of the nonlinear regression models in our study implied that the relationship between the gait information and anxiety/depression was more likely to be nonlinear than linear. This may be one reason why the specificity of the gait patterns relevant to anxiety and depression was not unambiguously established in previous studies [11].
For both anxiety and depression scores, when we trained and tested regression models on males and females separately, the predictive accuracy for females was higher than for males. In detecting symptomatic cases, models trained on females also performed better on some symptoms, such as feeling down or depressed and trouble concentrating. Intuitively, it seemed that the level of anxiety and depression was reflected more clearly in the gait of females than in that of males; in other words, women seemed to "express" their anxiety and depression through natural gait more than men, especially for some symptoms. As many studies have shown, there are some differences between males' and females' symptoms of anxiety/depression (e.g., [39-41]). Researchers have claimed that women may receive more positive reinforcement for expressing concerns about anxiety symptoms [42], and that men with depression show impairment at lower symptom levels than women [43] and consistently report fewer symptoms than women [44]. These findings are consistent with the inference in our study that females' gait could reflect their anxiety/depression better than males' gait. Although we have not seen any reports revealing gender differences in anxiety/depression symptoms in terms of gait, it may be valuable to conduct such a comparison in future studies.
As this is a pilot study, it is appropriate to highlight several limitations. First, the current study used questionnaire-based scales of anxiety and depression symptoms rather than a clinical diagnosis of either. Although the validity of the questionnaires as screening tools for assessing anxiety and depression severity has been well established in the literature [32], the questionnaire score itself cannot be used as a diagnosis. Second, the sample in this study was composed of graduate students rather than clinical patients. With the cutpoint of ≥10 for the two scales [32], quite few participants reached the level of moderate to severe anxiety or depression, which means that there were few "real" patients with anxiety or depressive disorders in our sample. The validity of our model in recognizing questionnaire scores of anxiety or depression therefore cannot be equated with effectiveness in clinical practice, and the diagnostic performance of the model, such as its sensitivity and specificity in finding patients, has not yet been tested. Third, the current approach was data-driven and only established the association between low-level gait features and anxiety/depression severity. For a clear description of the relationship between intuitively visible, high-level gait features and anxiety/depression scores, further kinesiological study is necessary. Fourth, although the large correlation between the predicted and questionnaire summed-item scores showed the validity of screening for depression with the model, the relationship between gait and the different symptoms relevant to depression is left as an open question. The great disparity in the accuracies of detecting different symptoms implied that not all depression-relevant symptoms are equally reflected in gait. As a first step, this study mainly focused on predicting the summed-item score, which is the most useful in practice. To better understand how gait reflects certain symptoms and, through them, the general level of depression, more in-depth analysis, such as factor analysis, is needed in future studies.
Despite these limitations, which stem from the exploratory nature of the study, the results suggest potential for future mental health services. An individual's gait is objective and can be recorded repeatedly at any time, whereas requiring him/her to complete a questionnaire repeatedly and frequently is often not acceptable in practice. Our gait-based predictive model may therefore be more suitable than questionnaires for monitoring continuous changes in the anxiety/depression severity of individuals. The low volume of gait data needed and the timeliness of measurement make this method suitable for very fast screening. In the current study, we trained and tested predictive models based on continuous gait data as short as 64 frames (about 2 s). This means that enough data could possibly be obtained while participants naturally pass by the Kinect, without any extra requirement of walking back and forth in practical applications. This method may also have advantages in other situations where the use of questionnaires is restricted, such as in populations with a low education level. Besides screening for anxiety and depression using the predicted total scores, our classification models with high accuracy could be used to detect certain symptoms relevant to depression, such as losing interest or pleasure, feeling down or depressed, and low energy. To realize this potential, future work needs to proceed in two directions: first, building and testing the model with a larger sample that is similar to the target users, such as real patients; second, exploring the validity and availability of this method in the target scenarios, for example, its diagnostic performance when used as an aid to clinical judgment.
In conclusion, this study moved one step forward towards a non-intrusive, low-cost solution for real-time monitoring of mental health conditions, which would be of potential value in mental health services. Our experiment demonstrated that natural gait could be an objective data source for measuring anxiety and depression, and the predictive models were effective not only in recognizing the total questionnaire scores of anxiety and depression, but also in detecting some self-reported specific depressive symptoms. Though the non-patient sample and the questionnaire-based design limit the applicability of the current model, this pilot study indicates one possible direction worthy of further investigation for new, convenient methods of measuring mental health.
Supporting information
S1 Dataset. The dataset of the study. (RAR)