Research on the Identification of College Students’ Mental Health Problems Based on Campus Big Data

Recent surveys indicate that mental health problems such as anxiety, depression, feelings of inferiority, interpersonal sensitivity, and even suicidal ideation are common among college students. These problems have a serious negative impact on families and society. If mental health problems of college students can be detected early, counselors can pay more attention to high-risk students, and those students can receive therapy sooner, reducing the harm. It is therefore crucial to find an effective method for early detection of mental health problems. This paper proposes a new method that detects mental health problems by analyzing students' behavior on campus. The datasets include Internet access logs, records of entering and leaving dormitories, and records of consumption in canteens. We built classification models to distinguish students with mental health problems from other students. The experimental results suggest that the proposed method may help improve the performance of public mental health services.


Introduction
Recent surveys indicate that mental health problems such as anxiety, depression, feelings of inferiority, interpersonal sensitivity, and even suicidal ideation are common among college students. These problems have a serious negative impact on families and society [1]. Effective prevention of mental health problems is therefore essential, and early detection is the basis of prevention [2]. However, for reasons such as a lack of mental health knowledge and stigmatizing attitudes towards people with mental illness, a large number of people with mental health problems are not motivated to seek professional help [3], [4].
Traditionally, researchers have used questionnaires to assess the mental health status of college students. This method has two shortcomings: respondents may deliberately conceal the facts, and an individual's mental health cannot be observed in real time. In recent years, with the rapid development of the Internet, people rely on it more and more, which provides a new opportunity to detect mental health problems. A few studies have shown a close relationship between mental health status and network behavior. Dong Nie et al. showed that some searching behaviors are correlated with personality traits to some degree [5]. Ang Li et al. found that Web usage behavior can effectively identify students with mental health problems [6]. Zanganeh and Hariri showed a significant relationship between emotional expressions and searchers' individual characteristics [7]. Zhu Changye et al. showed that time-frequency features from Internet access logs are effective in capturing changes in mental health status [8].
The behavior of college students on campus is diverse, and network behavior is only a small part of it. In this paper, the goal is to identify students with mental health problems by fusing multiple data sources. To achieve this goal, we build a novel framework whose data sources include smart card consumption, records of entering and leaving dormitories, and Internet access logs. The whole procedure of our research includes four stages. In the first stage, the Internet access logs are preprocessed. We classify URLs into eight categories (Study, Entertainment, Comprehensive, Adult, Shop, News, Propaganda) and then construct a time-ordered sequence of behaviors for each student. Subsequently, we use a CNN to extract network features from the behavior sequences. In the second stage, the smart card consumption data are processed. We identify morning, noon, and evening meals from the consumption time, and then calculate the regularity and frequency of each of the three meals. K-means is used to cluster regularity and frequency, and the anomaly score is then calculated from the clustering results. In the third stage, statistical features are extracted, including the mean, the standard deviation, the numbers of early risings and late returns, back-to-back card swipes, and so on. Finally, we use five classification algorithms to classify the students.
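The sequence construction in the first stage can be sketched as follows. This is a minimal illustration, assuming each log record has already been assigned one of the named categories; the record layout `(student_id, timestamp, category)`, the function name `build_behavior_sequences`, and the toy records are hypothetical. Only the seven category names listed above are encoded.

```python
# Category names as listed in the text; the integer encoding is an assumption.
CATEGORIES = ["Study", "Entertainment", "Comprehensive", "Adult",
              "Shop", "News", "Propaganda"]
CATEGORY_INDEX = {name: i for i, name in enumerate(CATEGORIES)}


def build_behavior_sequences(records):
    """Group categorized log records by student and order them by time.

    records: iterable of (student_id, timestamp, category), where timestamp
    is an ISO-style string so that string order equals time order.
    Returns {student_id: [category index, ...]} in chronological order.
    """
    per_student = {}
    for student_id, timestamp, category in records:
        event = (timestamp, CATEGORY_INDEX[category])
        per_student.setdefault(student_id, []).append(event)
    return {
        student_id: [index for _, index in sorted(events)]
        for student_id, events in per_student.items()
    }


# Toy records (hypothetical values, not taken from the dataset).
records = [
    ("s1", "2017-04-20 09:00:00", "Study"),
    ("s2", "2017-04-20 08:30:00", "News"),
    ("s1", "2017-04-20 08:00:00", "Entertainment"),
    ("s1", "2017-04-20 21:15:00", "Shop"),
]
sequences = build_behavior_sequences(records)
```

Each resulting list is one student's behavior sequence, which is the kind of input a sequence model such as a CNN can consume after padding or windowing.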

Related work
With the rapid development of the Internet, more and more people spend more time online. To understand users' behavior on the Internet, many researchers have studied the relationship between web use behaviors and psychological features. Ozcan and Buzlu found that increased problematic Internet use was associated with increased depressive symptoms [9]. Ceyhan found that depressive status was a factor influencing problematic Internet use [10]. Peng and Liu found that increased dependency on online gaming was associated with increased depression scores [11]. Selfhout et al. found that increased Internet use for communication purposes was associated with decreased depression scores, while increased Internet use for non-communication purposes was associated with increased scores on both depression and social anxiety [12]. These studies imply that it is reasonable to detect a person's mental health status from his or her web use behaviors. However, they rely on self-reports to study the relationship between network use and mental health status.
With the help of information technology, online behavior is automatically recorded in databases, which gives us an opportunity to study the relationship between actual network behavior and mental health status. Recently, a few studies have begun to use actual network behavior. Gosling et al. found that a web user's personality is manifested in actual web use behaviors [13]. Dong Nie et al. showed that some searching behaviors are correlated with personality traits to some degree [5]. Ang Li et al. found that Web usage behavior can effectively identify students with mental health problems [6]. Zanganeh and Hariri showed a significant relationship between emotional expressions and searchers' individual characteristics [7]. Zhu Changye et al. showed that time-frequency features from Internet access logs are effective in capturing changes in mental health status [8].
The behavior of college students on campus is diverse. In addition to online behavior, it includes entering and leaving dormitories, eating in canteens, and other activities. Therefore, in this paper, we try to identify students with mental health problems through both network behavior and other campus behavior. Fig. 1 shows the framework we propose, which consists of three phases: data acquisition, feature extraction, and model training and prediction. Feature extraction has three parts: web features, abnormal scores, and statistical features.

Extracting web features
The relationship between network behavior and psychological characteristics has been established in previous work. Based on this, we try to extract useful features from Internet access logs. The logs were collected from the Internet billing system, and each record is well formatted, as shown in Table 1. In this paper, we divide URLs into eight categories (eight browsing behaviors): Study, Entertainment, Comprehensive, Adult, Shop, News, Propaganda. Network behavior changes over time, so we build a time series for each student.
Extracting the hidden features of the time series is a meaningful way to improve prediction performance. In recent years, the CNN has proved to be a reliable technique for extracting hidden features, because it creates its filters automatically [3]. Kong and Kim [14] used a CNN to extract features from intercepted radar signals and used the extracted features to classify the signals effectively. Wen and Yang [15] used a CNN-based algorithm for the comprehensive analysis of ECG recordings and radar data. Therefore, this paper extracts network behavior features with a CNN.
Unlike a traditional neural network with full connections in every layer, a CNN significantly reduces the number of network parameters through local connectivity and weight sharing in its convolutional layers. This core building block consists of a set of kernels (or filters), each with a small receptive field. Each kernel moves across the input volume in a specified manner, performing the convolution operation, while the kernel parameters remain the same, which limits the total number of free parameters. For a convolutional layer at the lth layer, the computation is expressed as
$$x_k^{(l)} = f\Big(\sum_{c} w_{c,k}^{(l)} * x_c^{(l-1)} + b_k^{(l)}\Big) \tag{1}$$
where k denotes the kernel number, c represents the channel number of the input $x^{(l-1)}$, $w_{c,k}^{(l)}$ is the kth convolutional kernel corresponding to the cth channel, $b_k^{(l)}$ is the learnable bias corresponding to the kth kernel, $f(\cdot)$ is the activation function, and $*$ denotes the convolution operation.
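The convolutional-layer computation described above can be illustrated with a minimal pure-Python sketch. It assumes a one-dimensional multi-channel input (such as a behavior sequence), stride 1, no padding, and ReLU as the activation function; the function name `conv1d_layer` and the toy numbers are illustrative assumptions. As in most deep learning libraries, the kernel is slid over the input without flipping.

```python
def relu(value):
    """Rectified linear unit, used here as the activation function f."""
    return value if value > 0.0 else 0.0


def conv1d_layer(x, kernels, biases):
    """Apply K one-dimensional kernels to a C-channel input.

    x:       list of C channels, each a list of T numbers.
    kernels: list of K kernels; each kernel is a list of C weight lists
             of equal width W (one weight list per input channel).
    biases:  list of K bias values.
    Returns K feature maps, each of length T - W + 1.
    """
    seq_len = len(x[0])
    feature_maps = []
    for kernel, bias in zip(kernels, biases):
        width = len(kernel[0])
        feature_map = []
        for start in range(seq_len - width + 1):
            total = bias
            # Sum over channels c of the kernel applied to the local window.
            for channel, weights in zip(x, kernel):
                for offset, weight in enumerate(weights):
                    total += weight * channel[start + offset]
            feature_map.append(relu(total))
        feature_maps.append(feature_map)
    return feature_maps


# Toy input: 2 channels, 4 time steps; one kernel of width 2.
x = [[1.0, 2.0, 4.0, 3.0],
     [0.0, 1.0, 0.0, 1.0]]
kernels = [[[-1.0, 1.0], [0.5, 0.5]]]
biases = [0.0]
feature_maps = conv1d_layer(x, kernels, biases)
```

The same two weight lists are reused at every position of the sequence, which is the weight sharing that keeps the number of free parameters small.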
In the pooling layer, the feature map produced by the convolutional layer is subsampled with the max pooling function [16]. Max pooling summarizes adjacent locations in the time domain by an overall statistic, their maximum, and subsamples each neighborhood to form new feature maps, which effectively reduces the parameter dimension. The pooling is expressed as
$$\hat{x}_j^{(l)} = w_j^{(l)}\,\mathrm{down}\big(x_j^{(l-1)}\big) + b_j^{(l)} \tag{2}$$
where $x_j^{(l-1)}$, $w_j^{(l)}$, $b_j^{(l)}$, $\mathrm{down}(\cdot)$, and $\hat{x}_j^{(l)}$ represent the input, the weight matrix, the biases, the down-sampling function, and the feature maps of the jth kernel in the lth pooling layer, respectively.
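A minimal sketch of the down-sampling step, assuming plain non-overlapping max pooling over a one-dimensional feature map with no learnable weight or bias; the function name `max_pool1d` and the window size of 2 are illustrative assumptions.

```python
def max_pool1d(feature_map, pool_size=2):
    """Non-overlapping max pooling over a 1-D feature map.

    Trailing elements that do not fill a complete window are dropped.
    """
    return [
        max(feature_map[start:start + pool_size])
        for start in range(0, len(feature_map) - pool_size + 1, pool_size)
    ]


pooled = max_pool1d([1.5, 2.5, 0.0, 4.0, 3.0], pool_size=2)
```

With a window of 2, the output is roughly half as long as the input, which is how pooling reduces the dimension passed to later layers.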
In this paper, the CNN consists of two convolutional layers, two pooling layers, and a fully connected layer; the convolutional layers have 10 and 20 channels, respectively. ReLU and Adam are used as the activation function and the optimization algorithm, respectively. In addition, to prevent overfitting, we use three dropout layers with rates of 0.15, 0.15, and 0.5, respectively. The structure is shown in Fig. 2. In our experiment, 70 positive samples and 70 randomly selected negative samples were used to train the CNN model. Then, all samples are fed into the trained CNN model, and the outputs of the dense layer are used as the extracted features.
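The balanced training subset described above can be drawn as in the following sketch. It assumes students are identified by opaque ID strings; the function name `balanced_training_ids`, the fixed seed, and the generated IDs are hypothetical and serve only to make the example reproducible.

```python
import random


def balanced_training_ids(positive_ids, negative_ids, n_per_class=70, seed=0):
    """Draw the same number of positive and negative students at random."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    chosen = rng.sample(positive_ids, n_per_class)
    chosen += rng.sample(negative_ids, n_per_class)
    rng.shuffle(chosen)  # mix the classes before training
    return chosen


# Hypothetical IDs: 70 positive and 210 negative students, as in the dataset sizes.
positives = [f"P{i:03d}" for i in range(70)]
negatives = [f"N{i:03d}" for i in range(210)]
train_ids = balanced_training_ids(positives, negatives)
```

Training on equal class sizes keeps the feature extractor from simply favoring the majority class, while the later feature extraction step still runs on all samples.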

Extracting abnormal scores
Some studies have found that people with mental health problems, especially those with depression, often have eating disorders [17]. Based on this, we analyze students' dietary patterns through their consumption records in the canteen. We focus on the daily regularity of breakfast, lunch, and dinner. Since there may be multiple records for each meal, we use the time of the first record as the meal time. For example, if there are three records for a breakfast, at 7:20, 7:21, and 7:22, we use 7:20 as the breakfast time. Lunch and dinner are handled in the same way.
Assume there are n time intervals $T = \{t_1, t_2, \ldots, t_n\}$. For any given student, the probability that a behavior $v \in V = \{\text{breakfast}, \text{lunch}, \text{dinner}\}$ takes place within time interval $t_i$ is computed as
$$P_v(t_i) = \frac{n_v(t_i)}{\sum_{j=1}^{n} n_v(t_j)} \tag{3}$$
where $n_v(t_i)$ is the occurrence frequency of behavior v within time interval $t_i$. The entropy of behavior v is then computed as
$$E_v = -\sum_{i=1}^{n} P_v(t_i) \log P_v(t_i) \tag{4}$$
When computing entropy, we assume that each time interval spans half an hour for all three behaviors. Breakfast is defined as the period from 6:00 to 10:00, lunch as the period from 11:00 to 13:00, and dinner as the period from 15:00 to 17:00. From equations (3) and (4), it follows that the smaller the entropy of a behavior, the more concentrated its probability distribution over the time intervals, and the higher its regularity.
However, we found that a student who seldom goes to the canteen and a student who goes there regularly can have similar entropy values. To distinguish between these two groups, we also counted how many times each student had breakfast, lunch, and dinner in the canteen. Next, we use k-means to cluster the entropy and the number of meals, with the number of clusters set to 3. The clustering results are shown in Fig. 3.
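The meal-time regularity measure can be sketched in a few lines. This is a minimal illustration, assuming records are given as `(date, hour, minute)` tuples and using the natural logarithm (the logarithm base is an assumption); the function names and the toy times are hypothetical.

```python
import math
from collections import Counter


def first_swipe_per_day(records):
    """Keep only the earliest (hour, minute) of each day.

    records: list of (date_string, hour, minute) for one meal type.
    """
    earliest = {}
    for day, hour, minute in records:
        time_of_day = (hour, minute)
        if day not in earliest or time_of_day < earliest[day]:
            earliest[day] = time_of_day
    return [earliest[day] for day in sorted(earliest)]


def meal_entropy(meal_times, window_start, window_end, bin_minutes=30):
    """Entropy of meal times over half-hour bins within a meal window.

    meal_times: list of (hour, minute), one per day.
    window_start, window_end: window bounds in whole hours, e.g. 6 and 10.
    """
    start = window_start * 60
    end = window_end * 60
    counts = Counter()
    for hour, minute in meal_times:
        minutes = hour * 60 + minute
        if start <= minutes < end:
            counts[(minutes - start) // bin_minutes] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Three swipes for one breakfast collapse to the first one (7:20).
breakfast = first_swipe_per_day([
    ("2017-04-20", 7, 21), ("2017-04-20", 7, 20), ("2017-04-20", 7, 22),
])

regular = [(7, 20), (7, 25), (7, 10), (7, 5)]      # always the same half hour
irregular = [(6, 10), (7, 20), (8, 40), (9, 50)]   # a different half hour each day
regular_entropy = meal_entropy(regular, 6, 10)
irregular_entropy = meal_entropy(irregular, 6, 10)
```

Note that a student with a single breakfast record also gets an entropy of zero, which is why the number of meals is clustered together with the entropy.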
We assume that the smaller the cluster in which an instance is located, and the further the instance is from the centroid of that cluster, the higher its abnormal score. The abnormal score of an instance $x$ is computed from the distance between $x$ and $c_i$, the centroid of the cluster $C_i$ that contains $x$, together with $|C_i|$ and $|D|$, the numbers of instances in $C_i$ and in the set $D$ of all instances. In our experiment, we used Euclidean distance.
Fig. 3. Clustering.
Fig. 4. Proportion of students.
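A minimal sketch of a cluster-based abnormal score, assuming the form (Euclidean distance to the centroid) × |D| / |C_i|. This exact functional form is an assumption made for illustration: it was chosen only because it grows as the cluster shrinks and as the instance moves away from its centroid. The function name `abnormal_scores` and the toy points are hypothetical, and the cluster labels and centroids are taken as given (for example, from a k-means run).

```python
import math


def abnormal_scores(points, labels, centroids):
    """Score each instance by centroid distance scaled by inverse cluster share.

    points:    list of feature tuples, e.g. (entropy, meal count).
    labels:    cluster index of each point.
    centroids: list of centroid tuples, indexed by cluster index.
    Assumed form: distance(x, c_i) * |D| / |C_i|.
    """
    cluster_sizes = {}
    for label in labels:
        cluster_sizes[label] = cluster_sizes.get(label, 0) + 1
    n_total = len(points)
    scores = []
    for point, label in zip(points, labels):
        distance = math.dist(point, centroids[label])  # Euclidean distance
        scores.append(distance * n_total / cluster_sizes[label])
    return scores


# Toy data: a large cluster of four points and a small cluster of two.
points = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0), (9.0, 10.0), (11.0, 10.0)]
labels = [0, 0, 0, 0, 1, 1]
centroids = [(1.0, 1.0), (10.0, 10.0)]
scores = abnormal_scores(points, labels, centroids)
```

In this toy example the two points in the small cluster score higher than the four points in the large cluster, even though they lie closer to their own centroid.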

Extracting statistical features
Existing research has demonstrated a link between social skills and mental health, so we extracted features that reflect a student's social situation. If a student often enters and leaves the dormitory alone, his or her social interaction may be limited. We therefore count the number of times a student enters or leaves the dormitory with friends. Two students are counted as entering or leaving together if they are roommates, the time difference between their records is less than 20 seconds, and both records are entries or both are exits. If a student often orders takeout, he or she seldom eats with friends; the rule for this case is that the time difference between two adjacent records of a student is less than two minutes. Clinically, it has been shown that people with mental health problems tend to have negative attitudes and to be inactive. A student who stays in the dormitory for a long time may be inactive, for example attending fewer gatherings with friends, joining fewer clubs, and spending less time on self-study in the library. Therefore, we calculated the time each student spent in the dormitory on weekdays and on weekends. Academic stress is also one of the factors influencing mental health, so we calculated the number of failed subjects for each student.
In addition to the features mentioned above, we also calculated some basic statistical features, including the mean and standard deviation, the numbers of early risings and late returns, back-to-back card swipes, the amount of consumption, the number of purchases, and so on.
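The roommate rule for entering or leaving together can be sketched as follows. This is a minimal illustration, assuming each dormitory record is a `(datetime, direction)` pair with direction `"in"` or `"out"` and that the roommates' records have already been collected into one list; the function name `count_accompanied_swipes` and the toy timestamps are hypothetical.

```python
from datetime import datetime


def count_accompanied_swipes(student_swipes, roommate_swipes, max_gap_seconds=20):
    """Count a student's swipes that coincide with a roommate's swipe.

    A swipe counts as accompanied when some roommate swipe has the same
    direction and lies less than max_gap_seconds away in time.
    """
    count = 0
    for when, direction in student_swipes:
        for other_when, other_direction in roommate_swipes:
            gap = abs((when - other_when).total_seconds())
            if direction == other_direction and gap < max_gap_seconds:
                count += 1
                break  # count each of the student's swipes at most once
    return count


# Toy records (hypothetical values).
student = [
    (datetime(2017, 4, 20, 22, 0, 5), "in"),
    (datetime(2017, 4, 21, 7, 30, 0), "out"),
]
roommates = [
    (datetime(2017, 4, 20, 22, 0, 15), "in"),   # same direction, 10 s apart
    (datetime(2017, 4, 21, 7, 30, 5), "in"),    # close in time, opposite direction
]
accompanied = count_accompanied_swipes(student, roommates)
```

Only the first swipe is counted here: the second pair is close in time but one record is an exit and the other an entry.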

Data acquisition
We investigated 280 college students, 70 of whom were identified by college counselors as having mental health problems and were marked as mild, moderate, or severe. The other 210 students were randomly selected from the whole school. The proportion of students at each mental state level is shown in Fig. 4. We divide the samples into only two classes, with the mild, moderate, and severe cases all treated as positive samples. To protect students' privacy, we encrypt the student numbers in our experiments. The datasets include the following information and records:
Web logs: On campus, most students choose to use the campus network, because it is cheap and gives free access to library resources. The university allocates a campus network account to every student. Students can log on to their account from a computer or mobile phone to connect to the campus network and access the Internet. When students access the Internet through the campus network, all URL requests go through the school server and are recorded in its log file. In this experiment, we used the log records from April 20 to 30, 2017.
Consumption Data: On campus, students use campus cards to buy meals or to shop. Every time a purchase is made, the time and place are recorded. Since a person's mental health status can change over a short period, we use only the consumption data for one month, April 2017.
Entering and Leaving Dormitory: On campus, students use campus cards to enter or leave the dormitory. Each time a student enters or leaves, the system records the time and status. As with the consumption data, we used only the records for one month, April 2017.
Table 2. The performance of the selected classification models.

Classification Results
In this experiment, our goal is to divide students into two classes. Students reported by counselors, whether mild, moderate, or severe, were labeled as positive. All other students were labeled as negative. The classification models we use are Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Naive Bayes (NB), Neural Network (NN), and Decision Tree (DT).
To illustrate the significance of the abnormal score feature (AS) and the web features extracted by the CNN (WF), we use three different feature sets in the classification models: (a) statistical features (SF) and the abnormal score; (b) statistical features and the web features extracted by the CNN; (c) all features, including statistical features, the abnormal score, and the web features extracted by the CNN. Table 2 shows the average precision, recall, and F1 of these models, calculated by ten-fold cross-validation. Firstly, precision is higher than recall in all models. This is because the data imbalance biases the models toward the class with more samples (the negative samples).
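For reference, the three reported metrics can be computed from predictions as in the sketch below. It uses the standard definitions of precision, recall, and F1; the function name `precision_recall_f1` and the toy labels are illustrative and do not reproduce the values in Table 2.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy imbalanced example: 4 positives, 8 negatives; the classifier is
# conservative and flags only 2 students, both correctly.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

A classifier that leans toward the majority (negative) class makes few positive predictions, so the ones it makes tend to be correct while many true positives are missed, giving precision above recall.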
Secondly, the full feature set is the best predictor. For all classification models, precision with all features is significantly higher than with statistical features plus the CNN web features, or with statistical features plus the abnormal score. As a result, F1 is also larger for all classification models when all features are used. This confirms the effectiveness of the abnormal score feature and of the web features extracted by the CNN.
Finally, with all features, RF achieves the highest precision and DT achieves the highest recall. For further comparison, the F1 of DT with all features is larger than that of the other classifiers, indicating that the overall performance of DT is higher than that of the other techniques. Therefore, in our framework, we choose DT as the classification algorithm.

Conclusions
In this paper, we propose a new method to identify mental health problems among students. Our method has two advantages. Firstly, it uses multi-source data to capture the various behaviors of college students. Secondly, our labels are provided by counselors, which gives them higher credibility than questionnaire surveys. Using three different feature sets, we tested five modeling techniques for detecting students' mental health problems. The experimental results confirmed the effectiveness of the abnormal score feature and of the web features extracted by the CNN. DT showed better overall performance than the other algorithms, so we select the decision tree as the classification algorithm in our framework.