Abstract

Most current sentiment analysis models struggle to capture the complex semantic and grammatical information in text, and they are not fully applicable to the analysis of student sentiment. A student text sentiment analysis model that combines a convolutional neural network (CNN), a bidirectional gated recurrent unit (BiGRU), and an attention mechanism, called the CNN-BiGRU-AT model, is proposed. First, the text is divided into sentences, and the CNN extracts n-gram information of different granularities from each sentence to construct a sentence-level feature representation. Then, the sentences are integrated sequentially by the BiGRU to extract the contextual semantic features of the text. Finally, an attention mechanism is added to the CNN-BiGRU model, and attention scores are calculated so that different features receive different weights. The bottom-up “word-sentence-text” features are input into a softmax classifier to perform sentiment classification. The proposed model is evaluated experimentally on the weibo_senti_100k dataset. The results show that its accuracy rate and recall rate mostly exceed 0.9 and its F1 value is not lower than 0.8, which is better than the results of the other models. The proposed model can serve as a reference for related research on student text sentiment analysis.

1. Introduction

In recent years, social networking on the Internet, especially on the mobile Internet, has grown rapidly around the world, and online social platforms such as Facebook and Weibo have emerged. People tend to express their opinions on events through online social media, and students account for a large proportion of the users of these platforms [1]. Such comments often carry emotional tendencies. If these comment texts can be collected and the emotional tendencies of student groups analyzed, their emotional state regarding particular events can be understood, which provides strong support for subsequent decision-making [2, 3]. With the continuous expansion of social network user groups, tens of thousands of data items are generated on social platforms every second, and it is unrealistic to collect, query, analyze, and count this volume of data manually. Therefore, exploring an efficient and reliable method of student sentiment analysis has become a research focus [4].

With the development of statistics and computing, it has become possible to obtain the text data generated by social platforms automatically and in real time, and to count, sort, and analyze it [5]. Researchers have constructed many corpora for different text analysis tasks, and word segmentation is performed according to how likely candidate words are to be found in the corpus [6]. After basic word segmentation, the computing power of the machine can be used to mine the emotions contained in the text. Existing sentiment analysis methods fall into three groups: dictionary-based methods, traditional machine learning methods, and deep learning methods. Dictionary-based methods require a sentiment dictionary to be constructed manually, which is time consuming and laborious; furthermore, a sentiment dictionary built for one dataset may not suit sentiment analysis tasks on other datasets [7, 8]. Traditional machine learning models usually require features to be selected manually or automatically, after which the models classify the test data using the selected features [9, 10]. This approach requires a large amount of data, and the features selected for one dataset may not transfer to other datasets. Deep learning obtains the deep semantic and grammatical features of the text through the nonlinear learning of neural networks and thus extracts features automatically [11].

However, sentiment analysis has developed from the initial two emotion categories to multiple emotion categories. Most existing deep learning methods cannot meet the requirement of high accuracy, and there is little research on student text sentiment analysis. To this end, a student text sentiment analysis model using the convolutional neural network-bidirectional gated recurrent unit-attention mechanism (CNN-BiGRU-AT) is proposed. Compared with traditional text sentiment analysis models, its innovations are summarized as follows:
(1) Existing research often uses hand-designed feature extraction methods, which cannot capture the complex language phenomena in the text. The proposed model combines CNN and BiGRU to automatically learn the deep semantic information of the text from a large amount of data, which further ensures the accuracy of sentiment classification.
(2) To address the problem that the feature matrix of the CNN-BiGRU model has too many dimensions and overfits easily, the proposed model introduces an attention mechanism. Through a perception function, the target matrix and the weight matrix in the deep learning model are connected to strengthen the generalization ability of the model.
(3) The proposed CNN-BiGRU-AT model addresses the accuracy degradation caused by sample randomness.

2. Related Work

The major problems in text sentiment polarity analysis include the construction of word vectors, the modeling of associations between words, and the classification of tendencies across multiple sentiment dimensions in sentiment prediction [12]. Text sentiment analysis refers to the use of computers to automatically process human natural language texts, and it draws on mathematics, statistics, and computer science. Its goal is to extract as much valuable information from the text as possible and to achieve better human-computer interaction [13].

Text is essentially symbolic, so researchers initially used symbolic methods to represent language, including logic-based, rule-based, and ontology-based methods [14]. Depending on whether a labeled set is required for training, text sentiment analysis algorithms can be divided into supervised learning and unsupervised learning. Supervised learning is an active research topic: a large number of high-quality models have emerged, and some have been deployed in industry with good results. Reference [15] created emotion-labeled training data based on an emotion expression dictionary and used fastText for supervised learning to achieve accurate sentiment analysis and evaluation results. However, it did not consider the influence of different contexts on the emotional tendency of words. Reference [16] proposed a soft classification method that measures the probability of assigning a message to each emotion category, in order to handle factors such as blurred emotion boundaries and variations in expression and perception in automatic emotion detection. A supervised learning system was built to automatically classify emotions in text stream messages. However, continuously annotating sentiment labels for the dataset is costly, which seriously affects the efficiency of analysis. Reference [17] proposed a multimodal emotion recognition system based on speech and facial images. The supervised learning methods k-nearest neighbor and artificial neural network were used for emotion analysis, and facial expression features could be extracted accurately. However, the cost of this analysis is relatively high, and it performs poorly on student texts, which are highly arbitrary.

Unsupervised learning, and semisupervised learning that combines supervised and unsupervised learning, have long been both a focus and a difficulty of research, and recent work has increasingly concentrated on them. Core algorithms such as the multilayer perceptron (MLP), support vector machine (SVM), and logistic regression are all trained on high-dimensional sparse feature vectors [18]. Reference [19] proposed an emotion recognition method that combines deep learning with semisupervised learning based on long short-term memory (LSTM). By using an appropriate amount of unlabeled data in parallel, it minimizes the use of labeled data, which is costly to obtain, and improves analysis efficiency while maintaining accuracy. However, it classifies some neutral or ironic words in the text poorly. Reference [20] proposed a sentiment analysis model using a sentiment dictionary and a multichannel convolutional neural network. The input matrix of the emotion dictionary is constructed from the emotion information, so that the model can learn the emotion information of the input sentence from the various feature representations during training. The loss function is then reconstructed to realize semisupervised learning of the network, but the classification of colloquial and nonstandard expressions is poor. Reference [21] studied a mechanism in which the emotions in each pair of sentences conflict with each other. By observing the emotional orientation of each pair of sentences, the truth of the proposed conflict hypothesis can be determined, and a conflict matrix is used to identify conflicting emotions in the text and measure their characteristics. However, the unlabeled data are underused, and performance on unbalanced data is poor. Reference [22] proposed a new attention-based label consistency (ALC) model. It makes good use of the relationship between different samples and smooths the classes of unlabeled data by establishing a label-imbalance ALC model, which yields more accurate text sentiment analysis. However, the text data submitted by students has a certain degree of randomness, and the accuracy of sentiment analysis still needs to be improved.

Therefore, an analysis model based on CNN-BiGRU-AT is proposed for the sentiment analysis of student texts. While adapting to the randomness and nonstandard nature of students’ texts, it balances the efficiency and accuracy of analysis.

3. Sentiment Classification Model Based on the CNN-BiGRU-AT Model

3.1. Model Building

To implement a deep learning model for predicting the emotional distribution of students’ texts, a model that combines the attention mechanism with a CNN and a bidirectional gated recurrent unit is proposed, that is, the CNN-BiGRU-AT model. Its structure is shown in Figure 1.

The model first uses the word2vec tool to map the words in the text to low-dimensional real-valued vectors and builds a matrix that represents the initial features of the text. This matrix is then used as the input of the CNN-BiGRU-AT model. Finally, the backpropagation algorithm is used for end-to-end training to produce the final model, which classifies students’ text emotions based on the low-dimensional vector representation of the text.

3.2. CNN Text Sentiment Analysis Feature Extraction

Unlike the usual approach of processing the entire text as one long sentence, the proposed model divides the text into multiple sentences. The sentence-level feature representation of each sentence is extracted through the convolutional layer and downsampling layer of the CNN [23].
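
As an illustration of the sentence-splitting step, the following minimal Python sketch cuts a text at sentence-final punctuation. The function name `split_sentences` and the punctuation set are assumptions made for this example; the splitting rule actually used in the experiments is not specified in the text.

```python
import re

# Hypothetical set of sentence-final marks (Chinese and Western).
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")

def split_sentences(text):
    """Split a text into sentences, keeping the closing punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]

sentences = split_sentences("今天很开心！明天要考试。")
```

Each resulting sentence is then processed separately by the CNN, so the choice of punctuation marks determines how many sentence vectors a text contributes.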

The first layer is the input layer. The maximum sentence length $n$ in the dataset is used as the fixed sentence length. Each sentence is represented as a two-dimensional data matrix formed by stacking $d$-dimensional word vectors row by row. If a sentence is shorter than $n$, the missing vectors are randomly initialized from the uniform distribution $U(-0.25, 0.25)$. The two-dimensional data matrix is represented as follows:

$$X_{1:n} = x_1 \oplus x_2 \oplus \cdots \oplus x_n, \quad (1)$$

where $\oplus$ is the concatenation operator and $x_i \in \mathbb{R}^d$ is the word vector corresponding to the $i$-th word in the sentence. The three sentences $s_1$, $s_2$, and $s_3$ constitute the sentence input matrices $X^{(1)}$, $X^{(2)}$, and $X^{(3)}$, respectively.
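
The construction of the input matrix can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `build_sentence_matrix`, the toy embedding table, and the choice to give out-of-vocabulary words a random vector at this stage are assumptions of the example.

```python
import random

def build_sentence_matrix(tokens, embeddings, max_len, dim, rng=None):
    """Stack word vectors row by row into a max_len x dim matrix."""
    rng = rng or random.Random(0)

    def random_vector():
        # Uniform draw from the interval [-0.25, 0.25].
        return [rng.uniform(-0.25, 0.25) for _ in range(dim)]

    rows = []
    for token in tokens[:max_len]:
        vec = embeddings.get(token)
        # Assumption: unknown words also receive a random vector.
        rows.append(list(vec) if vec is not None else random_vector())
    while len(rows) < max_len:
        # Pad short sentences up to the fixed length.
        rows.append(random_vector())
    return rows

toy_embeddings = {"good": [0.1, 0.2, 0.3], "class": [0.0, -0.1, 0.4]}
matrix = build_sentence_matrix(["good", "class"], toy_embeddings, max_len=4, dim=3)
```

Here a two-word sentence is padded to four rows, so every sentence in a batch has the same shape regardless of its original length.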

The second layer is a sentence feature extraction layer composed of a convolutional layer and a downsampling layer. Using the CNN model structure, multiple sets of local feature maps are extracted by multiple convolution filters in the convolution layer. Subsequently, the most representative features in each feature map are extracted in the downsampling layer, and a sentence-level feature representation is obtained.

Given a sentence input matrix $X$, a filter with a window size of $h$ performs a convolution operation on every window of $h$ consecutive words; namely,

$$c_i = f(W \cdot x_{i:i+h-1} + b), \quad (2)$$

where $c_i$ represents the $i$-th element in the feature map, $W$ is the coefficient matrix, $b$ is the bias vector, $f$ represents the convolution kernel function, and $x_{i:i+h-1}$ represents a local word window composed of $h$ words. As the word window slides from $x_{1:h}$ to $x_{n-h+1:n}$, a feature map is obtained:

$$c = [c_1, c_2, \ldots, c_{n-h+1}]. \quad (3)$$

In the downsampling layer, the max-over-time pooling method is used to sample the feature map, and the obtained feature value is

$$\hat{c} = \max(c). \quad (4)$$

A filter with a window size of $h$ in the convolutional layer extracts one kind of local n-gram feature, and $C$ types of filters are obtained by changing the window size. Each filter extracts $m$ feature maps so that the contextual information between words is considered as fully as possible. The feature maps extracted by all types of filters are passed through the max pooling operation of the downsampling layer to obtain a sentence feature vector of length $C \times m$:

$$s = [\hat{c}_{1,1}, \ldots, \hat{c}_{1,m}, \ldots, \hat{c}_{C,1}, \ldots, \hat{c}_{C,m}], \quad (5)$$

where $\hat{c}_{j,k}$ represents the $k$-th ($1 \le k \le m$) feature value produced by the $j$-th ($1 \le j \le C$) type of filter. For a text composed of a sequence of sentences, each sentence is passed through the convolutional layer and the downsampling layer in turn to obtain its sentence feature vector.
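
The convolution and max-over-time pooling steps can be illustrated with a small pure-Python sketch. It assumes one scalar bias per filter and ReLU as the kernel function; the names `conv_feature_map` and `sentence_vector` and the toy filter values are introduced only for this example.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def conv_feature_map(matrix, weights, bias):
    """Slide one filter over every window of h consecutive word vectors."""
    h = len(weights)  # window size = number of rows in the filter
    feature_map = []
    for i in range(len(matrix) - h + 1):
        total = bias
        for r in range(h):
            total += sum(w * x for w, x in zip(weights[r], matrix[i + r]))
        feature_map.append(relu(total))
    return feature_map

def sentence_vector(matrix, filters):
    """Max-over-time pooling: keep the largest value of each feature map."""
    return [max(conv_feature_map(matrix, w, b)) for w, b in filters]

sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filters = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),               # window size 2
    ([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]], -1.0),  # window size 3
]
features = sentence_vector(sentence, filters)  # one value per filter
```

Because pooling keeps a single value per feature map, the length of the sentence vector depends only on the number of filters, not on the sentence length.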

3.3. BiGRU Text Sentiment Analysis Feature Extraction

The basic unit of the gated recurrent neural network (GRNN) is the gated recurrent unit (GRU), a variant of the widely used long short-term memory (LSTM) network. At each time step $t$, the GRNN accepts a sentence input vector $s_t$ and combines it with the output vector $h_{t-1}$ of the previous time step to update its hidden layer node state $h_t$. The iteration formula is as follows:

$$\begin{aligned} r_t &= \sigma(W_r s_t + U_r h_{t-1} + b_r), \\ z_t &= \sigma(W_z s_t + U_z h_{t-1} + b_z), \\ \tilde{h}_t &= \tanh(W_h s_t + U_h (r_t \odot h_{t-1}) + b_h), \\ h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t, \end{aligned} \quad (6)$$

where $\odot$ is the element-wise product. The reset gate $r_t$ and the update gate $z_t$ control the information update of each hidden layer, which allows the unit to adaptively select and discard the historical information that constructs the current semantics. $W$ and $U$ represent coefficient matrices, and $b$ represents a bias vector.
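
A single GRU update can be sketched in a few lines of Python. The sketch assumes the standard GRU formulation (reset gate, update gate, candidate state, and interpolation with the previous state); the parameter naming scheme `W_r`, `U_r`, `b_r`, and so on, and the helper `zero_params` are conventions of this example only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def gru_step(x, h_prev, p):
    """One GRU update; p maps names such as 'W_r' to weights."""
    def affine(gate, state):
        wx = matvec(p["W_" + gate], x)
        uh = matvec(p["U_" + gate], state)
        return [a + b + c for a, b, c in zip(wx, uh, p["b_" + gate])]

    r = [sigmoid(v) for v in affine("r", h_prev)]        # reset gate
    z = [sigmoid(v) for v in affine("z", h_prev)]        # update gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]       # element-wise product
    h_cand = [math.tanh(v) for v in affine("h", gated)]  # candidate state
    # Interpolate between the previous state and the candidate state.
    return [(1.0 - zi) * hp + zi * hc
            for zi, hp, hc in zip(z, h_prev, h_cand)]

def zero_params(in_dim, hid_dim):
    """All-zero parameters, convenient for checking the arithmetic."""
    p = {}
    for gate in ("r", "z", "h"):
        p["W_" + gate] = [[0.0] * in_dim for _ in range(hid_dim)]
        p["U_" + gate] = [[0.0] * hid_dim for _ in range(hid_dim)]
        p["b_" + gate] = [0.0] * hid_dim
    return p

h_next = gru_step([1.0, 2.0, 3.0], [1.0, -2.0], zero_params(3, 2))
```

With all-zero parameters both gates equal 0.5 and the candidate state is zero, so the new state is simply half of the previous one, which makes the role of the update gate easy to verify by hand.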

In this paper, a BiGRU is used to extract the contextual semantic features of the text. One GRU reads the input sequence in forward order, and the other reads it in reverse order. When feature extraction is performed on the input sequence, the GRUs in the two directions do not share state: the state transitions of each GRU depend only on its own previous state. At each time step, however, the outputs of the two GRUs are concatenated to form the output of the entire BiGRU layer. In this way, both the preceding and the following semantic information are taken into account.
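
The bidirectional wiring can be shown independently of the cell itself. In the sketch below, `toy_step` is a deliberately simple stand-in for a GRU cell (an assumption of this example, not part of the model); the point is only how the forward and backward passes are run separately and then concatenated per time step.

```python
def run_direction(inputs, step, h0):
    """Run a recurrent cell over a sequence and collect every state."""
    states, h = [], h0
    for x in inputs:
        h = step(x, h)
        states.append(h)
    return states

def bidirectional(inputs, step_fwd, step_bwd, h0):
    forward = run_direction(inputs, step_fwd, h0)
    # Process the reversed sequence, then restore the original order.
    backward = run_direction(inputs[::-1], step_bwd, h0)[::-1]
    # Concatenate the two states that belong to the same time step.
    return [f + b for f, b in zip(forward, backward)]

def toy_step(x, h):
    # Stand-in for a GRU cell: average of input and previous state.
    return [0.5 * xi + 0.5 * hi for xi, hi in zip(x, h)]

outputs = bidirectional([[2.0], [4.0], [8.0]], toy_step, toy_step, [0.0])
```

Each output vector has twice the hidden size: the first half summarizes the sentences read so far, and the second half summarizes the sentences that follow.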

3.4. Attention Mechanism

In the CNN-BiGRU model, the CNN is responsible for extracting text features, and the BiGRU is responsible for processing context and extracting sentence representations. The BiGRU alleviates the long-term dependence problem through its reset and update gates. The contextual semantic information processed by the BiGRU is stored in a vector of fixed length. When the input sequence is extremely long, it is usually impossible to store all the semantic information in this vector, so the contextual semantic information is limited and the model’s understanding ability cannot reach the ideal level [24]. Therefore, a metric that characterizes similarity is added to the BiGRU model: the more similar the current input is to the target output, the larger the weight assigned to the current input, so that the current output depends more on the current input. By adding adaptive weights, the model can learn more features, strengthen its generalization ability, and reduce overfitting [25, 26].

The essence of the attention mechanism is to calculate dynamic adaptive weights based on a probability distribution. It was originally proposed by the Google DeepMind team and was mainly used for image classification at the time. Later, Bahdanau et al. applied attention to machine translation, which also achieved good results. Attention has so far seen relatively little use in student text sentiment analysis, so adding the attention mechanism to the proposed model is of innovative significance. The calculation process of attention is shown in Figure 2.

The weight score is an important part of the dynamic adaptive weight in the attention mechanism, and its calculation method is as follows:

$$e_t = u_w^{\top} \tanh(W_w h_t + b_w), \quad (7)$$

where $h_t$ is the hidden layer output, $W_w$ is the randomly initialized weight matrix, $u_w$ is the randomly initialized vector, and $b_w$ is the offset vector. Next, the weight score is normalized as follows:

$$\alpha_t = \frac{\exp(e_t)}{\sum_{j} \exp(e_j)}. \quad (8)$$

According to (8), the output vector weighted by the dynamic adaptive weight is

$$v = \sum_{t} \alpha_t h_t. \quad (9)$$

It can be seen that the attention model connects the target matrix with the weight matrix in the neural network through a perception function. The softmax function is then used to normalize the scores into a probability distribution.
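
The attention computation can be sketched as follows, assuming the common additive form in which each hidden state is passed through a tanh layer, scored against a context vector, and normalized with softmax. The function name `attention_pool` and the argument names `W`, `b`, and `v` are choices of this example.

```python
import math

def attention_pool(hidden_states, W, b, v):
    """Return (weights, weighted sum) for a list of hidden-state vectors."""
    scores = []
    for h in hidden_states:
        # Perception layer: tanh(W h + b), then a dot product with v.
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bi)
             for row, bi in zip(W, b)]
        scores.append(sum(vi * ui for vi, ui in zip(v, u)))
    # Softmax over time steps, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [sum(a * h[k] for a, h in zip(weights, hidden_states))
              for k in range(dim)]
    return weights, pooled

hidden = [[1.0, 3.0], [3.0, 5.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, pooled = attention_pool(hidden, identity, [0.0, 0.0], [0.0, 0.0])
```

With a zero context vector every score is equal, so the weights are uniform and the output is the plain average of the hidden states; a trained context vector shifts the weights toward the time steps it scores highest.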

3.5. Classification Output

The classification output layer of the proposed model uses a softmax classifier to perform the final emotion classification. After the attention layer assigns weights to the features output by the BiGRU layer, the results are input into the softmax classifier, which outputs the final result in the form of an array. Since the research object is multiemotion classification, the entries of the array represent the probabilities of the individual emotions in the text [27, 28]. The softmax classifier calculates the probability that a sample belongs to a certain category as follows:

$$p(y = k \mid x) = \frac{\exp(w_k^{\top} x + b_k)}{\sum_{j=1}^{K} \exp(w_j^{\top} x + b_j)}, \quad (10)$$

where $x$ represents the sample to be classified, $k$ represents one of the $K$ categories, $w_j$ and $b_j$ are the classifier parameters of the $j$-th category, and $p(y = k \mid x)$ represents the probability that the sample belongs to the $k$-th category.
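
The softmax classification step can be illustrated with a short sketch. It assumes a linear layer (one weight vector and one bias per category) in front of the softmax; the function names and the toy weights are hypothetical.

```python
import math

def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, weights, biases):
    """Return (predicted category index, probability of every category)."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    return probs.index(max(probs)), probs

label, probs = classify([1.0, 0.0],
                        [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]],
                        [0.0, 0.0, 0.0])
```

The returned list plays the role of the output array described above: it sums to one, and its largest entry identifies the predicted emotion.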

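At the same time, the loss function $L$ is minimized as in (11): the backpropagation algorithm is used to train and update the parameter set $\theta$ of the model to obtain the optimal parameters:

$$\theta \leftarrow \theta - \lambda \frac{\partial L(\theta)}{\partial \theta}, \quad \theta = \{\theta_{\mathrm{CNN}}, \theta_{\mathrm{BiGRU}}, \theta_{\mathrm{AT}}\}, \quad (11)$$

where $\theta_{\mathrm{CNN}}$ is the parameter set of the CNN network, $\theta_{\mathrm{BiGRU}}$ is the parameter set of the BiGRU network, $\theta_{\mathrm{AT}}$ is the parameter set of the attention layer and the softmax layer, and $\lambda$ is the learning rate.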
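
A single gradient-descent update can be sketched as below. The exact loss function is not recoverable from the text, so cross-entropy is assumed here; the learning rate of 0.1 and the toy setting in which the logits themselves are the trainable parameters are likewise assumptions made to keep the example short.

```python
import math

def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    # Assumed loss: negative log-likelihood of the target distribution.
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)

def sgd_step(params, grads, learning_rate):
    # theta <- theta - learning_rate * dL/dtheta, applied element-wise.
    return [p - learning_rate * g for p, g in zip(params, grads)]

logits = [0.0, 0.0, 0.0]   # toy parameters: the logits themselves
target = [0.0, 1.0, 0.0]   # one-hot label of the true category
loss_before = cross_entropy(softmax(logits), target)
# For softmax with cross-entropy, dL/dlogits = probs - target.
grads = [p - t for p, t in zip(softmax(logits), target)]
logits = sgd_step(logits, grads, learning_rate=0.1)
loss_after = cross_entropy(softmax(logits), target)
```

In the full model the same update is applied to all CNN, BiGRU, attention, and softmax parameters, with the gradients obtained by backpropagation rather than by the closed-form expression used here.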

4. Experiment and Analysis

In the experiment, the weibo_senti_100k dataset (dataset address: https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/weibo_senti_100k/intro.ipynb) is used to evaluate the proposed model. The dataset contains more than 100,000 Weibo posts, all with sentiment annotations. Furthermore, the Skip-gram model in Google’s open-source word2vec is used in advance to perform unsupervised word vector learning on a 1.2 GB Chinese Wikipedia corpus. The word vector dimension is set to 250 and the learning rate to 0.01, and a distributed word vector representation containing 520,000 words is generated. The learned word vectors are stored in the vocabulary. The ICTCLAS word segmentation tool is used to segment the experimental text. Words are used as the basic units of a sentence and are expressed as the corresponding word vectors. For unregistered words that do not appear in the vocabulary, word vectors are randomly generated from the uniform distribution U(−0.25, 0.25).

The filter window sizes of the CNN are set to 3, 4, and 5, and each filter extracts 120 feature maps. The Rectified Linear Unit (ReLU) function is selected as the convolution kernel function. Both the hidden layer vector of the GRU and the context vector in the attention layer have a dimension of 120, and the context vector is randomly initialized. During training, the minibatch size is set to 64. Texts of similar length (measured by the number of sentences in the text) are organized into the same batch, and stochastic gradient descent is performed on the minibatches in shuffled order. The specific parameter settings of a single GRU are shown in Table 1.
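
The batching strategy described above can be sketched as follows. This is one plausible reading (sort by number of sentences, cut into minibatches, shuffle the batch order); the function name `length_bucketed_batches` and the fixed random seed are assumptions of the example.

```python
import random

def length_bucketed_batches(texts, batch_size, rng=None):
    """Group texts of similar length, then shuffle the order of batches."""
    rng = rng or random.Random(0)
    # A text is a list of sentences, so len(text) is its sentence count.
    ordered = sorted(texts, key=len)
    batches = [ordered[i:i + batch_size]
               for i in range(0, len(ordered), batch_size)]
    rng.shuffle(batches)  # visit the minibatches in random order
    return batches

toy_texts = [["s"] * n for n in (5, 1, 3, 2, 6, 4)]
batches = length_bucketed_batches(toy_texts, batch_size=2)
```

Grouping by length keeps the amount of padding inside each minibatch small, while shuffling the batch order preserves the randomness that stochastic gradient descent relies on.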

4.1. Classification Result Statistics

In the experiment, the number of correctly classified texts and the number of incorrectly classified texts after the classification of each emotion category in the test set are counted. The analysis results of the eight emotions are shown in Figure 3.

It can be seen from Figure 3 that the number of misclassifications of “fear,” “surprised,” and “happy” is relatively small, accounting for less than 10% of the total test set. Compared with other types of data, the texts of these three categories usually have explicit emotional expression words, such as “terrible,” “unexpected,” “shocked,” and “support.” Therefore, in classification, the proposed model can better capture these explicit emotional expressions. However, for emotions that lack explicit expression, such as “sadness” and “no-emotion,” the proposed model has a large number of misclassifications, accounting for more than half of the total test set. Still, on the whole, the performance of the student text sentiment analysis of the proposed model is relatively ideal.

4.2. Reader Sentiment Distribution Prediction

We evaluate the ability of the proposed model to analyze students’ text emotions on the experimental dataset. Performance is evaluated by calculating the Kullback–Leibler (KL) divergence between the predicted emotion distribution and the real emotion distribution. The smaller the KL distance, the better the model is at analyzing the sentiment distribution. The KL distance is calculated as follows:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} p_i \log \frac{p_i}{q_i}, \quad (12)$$

where $p_i$ represents the true probability value of the student’s $i$-th emotional label and $q_i$ represents the predicted probability value of that label.
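
The KL distance between a true and a predicted emotion distribution can be computed with a few lines of Python. The function name `kl_divergence` and the small constant `eps`, which guards against division by zero when a predicted probability is exactly zero, are additions of this sketch.

```python
import math

def kl_divergence(p_true, q_pred, eps=1e-12):
    """KL(P || Q) for two discrete distributions over the same labels."""
    # Terms with p = 0 contribute nothing and are skipped.
    return sum(p * math.log(p / max(q, eps))
               for p, q in zip(p_true, q_pred) if p > 0.0)

same = kl_divergence([0.5, 0.5], [0.5, 0.5])
different = kl_divergence([1.0, 0.0], [0.5, 0.5])
```

The value is zero when the prediction matches the true distribution exactly and grows as the two distributions move apart, which is why a smaller KL distance indicates a better prediction.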

To demonstrate the student text sentiment analysis ability of the proposed model, it is compared with the models in [15], [19], and [20]. The result is shown in Figure 4.

It can be seen from Figure 4 that, compared with the other models, the CNN-BiGRU-AT model significantly improves the prediction effect, with a KL distance of only 0.667. It uses the CNN to extract text features and the BiGRU to process context and extract sentence representations. At the same time, the attention mechanism links the target matrix with the weight matrix of the neural network, so the model can learn more features and strengthen its generalization ability. Reference [15] uses fastText to learn emotion labels based on emotion expression dictionaries. Because it does not consider context, its prediction effect is poor, and its KL distance is as high as 0.925. Reference [19] proposed a semisupervised learning model that combines deep learning with long short-term memory (LSTM) to predict emotion types. The combination of models improves the accuracy of sentiment classification prediction, with a KL distance of 0.792; nonetheless, its prediction performance is poor for emotions that are not easy to recognize. Similarly, [20] uses a sentiment dictionary and a multichannel CNN for sentiment analysis, and its KL distance is close to that of [19]; however, its prediction for colloquial and nonstandard expressions needs to be improved. The comparison shows that the proposed CNN-BiGRU-AT model effectively extracts the semantic features of the text with the bottom-up hierarchical structure of “word-sentence-text,” considering not only the semantic information within each sentence but also the dependencies between sentences. In addition, the attention mechanism further improves the predictive ability of the model, and the resulting predicted distribution is the closest to the true emotional distribution. That is, the attention mechanism can perceive contextual information and find the key text features that most affect readers’ emotions, thereby improving the accuracy of emotion prediction.

4.3. Classification Performance Comparison

To demonstrate the effectiveness of the proposed CNN-BiGRU-AT model compared with the models in [15], [19], and [20], a significance test experiment was designed. On the weibo_senti_100k dataset, using word frequency as a feature, the four models are subjected to 10-fold cross-validation. The accuracy rate, recall rate, and F1 value of the classification results are shown in Figure 5.
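
The 10-fold protocol can be sketched as follows. The generator `k_fold_indices` is a hypothetical helper that splits sample indices into contiguous folds; in practice the samples would normally be shuffled once beforehand, which is omitted here for brevity.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(25, k=10))
```

Each sample appears in exactly one test fold, so every model is evaluated on all of the data while never being tested on samples it was trained on in that fold.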

It can be seen from Figure 5 that [15] uses a single fastText learning model, and the values of its three evaluation indicators mostly fall in the range of 0.5 to 0.6. Because it relies on a single model and distinguishes only a small number of emotion types, its overall performance is relatively low. Reference [19] combines deep learning with a long short-term memory (LSTM) network, and [20] uses a sentiment dictionary and a multichannel CNN to achieve sentiment classification; both are fusion models. Their performance is therefore similar, and the values of the three indicators mostly fall in the range of 0.6 to 0.8. However, they do not consider the connection between words and sentences, so they classify implicit expressions poorly. The proposed model integrates the CNN, the BiGRU, and the attention mechanism and extracts the semantic features of the text from the bottom up through “word-sentence-text,” considering not only the semantic information within each sentence but also the dependencies between sentences. Therefore, its overall performance is the best: the accuracy rate and recall rate mostly exceed 0.9, and the F1 value is not less than 0.8. Additionally, the results of the ten folds fluctuate little, which indicates that the classification of the proposed model is more stable.
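
For reference, per-category evaluation indicators can be computed as in the sketch below. It assumes that the "accuracy rate" reported alongside recall corresponds to per-category precision, which is an interpretation rather than something the text states; the function name and the toy labels are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall, and F1 for one emotion category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = ["pos", "pos", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, "pos")
```

Because F1 is the harmonic mean of the two other indicators, it is pulled toward the lower of them, which explains why an F1 value can sit below both a high precision and a high recall averaged over folds.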

5. Conclusion

With the large-scale commercial use of 5G technology and the rapid development of the mobile Internet, social media platforms generate massive amounts of user information every day, and obtaining people’s views and emotional tendencies with high precision has become an urgent problem. To this end, a student text sentiment analysis model using the CNN-BiGRU-AT model is proposed. The CNN, BiGRU, and attention mechanism are used to obtain the features of words, sentences, and texts, respectively. Through the bottom-up analysis of the connections between words and sentences, more features are sent to the softmax classifier to complete the classification of students’ emotions. The proposed model is evaluated on the weibo_senti_100k dataset. The results show that the KL distance is only 0.667, which indicates better emotion prediction performance. In addition, the accuracy rate and recall rate of sentiment classification mostly exceed 0.9, and the F1 value is not less than 0.8, which is better than the results of the other models. The proposed model effectively improves the accuracy of student text sentiment analysis.

At present, the CNN-BiGRU-AT model is not ideal for analyzing implicit expressions. However, implicit expressions are present in a large number of emotional texts, so this problem must be addressed in future work. To this end, the processing of emerging Internet language and nonexplicit text expressions will be strengthened to further improve the analysis of students’ texts.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported by the Computer Basic Education Teaching Research Project of Association of Fundamental Computing Education in Chinese Universities (2020, Project Code: 2020-AFCEC-418; 2021-2022, Project Code: 2021-AFCEC-139), the College Computer Education 2021 Project “Teaching Exploration and Practice of General Artificial Intelligence Courses in Application-Oriented Undergraduate Colleges” (Project Code: CERACU2021R10), the Research Project of Higher Education Reform of Jiangsu Province in China (Grant no. 2019JSJG582), and the Qing Lan Project of Jiangsu Province in China (Grant no. 2019).