Sentiment Analysis Method of Network Text Based on Improved AT-BiGRU Model

To address the long training time, complex computation, and high space cost of current sentiment analysis methods for network text, this paper proposes an Internet text sentiment analysis method based on an improved AT-BiGRU model. First, the textblob package is used to correct spelling errors before text preprocessing. Second, pad_sequences is used to pad the input to a fixed length, a two-way gated recurrent network is used to extract information, and an attention mechanism is used to highlight the key information in the word vectors. Finally, the GRU memory unit is modified to construct an improved BiGRU that can adapt to the recursive network structure. The proposed model is evaluated on the SemEval-2014 Task 4 and SemEval-2017 Task 4 datasets. Experimental results show that the proposed model effectively avoids the sentiment analysis bias caused by spelling errors and demonstrate the effectiveness of the improved AT-BiGRU model in terms of accuracy, loss rate, and iteration time.


Introduction
With the continuous progress of information technology, the Internet has entered a period of rapid development. People have gradually shifted from information consumers to information producers, and they are increasingly inclined to post their opinions on online shopping, news, books, and movies on online platforms. At first glance, these evaluations seem insignificant, but they actually contain rich emotional information. Quickly and accurately extracting expressions of emotional tendency from massive review data has important reference and research value for government public opinion monitoring, corporate market research, and personal consumption choices. Sentiment analysis [1][2][3][4] mainly refers to the use of natural language processing and computational linguistics techniques to identify and extract subjective information from the original material and to determine the polarity of the views and attitudes that opinion holders express on certain topics. The most important part of sentiment analysis research is sentiment classification, which identifies and classifies the emotions expressed in a text, such as positive and negative, in order to obtain latent information. At present, the most studied sentiment classification techniques fall into three types. The first method is based on the emotion dictionary [5]. It segments documents, finds words of different parts of speech, and calculates their corresponding scores. This method relies too heavily on the emotion dictionary and is strongly domain dependent, so its effect is not ideal [6]. The second method is based on manual feature extraction. It is a traditional machine learning approach that requires a large amount of prelabeled data. Machine learning algorithms such as support vector machines [7], Naive Bayes, and conditional random fields are then used for emotion classification. Among them, the conditional random field is a discriminative probabilistic model that performs well in sequence labeling, named entity recognition, and Chinese word segmentation. The third method is based on deep learning [8][9][10][11]. Models based on deep neural networks have achieved better results than traditional classifiers in the field of sentiment analysis.
However, most current text sentiment analysis models do not take spelling errors into account, and review texts differ in length, which makes them difficult to transform into uniform word vectors. At the same time, mainstream text sentiment analysis models take a long time to train and cannot fully extract text features. In response to these problems, a network text sentiment analysis method based on the improved AT-BiGRU model is proposed. Its innovations are as follows: (i) For spelling errors, which traditional sentiment analysis models do not consider, the textblob package is used to correct the text before preprocessing, which reduces the sentiment analysis errors caused by spelling errors. (ii) The input is padded to a fixed length with pad_sequences, a two-way gated recurrent network is used to extract information, and the attention mechanism is used to highlight the key information in the word vectors. This alleviates the complex computation and high space cost of most text sentiment analysis models. (iii) The GRU memory unit is modified to construct an improved BiGRU that can adapt to the recursive network structure, which enhances the effectiveness of the model in terms of accuracy, loss rate, and iteration time.

Related Works
The amount of text data has grown from small to large. Correspondingly, as people have recognized the commercial and social value of text data, text sentiment classification has become a major task in the field of natural language processing (NLP), and its research methods have also evolved. Dictionary-based and machine learning-based methods are the traditional text sentiment classification methods. With a small amount of data, traditional sentiment classification methods can achieve good classification results. Methods based on deep learning were applied in the field of NLP after deep learning models had achieved good results elsewhere.

Method Based on Sentiment Dictionary.
Building an emotional dictionary based on emotional knowledge and using it as a tool is the traditional method of judging the emotional polarity of subjective texts. Most emotional dictionaries are constructed manually, and the basic principle is to summarize and organize widely used emotional words based on experience. When a text is input, it is matched against the content of the dictionary to find the emotional words it shares with the dictionary, and the emotional polarity of the text is then judged. The emotion dictionary can be traced back to 1998. Whissell asked 148 subjects to use 5 additional words to describe terms such as mathematics, physics, television, newspaper, biology, and technology. The responses were then matched against the sentiment words widely used in the sentiment dictionary. However, no matter how the emotional dictionary is expanded and refined, a dictionary has boundaries. It cannot cover all emotional expressions, and the new words that appear over time cannot be included promptly, which lowers the accuracy of text sentiment judgments.

Method Based on Machine Learning.
Learning is a continuous intelligent behavior of human beings. Computers have now begun to acquire this ability as well, namely, machine learning. The core of machine learning is learning, and how to make machines learn like humans is the focus of research in the field. The principle of machine learning-based sentiment analysis is that, after text features are extracted manually, the computer processes the text according to a specific algorithm and then outputs the sentiment classification. Compared with methods that rely entirely on a manually constructed emotional dictionary, machine learning has obvious advantages. On the one hand, it can effectively reduce the labor burden and irrational judgments. On the other hand, it can build a huge database and update the lexicon promptly as language develops. However, traditional machine learning methods require manual screening of emotional features, which is a huge workload.

Deep Learning-Based Methods.
With the rise and application of deep learning, many researchers have begun to use deep learning to solve emotion classification problems. Among them, multilayer perceptron (MLP) [12], CNN [13], RNN [14], attention mechanism [15][16][17], and other neural network structures are widely used in text sentiment analysis tasks and can get a better semantic representation of sentences.
In reference [18], CNN was first proposed to solve the problem of part-of-speech tagging in NLP. Yin et al. [19] proposed applying CNN to sentiment analysis tasks and achieved good results. With the widespread application of CNN in sentiment classification, its shortcomings have become more and more obvious. CNN can only mine the local information of the text and is weak at capturing long-distance dependence. The recurrent neural network (RNN) makes up for this deficiency [19]. Compared with CNN, RNN has a memory function, can capture dynamic information in sequential data, and has achieved good results in sentiment classification tasks. Tang et al. [20] modeled the text at the text level and proposed a hierarchical RNN model. Although RNN is suitable for context processing, gradient explosion can occur when it deals with long-distance dependence. In response to this problem, Hochreiter and Schmidhuber [21] proposed the LSTM model to optimize the internal structure of the RNN. Zhu et al. [22] used LSTM to model text, divided it into word sequences, and then performed sentiment classification. A traditional LSTM can only make effective use of the preceding context and ignores the following context, which affects the accuracy of sentiment classification to a certain extent. Xu et al. [23] proposed an LSTM model with a caching mechanism to capture long-term emotional information, which has been widely used in existing sentiment analysis tasks. However, for the emotion toward a specific target, the local features appear in different places in the sentence, and LSTM cannot capture the weight differences among sequence features. To overcome this problem, the attention mechanism widely used in machine translation was introduced into sentiment analysis tasks [24].
Tian et al. [25] combined a two-way gated recurrent unit (GRU) with the attention mechanism and achieved better results in short text sentiment analysis. This shows that the attention mechanism can correctly attend to the relevant parts of the text through weight calculation, and it also improves the interpretability of the model. Devlin et al. [26] proposed the pretrained language model BERT based on the deep two-way transformer. The internal structure of the model is based entirely on self-attention. It shows that introducing the attention mechanism makes it easier to capture long-distance interdependent features in sentences. Attention can also directly connect any two words in a sentence, so the distance between long-distance dependent features is greatly shortened, which is conducive to the effective use of these features. In addition, the attention mechanism directly helps increase the parallelism of computation. Wang et al. [27] combined LSTM and the attention mechanism to propose the ATAE aspect sentiment analysis model. The model integrates the information encoded in the input and adds an attention mechanism after the hidden state output by the LSTM. Cheng et al. [28] proposed the HEAT model, which further uses a hierarchical attention mechanism to capture aspect information and complete the sentiment analysis of specific aspects of the sentence, thereby improving the accuracy of aspect-level sentiment analysis. Xue and Li [29] proposed a gated convolutional neural network, GCAE, which adds aspect information in the convolution process to classify emotions toward different aspects. Jiang et al. [30] proposed a fine-grained LSTM-CNN attention classification model, but LCA differs from this model in the following points. First, a two-way LSTM is selected to mine the context information more fully. Second, the CNN layer in LCA performs an attention operation before pooling, which effectively retains the local information lost by the pooling operation. Finally, LCA can be used not only for aspect sentiment analysis but also for general text classification tasks. Therefore, the ability of attention to fuse the long-range dependencies of LSTM and the local features of CNN can be used to effectively improve the accuracy of emotion classification.

Proposed Improved AT-BiGRU Model
The improved AT-BiGRU model takes into account the spelling errors in network evaluation text. By combining the improved BiGRU model with the attention mechanism, it shortens the model training time and alleviates the complex computation and high space cost of most text sentiment analysis models. The model is mainly divided into three parts: the text vectorization input layer, the hidden layer, and the output layer. The hidden layer consists of four layers: the BiGRU layer, the attention layer, the dropout layer, and the dense layer. The word vectors obtained by text preprocessing pass through the input layer and enter the BiGRU layer, which extracts features. The attention mechanism then highlights the key information in the word vectors, the dropout layer prevents overfitting, and the result passes through the fully connected layer and finally into the softmax layer for text sentiment classification.
Compared with BiLSTM, the AT-BiGRU model trains faster, captures the relationships between text contexts more easily, and requires fewer training parameters. The attention mechanism assigns different weights to different word vectors, highlighting the importance of individual words. Combining the two therefore not only increases the training speed but also captures the key emotional information of the text, improves the accuracy of text classification, and makes it easier to obtain the essential characteristics of the text. The improved AT-BiGRU model is shown in Figure 1.

Text Preprocessing.
The text of online reviews is relatively free in form and highly unstructured, so a computer cannot classify the emotions of web comment text directly. The text must first be transformed into corresponding real-valued vectors for analysis and processing. This paper uses GloVe to map discrete text to a real-valued space. GloVe learns word vectors from the statistics of global vocabulary co-occurrence, thus combining statistical information with the local context window method.
To preserve more co-occurrence information between words, the GloVe model builds an approximation of the vocabulary co-occurrence matrix. The calculation is shown in formulas (1)-(3):

$$X_i = \sum_{j=1}^{V} X_{i,j} \tag{1}$$

$$P_{i,k} = \frac{X_{i,k}}{X_i} \tag{2}$$

$$R_{i,j,k} = \frac{P_{i,k}}{P_{j,k}} \tag{3}$$

where $X_i$ is the sum of the row of the co-occurrence matrix belonging to word $i$; $V$ is the total number of words in the dictionary; $X_{i,j}$ is the number of times word $j$ and word $i$ appear together within the fixed window in the training corpus; $P_{i,k}$ is the probability that word $k$ appears in the fixed window of word $i$; and $R_{i,j,k}$ represents the relationship among the three words $i$, $j$, and $k$. If $R_{i,j,k}$ is very large, words $i$ and $k$ are related, but words $j$ and $k$ are not; if $R_{i,j,k}$ is small, words $j$ and $k$ are related, but words $i$ and $k$ are not. If $R_{i,j,k}$ approaches 1, either both pairs ($i$, $k$) and ($j$, $k$) are related or neither pair is related. Compared with the raw probability $P_{i,k}$, $R_{i,j,k}$ can better distinguish the relationships between words. The GloVe model constructs a function $F(w_i, w_j, w'_k)$ whose value should be as close as possible to the ratio $P_{i,k}/P_{j,k}$, which serves as the convergence target of the model, so that the word vectors contain the information in the co-occurrence matrix, where $w, w' \in \mathbb{R}^d$ are the corresponding word vectors. Noisy data and unreasonable co-occurrences produce unbalanced co-occurrence relationships between words that hinder the learning of the model parameters, so such word pairs should be given very small weights.
Therefore, a weight function $f(X_{i,j})$ is introduced when constructing the loss function, and the constructed loss function is shown in the following equation:

$$J = \sum_{i,j=1}^{V} f(X_{i,j}) \left( w_i^{T} w'_j + b_i + b'_j - \log X_{i,j} \right)^2 \tag{4}$$

where $X_{i,j}$ is the number of times the words $w_i$ and $w_j$ appear together in the window; $w_i^{T}$ is the transpose of the word vector of word $i$ when $w_i$ is used as the context; $w'_j$ is the word vector of word $j$ when $w_j$ is the center word of the context; $b_i$ and $b'_j$ denote biases; and $V$ is the total number of words in the dictionary.
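The quantities above can be illustrated with a small numerical sketch. The text does not specify the weight function $f$; the sketch assumes the power-law form with a cap that is commonly used with GloVe, and the values `x_max=100` and `alpha=0.75`, the toy counts, and all function names are illustrative assumptions rather than settings reported here.

```python
import math

def cooccurrence_ratio(X, i, j, k):
    """Return R_{i,j,k} = P_{i,k} / P_{j,k} from a co-occurrence matrix X."""
    p_ik = X[i][k] / sum(X[i])   # P_{i,k} = X_{i,k} / X_i
    p_jk = X[j][k] / sum(X[j])   # P_{j,k} = X_{j,k} / X_j
    return p_ik / p_jk

def weight(x, x_max=100.0, alpha=0.75):
    """Assumed weight function f(x): grows with the count and is capped at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(X, w, w_ctx, b, b_ctx):
    """Weighted least-squares loss over all non-zero co-occurrence counts."""
    total = 0.0
    for i, row in enumerate(X):
        for j, x_ij in enumerate(row):
            if x_ij == 0:
                continue  # log(0) is undefined, so empty cells are skipped
            dot = sum(a * c for a, c in zip(w[i], w_ctx[j]))
            total += weight(x_ij) * (dot + b[i] + b_ctx[j] - math.log(x_ij)) ** 2
    return total

# Toy 3-word co-occurrence matrix (hypothetical counts).
X = [[0, 8, 2],
     [8, 0, 1],
     [2, 1, 0]]
ratio = cooccurrence_ratio(X, 0, 1, 2)  # (2/10) / (1/9)
```

Under this assumed weighting, rare co-occurrences contribute little to the loss, while very frequent pairs cannot dominate it because the weight stops growing at 1.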

Algorithm Design
Improved BiGRU.
To adapt the GRU to the recursive network structure, the ordinary GRU memory unit must be modified so that it can accept the input of two child nodes. In the following, this improved dual-input GRU is referred to as BiGRU for short. Consider a BiGRU node unit $j$ whose output is $h_j$, and let the outputs of its left and right child nodes be $h_j^L$ and $h_j^R$, respectively. The calculation of $h_j$ is given by formula (5), where $\omega_L$ and $\omega_R$ are the weights corresponding to the left and right child nodes of the GRU unit, with $\omega_L + \omega_R = 1$. The update function of BiGRU node $j$ is $z_j$. Its main role is to control whether the BiGRU is updated, which is similar to the role of a control gate, so it can also be called an update gate. The update gate determines how to update the cell state according to the input vector $x_j$ and the child node outputs $h_j^L$ and $h_j^R$. The calculation of the update gate $z_j$ of the BiGRU is given by formula (6). The calculation of the candidate output $\tilde{h}_j$ of BiGRU node $j$ is given by formula (7), where $f$ represents the sigmoid function, $\odot$ represents element-wise multiplication, and $W$ and $U$ are parameter matrices. The reset gate $r_j$ of the BiGRU mainly controls whether to reset the memory unit. When the value of the reset function is close to 0, the GRU effectively ignores the historical information, which helps alleviate the long-term dependence problem.
The calculation of the reset function is given by formula (8). The function $\sigma$ is a nonlinear function, and the tanh function is usually used. For the BiGRU, $h_j, \tilde{h}_j, r_j, z_j \in \mathbb{R}^d$, where $d$ is the dimension of the input word vector, and emotion information prediction can be achieved through softmax.
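As a rough illustration of a dual-input GRU node, the sketch below mixes the two child outputs with weights $\omega_L + \omega_R = 1$ and then applies update and reset gates in the usual GRU manner. The exact gate equations of the improved BiGRU are not reproduced here, so the way the children are combined, the parameter names (`Wz`, `Uz`, `Wr`, `Ur`, `W`, `U`), and the final interpolation are assumptions made for illustration only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def dual_input_gru_node(x, h_left, h_right, params, w_left=0.5):
    """One node update; the formulation is an assumed sketch, not the paper's."""
    w_right = 1.0 - w_left  # the two child weights sum to 1
    h_child = [w_left * l + w_right * r for l, r in zip(h_left, h_right)]
    # Update gate z and reset gate r (sigmoid, called f in the text).
    z = [sigmoid(v) for v in vec_add(matvec(params["Wz"], x), matvec(params["Uz"], h_child))]
    r = [sigmoid(v) for v in vec_add(matvec(params["Wr"], x), matvec(params["Ur"], h_child))]
    # Candidate output uses the reset-gated child state and tanh (sigma in the text).
    gated = [ri * hi for ri, hi in zip(r, h_child)]
    h_cand = [math.tanh(v) for v in vec_add(matvec(params["W"], x), matvec(params["U"], gated))]
    # Interpolate between the combined child state and the candidate.
    return [zi * hc + (1.0 - zi) * hn for zi, hc, hn in zip(z, h_child, h_cand)]

dim = 2
zeros = [[0.0] * dim for _ in range(dim)]
params = {name: zeros for name in ("Wz", "Uz", "Wr", "Ur", "W", "U")}
h = dual_input_gru_node([1.0, -1.0], [1.0, 0.0], [0.0, 1.0], params)
```

With all parameters set to zero, both gates equal 0.5 and the candidate is zero, so the node simply returns half of the averaged child state, which makes the data flow easy to check by hand.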

Attention Mechanism.
The essential idea of the attention mechanism is shown in Figure 2: the source can be regarded as a series of <Key, Value> data pairs. Key-value queries have three basic elements: Query, Key, and Value. The calculation of the attention value can be summarized as follows. First, the weight coefficient of the Value corresponding to each Key is obtained by calculating the correlation between the Query and each Key; then, the Values are summed, weighted by these coefficients. Therefore, the attention mechanism can be described as a mapping from a query to a series of key-value pairs, which is expressed as follows:

$$\mathrm{Attention}(\mathrm{Query}, \mathrm{Source}) = \sum_{i=1}^{L_x} \mathrm{Similarity}(\mathrm{Query}, \mathrm{Key}_i) \cdot \mathrm{Value}_i \tag{9}$$

where $L_x = \lVert \mathrm{Source} \rVert$ represents the length of the data source. The specific calculation process of the attention mechanism can be abstracted into the three stages shown in Figure 3.
Among them, K (Key) represents keywords, Q (Query) represents query, F represents function, V (Value) represents weight value, Sim represents similarity, a represents weight coefficient, and A (Attention Value) represents attention value.
In the first stage, the weight coefficient of each Key corresponding to Value is obtained by calculating the correlation between each Query and each Key.
In the second stage, a softmax-like function is introduced to normalize the weights, which highlights the weights of important elements. $a_i$ is the weight coefficient corresponding to $\mathrm{Value}_i$, and it is calculated as follows:

$$a_i = \mathrm{Softmax}(\mathrm{Sim}_i) = \frac{e^{\mathrm{Sim}_i}}{\sum_{j=1}^{L_x} e^{\mathrm{Sim}_j}} \tag{10}$$

In the third stage, the Values are summed, weighted by their coefficients $a_i$, to obtain the final attention value.
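The three stages can be traced in a few lines of code. This is a minimal sketch that assumes a dot product as the similarity function; the text leaves the choice of similarity open, and the function name `attention` and the toy vectors are illustrative.

```python
import math

def attention(query, keys, values):
    """Return (weights, attention_value) for one query."""
    # Stage 1: similarity between the query and each key (dot product assumed).
    sims = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Stage 2: softmax normalisation; the max is subtracted for numerical stability.
    top = max(sims)
    exps = [math.exp(s - top) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Stage 3: weighted sum of the values.
    dim = len(values[0])
    out = [sum(a * v[d] for a, v in zip(weights, values)) for d in range(dim)]
    return weights, out

# Two identical keys receive equal weight, so the result is the mean of the values.
weights, out = attention([1.0, 0.0],
                         [[1.0, 0.0], [1.0, 0.0]],
                         [[2.0, 0.0], [4.0, 0.0]])
```

Replacing the dot product with another similarity function (for example, a small feed-forward network) changes only stage 1; stages 2 and 3 stay the same.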

Model Structure.
The specific structure of the improved AT-BiGRU model is shown in Figure 4.

Input Layer.
The datasets used in this article are SemEval-2014 Task 4 and SemEval-2017 Task 4. The input layer mainly preprocesses the comment data. Before formal text processing, the textblob package is imported to correct possible spelling errors in the comments. A text $a$ consists of $l$ sentences, $a = \{s_1, s_2, \ldots, s_l\}$, and the $j$-th sentence in the sample, which contains $m$ words, is denoted as $s_j = \{w_{j1}, w_{j2}, \ldots, w_{jm}\}$. Text vectorization is then performed on every word $w$ in $a$. The specific steps of text vectorization are as follows:
(1) Read the data and perform data cleaning.
(2) Because the comments differ in length, the data are vectorized to a specified length of 400 (if a sentence is shorter than the specified length, special symbols are appended at the end by default; if it is longer, the first 400 words are retained by default and the rest is truncated).
(3) Shuffle the data randomly and divide them into a training set and a test set at a ratio of 8 : 2.
(4) After vectorization, each comment becomes an index vector of uniform length, and each index corresponds to a word vector.
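Steps (2) and (3) can be sketched without any deep learning library. The helper names below are hypothetical, and the padding value 0 is an assumption. If Keras is used instead, note that `pad_sequences` pads and truncates at the front unless `padding="post"` and `truncating="post"` are passed, so those arguments would be needed to match the behaviour described in step (2).

```python
import random

def pad_or_truncate(seq, maxlen=400, pad_value=0):
    """Keep the first `maxlen` indices; pad shorter sequences at the end."""
    return list(seq[:maxlen]) + [pad_value] * max(0, maxlen - len(seq))

def shuffle_and_split(samples, train_ratio=0.8, seed=0):
    """Shuffle a copy of `samples` and split it into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical word-index sequences: one short comment and one overly long one.
sequences = [[5, 2, 9], list(range(1, 501))]
padded = [pad_or_truncate(s) for s in sequences]

# Ten stand-in samples split 8 : 2.
train, test = shuffle_and_split(list(range(10)))
```

Fixing the seed makes the split reproducible across runs, which is convenient when comparing models on the same test set.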
After the above four steps, the input data become a word matrix formed by the word vectors corresponding to the indices, with the uniform length set to 400. The 100-dimensional vectors of glove.6B.100d are used, and words that cannot be found in glove.6B.100d are initialized randomly. Let $c_{ji}$ be the $i$-th word vector of the $j$-th sentence; a piece of comment data with a length of 400 is represented as

$$c_{j1:j400} = c_{j1} \oplus c_{j2} \oplus \cdots \oplus c_{j400}$$

where $\oplus$ represents the concatenation operator between word vectors, and $c_{j1:j400}$ represents the word vector matrix of the $j$-th sentence. According to the index, each word in each comment is mapped to its word vector in glove.6B.100d to generate a word vector matrix.
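The index-to-vector lookup, including random initialization of words missing from the pretrained table, might look like the sketch below. A tiny 4-dimensional dictionary stands in for glove.6B.100d; the function names, the uniform range of the random initialization, and the all-zero padding row are assumptions for illustration.

```python
import random

def build_embedding_matrix(word_index, pretrained, dim=100, seed=0):
    """Row i holds the vector of the word with index i; row 0 is the padding row."""
    rng = random.Random(seed)
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = pretrained.get(word)
        if vec is None:
            # Word missing from the pretrained table: random initialisation.
            vec = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
        matrix[idx] = list(vec)
    return matrix

def embed(indices, matrix):
    """Map a padded index sequence to its word-vector matrix."""
    return [matrix[i] for i in indices]

# Tiny stand-in for the pretrained vectors (hypothetical values).
pretrained = {"good": [0.1, 0.2, 0.3, 0.4], "food": [0.5, 0.6, 0.7, 0.8]}
word_index = {"good": 1, "food": 2, "yummmy": 3}  # the last word is out of vocabulary
E = build_embedding_matrix(word_index, pretrained, dim=4)
sentence = embed([1, 2, 3, 0, 0], E)  # padded to length 5 for brevity
```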

Hidden Layer.
The calculation of the hidden layer is mainly divided into two steps.
Step 1. Calculate the word vector output by the BiGRU layer. The text word vector is the input vector of the BiGRU layer, whose main purpose is to extract the deep features of the text from the input text vector. The word vector of the $t$-th word of the $j$-th sentence input at time $i$ is $c_{ijt}$. After feature extraction by the BiGRU layer, the relationships between contexts can be learned more fully and semantic coding can be performed. The specific calculation is

$$h_{ijt} = \mathrm{BiGRU}(c_{ijt})$$

Step 2. Calculate the probability weight that each word vector should be assigned. To highlight the importance of different words to the sentiment classification of the entire text, the attention layer is introduced. The input of the attention layer is the output vector $h_{ijt}$ of the preceding BiGRU layer. The weight coefficients are calculated by the following formulas:

$$u_{ijt} = \tanh(w_w h_{ijt} + b_w)$$

$$\alpha_{ijt} = \frac{\exp(u_{ijt}^{T} u_w)}{\sum_{t} \exp(u_{ijt}^{T} u_w)}$$

$$s_{ijt} = \sum_{t} \alpha_{ijt} h_{ijt}$$

where $h_{ijt}$ is the output vector of the preceding BiGRU layer, $w_w$ is the weight coefficient, $b_w$ is the bias coefficient, $u_w$ is the randomly initialized attention matrix, $u_{ijt}$ is the hidden representation of $h_{ijt}$, and $\alpha_{ijt}$ is the probability weight obtained by softmax normalization. The attention output $s_{ijt}$ is the cumulative sum of the products of the probability weights assigned by the attention mechanism and the corresponding hidden states.
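Step 2 can be made concrete with a short sketch. It assumes the common additive form in which each hidden state is passed through a tanh layer, scored against a context vector $u_w$, normalized with softmax, and then used in a weighted sum; the function name `attention_pool` and the toy inputs are illustrative, and the BiGRU outputs are replaced by hand-written vectors.

```python
import math

def attention_pool(H, W, b, u_w):
    """H: list of hidden vectors h_t. Returns (alpha, s)."""
    scores = []
    for h in H:
        # u_t = tanh(W h_t + b)
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
        # Score of word t against the context vector u_w.
        scores.append(sum(ui * ci for ui, ci in zip(u, u_w)))
    # Softmax over the words of the sentence.
    top = max(scores)
    exps = [math.exp(sc - top) for sc in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # s = sum_t alpha_t * h_t
    s = [sum(a * h[d] for a, h in zip(alpha, H)) for d in range(len(H[0]))]
    return alpha, s

H = [[1.0, 0.0], [0.0, 1.0]]          # stand-ins for two BiGRU outputs
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
alpha, s = attention_pool(H, W, b, [1.0, 1.0])   # symmetric context vector
alpha2, s2 = attention_pool(H, W, b, [1.0, 0.0])  # favours the first word
```

In the second call the context vector aligns with the first hidden state only, so the first word receives the larger weight, which is the behaviour the attention layer relies on to emphasize sentiment-bearing words.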

Dropout Layer.
To avoid overfitting, a dropout layer is added between the attention layer and the fully connected layer. In the neural network, some nodes are randomly ignored, and the ignored nodes are selected at random each time, which effectively prevents the learned model from performing well on the training data but poorly on the test data. The following describes how the dropout layer works in a specific neural network. The $(l+1)$-th layer of ordinary neurons is calculated as

$$z_i^{(l+1)} = w_i^{(l+1)} y^{(l)} + b_i^{(l+1)}$$

$$y_i^{(l+1)} = f\left(z_i^{(l+1)}\right)$$

After the dropout layer is added, a mask is sampled from Bernoulli($p$) and applied to the $y$ values. Each mask element takes the value 0 or 1, and the calculation becomes

$$r_j^{(l)} \sim \mathrm{Bernoulli}(p)$$

$$\tilde{y}^{(l)} = r^{(l)} \odot y^{(l)}$$

$$z_i^{(l+1)} = w_i^{(l+1)} \tilde{y}^{(l)} + b_i^{(l+1)}$$

$$y_i^{(l+1)} = f\left(z_i^{(l+1)}\right)$$

where $(l+1)$ denotes the $(l+1)$-th layer, $r_j^{(l)} \sim \mathrm{Bernoulli}(p)$ means that the mask of the $j$-th neuron of the $l$-th layer follows the Bernoulli distribution and takes the value 0 or 1, $w_i^{(l+1)}$ and $b_i^{(l+1)}$ are the weight matrix and the corresponding bias of the $(l+1)$-th layer, respectively, and $f(z_i^{(l+1)})$ is the result of applying the nonlinear function $f$ to the value $z$ of the $i$-th hidden unit of the $(l+1)$-th layer.
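The effect of the Bernoulli mask on one dense layer can be sketched as follows. Here `p` is treated as the probability of keeping a unit, and no rescaling of the surviving activations is applied; both choices are assumptions of this sketch. Framework layers often differ: the Keras `Dropout` layer, for instance, takes the fraction of units to drop and rescales the kept units during training.

```python
import random

def relu(z):
    return z if z > 0.0 else 0.0

def dense(y_prev, weights, biases, f):
    """Ordinary layer: y_i = f(w_i . y_prev + b_i)."""
    return [f(sum(w * y for w, y in zip(row, y_prev)) + b)
            for row, b in zip(weights, biases)]

def dropout_dense(y_prev, weights, biases, f, p, rng):
    """Layer with dropout on its inputs; returns (outputs, mask)."""
    # r_j ~ Bernoulli(p): 1 keeps the unit, 0 drops it.
    mask = [1.0 if rng.random() < p else 0.0 for _ in y_prev]
    y_thin = [r * y for r, y in zip(mask, y_prev)]
    return dense(y_thin, weights, biases, f), mask

y_prev = [1.0, 2.0]
weights = [[1.0, 1.0]]
biases = [0.5]
rng = random.Random(0)

plain = dense(y_prev, weights, biases, relu)
kept_all, mask_all = dropout_dense(y_prev, weights, biases, relu, p=1.0, rng=rng)
dropped_all, mask_none = dropout_dense(y_prev, weights, biases, relu, p=0.0, rng=rng)
```

The two extreme settings bracket the behaviour: with `p=1.0` the layer is unchanged, and with `p=0.0` only the bias reaches the activation.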

Output Layer.
The input of the output layer is the output of the dense layer. The softmax function is applied to the input of the output layer to classify the text. The specific formula is as follows:

$$y_j = \mathrm{softmax}(w_1 s_{ijt} + b_1)$$

where $w_1$ represents the weight coefficient matrix to be trained from the attention mechanism layer to the output layer, $b_1$ represents the bias to be trained, and $y_j$ is the output prediction label.
The prediction $y_j = \mathrm{softmax}(w_1 s_{ijt} + b_1)$ is compared with the original label, and the objective function is the cross-entropy loss

$$L = -\sum_{j} \hat{y}^{(j)} \log y^{(j)}$$

where $\hat{y}^{(j)}$ is the label value of the $j$-th comment. Through the above training steps, using formula (18), feature extraction is performed on the words from 1 to h, and the corresponding weights are assigned to the cumulative sum. The dense layer further extracts features, and classification is finally performed in the softmax output layer. Then, the products of each comment's label value and $\log y^{(j)}$ are accumulated. Because this sum is negative, its opposite is taken as the loss to be minimized, which reduces the calculation error. Adam is used as the optimizer to make training and convergence faster. During backpropagation through time, the weights and biases are adjusted and updated according to the errors until the maximum number of iterations or a fixed precision is reached.
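The comparison between the softmax output and the label can be illustrated with a short sketch of softmax followed by a cross-entropy loss. One-hot labels over three classes are assumed, and the small constant `eps` is added only to guard against `log(0)`; both are choices of this sketch.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(label_one_hot, probs, eps=1e-12):
    """Negative sum of label * log(probability)."""
    return -sum(t * math.log(p + eps) for t, p in zip(label_one_hot, probs))

# Equal scores give a uniform prediction over negative, neutral, positive.
probs = softmax([0.0, 0.0, 0.0])
loss = cross_entropy([0.0, 1.0, 0.0], probs)

# A confident, correct prediction yields a much smaller loss.
confident = softmax([0.0, 5.0, 0.0])
small_loss = cross_entropy([0.0, 1.0, 0.0], confident)
```

For the uniform prediction the loss equals the natural logarithm of the number of classes, which is a useful sanity check for an untrained classifier.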

Experimental Environment and Model Parameters.
The proposed model is experimentally evaluated on the SemEval-2014 Task 4 and SemEval-2017 Task 4 datasets. The experimental environment is shown in Table 1.
The parameter settings of the improved AT-BiGRU model are shown in Table 2.

Evaluation Index.
When evaluating the performance indicators of word vectors, the confusion matrix method is used. TP is the number of positive samples predicted as positive, TN is the number of negative samples predicted as negative, FP is the number of negative samples predicted as positive, and FN is the number of positive samples predicted as negative. These quantities can be represented by the confusion matrix in Table 3.
When dealing with sentiment analysis tasks, there are usually four evaluation indicators: precision, accuracy, recall, and F1 value.
Precision can be defined as

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{19}$$

The accuracy rate is the ratio of the number of correctly classified samples in the test dataset to the total number of samples:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{20}$$

The recall rate can be expressed as

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{21}$$

The F1 value is the harmonic mean of precision and recall:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{22}$$

It can be seen from equation (22) that F1 increases as precision and recall increase. Generally speaking, precision is defined with respect to the prediction results: it is the proportion of samples predicted as positive that are actually positive. Recall is defined with respect to the original samples: it is the proportion of positive examples in the sample that are predicted correctly, where the positive examples comprise those predicted as positive (TP) and those predicted as negative (FN).
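The four indicators follow directly from the confusion matrix counts, as the sketch below shows. The counts are made up for illustration, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, accuracy, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}

# Hypothetical counts: 40 TP, 30 TN, 10 FP, 20 FN.
m = classification_metrics(tp=40, tn=30, fp=10, fn=20)
```

Because F1 is a harmonic mean, it lies between precision and recall and is pulled toward the smaller of the two.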

Experimental Datasets.
The experiment uses the SemEval-2014 Task 4 and SemEval-2017 Task 4 datasets. For the single-topic text sentiment classification task, the Twitter text sentiment classification task of SemEval-2017 Task 4 is adopted.
The sentiment polarity classification is mainly carried out on 12284 Twitter texts, which are divided into three categories: negative, neutral, and positive. For example, the emotional polarity expressed in the text "I really want to try the Mannequin Challenge" is positive. The experimental data required for the single-topic sentiment classification task are shown in Table 4.
For emotional tasks with different topics in the text, the restaurant dataset in SemEval-2014 Task 4 is used, which contains customer review text.
There are three kinds of emotions (positive, neutral, and negative) and five themes {food, price, service, ambience, anecdotes/miscellaneous}. SemEval-2017 Task 4 contains Twitter comment data. The topic of each comment is extracted from the text, and the emotional polarity toward the topic is either positive or negative. Table 5 shows the statistics of the data required for experiments on sentiment classification tasks with different topics.

Activation Function Comparison Experiment.
In a neural network, to avoid a purely linear combination, the activation function introduces a nonlinear factor to the neuron to improve the expressive ability of the model. The activation functions commonly used in traditional neural networks include sigmoid, tanh, and ReLU, and their function curves are shown in Figure 5. These three activation functions are selected for experiments, and the experimental results are shown in Table 6.
It can be seen from Table 6 that the AT-BiGRU model using the sigmoid activation function has the best accuracy and loss rate in sentiment classification tasks. Compared with the model using tanh activation function and ReLU activation function, the accuracy rate is increased by 22.65% and 11.41%, and the loss rate is decreased by 0.108 and 0.189, respectively.
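For reference, the three activation functions compared above can be written in a few lines. This is only a sketch of their standard definitions; it does not reproduce the experimental setup.

```python
import math

def sigmoid(x):
    """Squashes x into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squashes x into the interval (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """Passes positive values through and clips negative values to 0."""
    return x if x > 0.0 else 0.0

# Evaluate each function at a few sample points.
samples = [-2.0, 0.0, 2.0]
table = {name: [fn(x) for x in samples]
         for name, fn in (("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu))}
```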

Dropout Selection Experiment.
The results of the dropout selection experiment are shown in Table 7.
It can be seen from Table 7 that when dropout is set to 0.2, the model has the highest accuracy and the shortest time consumption. When the value of dropout is 0.1, the model achieves the smallest loss rate. When the value of dropout is 0.2, the difference between the loss rate and the lowest value is only 0.001. Therefore, when the value of dropout is 0.2, the overall performance of the model is optimal.

Iterative Experiment.
The proposed method and the methods in [20,22,26] are trained on the training set, and comparative experiments are performed on the test set. The relationship between the accuracy on the test set and the number of iterations is shown in Figure 6.
It can be seen from Figure 6 that the accuracy of every model rises overall as training proceeds, and the accuracy of the proposed method is higher than that of the other three comparison methods. Compared with max pooling, the attention layer highlights important information and extracts the deep-level features of the text more quickly, so the model converges quickly and its accuracy improves quickly. Its accuracy in the initial training stage is higher than that of the other three methods, and the training effect is better. The overall accuracy of the proposed method changes steadily; although it dips in some iterations, it remains higher than that of the other three comparison methods. On the whole, the accuracy curves of the proposed method and the methods in [20,22] are relatively close, while the accuracy curve of reference [26] fluctuates considerably. The proposed method therefore performs better and is more stable in extracting deep features of text. More iterations do not necessarily mean higher accuracy: each method has an optimal number of iterations at which it achieves its highest accuracy. For example, the proposed method reaches its highest accuracy in the fourth iteration, while reference [22] reaches its highest accuracy in the sixth iteration. Based on the above analysis, the proposed method can effectively improve accuracy with the fewest iterations.
In addition, the change trend curve of the time required for four different methods to complete an iteration under the same experimental conditions is shown in Figure 7.
It can be seen from Figure 7 that the iteration time of each method does not fluctuate much and tends to be stable overall. Generally, once a method has reached its minimum iteration time, the time for subsequent iterations no longer fluctuates greatly. The method in [20] takes the shortest time to complete one training iteration, which is closely related to the fast convergence of the convolutional max-pooling layer in its RNN model. The iteration time of the proposed method is higher than that of the method in [20] but lower than that of the LSTM model in [22] and the BERT model in [26], because the improved AT-BiGRU model computes faster and has fewer parameters than the LSTM model. The method in [22] takes the longest time because the LSTM neural network is relatively complicated to compute, and highlighting the key information adds further time for the weighted calculation.

Model Loss Rate Comparison.
Similarly, the loss rate changes of the proposed method and the methods in [20, 22, 26] over 10 iterations are shown in Figure 8. It can be seen from Figure 8 that the loss rate of the four methods on the training set shows an overall downward trend as the number of iterations increases and eventually stabilizes. On the test set, however, only the loss rate of the proposed method shows an overall decrease as the number of iterations increases. The loss rate of the proposed method is the lowest. Because it adopts the BiGRU model combined with the attention mechanism, it can handle the long-range and global dependencies of text in sentiment classification tasks, so the model has better generalization ability. Reference [26] adopts the BERT model based on the self-attention mechanism, which makes it easier to capture long-distance interdependent features in sentences. However, because it depends more heavily on sentence length, its loss rate is higher. The RNN model is used in reference [20], and its loss rate is low, about 0.24, which is about 0.02 higher than that of the proposed method.

Time Performance Comparison.
In addition, the time consumption of different methods in the network text classification task is shown in Figure 9. It can be seen from Figure 9 that the proposed method takes a relatively short time to complete the network text classification, about 980 s. Reference [20] uses a single RNN with a simple structure that is easy to train, so it takes the shortest time, about 750 s. However, the classification accuracy of this method is not high, so its overall performance is poor. The LSTM model is used in reference [22], and the BERT model is used in reference [26]. Both models are more complicated, so the time to complete the text classification is longer, exceeding 1200 s in both cases. The proposed method adopts an improved AT-BiGRU model, in which the attention mechanism can be parallelized well, which addresses the problem that recurrent neural networks cannot be parallelized. Therefore, the model training efficiency is improved to a certain extent.

Conclusion
The mobile client makes it convenient for users to express their opinions, and most online platforms now provide browsing and commenting functions. As a result, tens of thousands of user comments are generated every day, and the volume is growing exponentially. Analysis of these data can generate huge commercial and social value.
To address the problems of current mainstream sentiment analysis models, a new neural network model based on an improved AT-BiGRU is proposed. Before the formal text preprocessing, the textblob package is imported to correct possible spelling errors. We use pad_sequences to pad the input to a uniform length, and the embedding layer maps the input into a word matrix of fixed size. The BiGRU neural network is used to fully extract the text context information, and the attention model is used to highlight the key information of the text. The proposed model is experimentally demonstrated on the SemEval-2014 Task 4 and SemEval-2017 Task 4 datasets. The experimental results show that the proposed model effectively avoids the bias in text sentiment analysis caused by spelling errors, and they verify the effectiveness of the improved AT-BiGRU model in terms of accuracy, loss rate, and iteration time. In future work, we will consider incorporating topic word information into word vectors, so as to better perform sentiment analysis on texts that cover multiple topics.
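The fixed-length padding step can be illustrated without any deep learning library. The sketch below is a plain-Python stand-in, not the Keras function itself: the name `pad_to_length` is hypothetical, and it assumes the input is a list of token-id sequences and that the default Keras `pad_sequences` behavior (padding and truncating at the front, pad value 0) is what is wanted.

```python
def pad_to_length(sequences, maxlen, value=0):
    """Left-pad or left-truncate each token-id sequence to exactly `maxlen`.

    Hypothetical helper that mimics the default 'pre' padding and 'pre'
    truncating of Keras pad_sequences, returning plain Python lists.
    """
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]  # keep only the last `maxlen` tokens
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


# Two toy sequences of token ids (illustrative values only).
batch = pad_to_length([[5, 8], [3, 9, 4, 7, 2]], maxlen=4)
print(batch)
```

Every row of the result has the same length, which is what allows the embedding layer to treat a batch of comments as a single fixed-size matrix.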

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.