Weibo Rumor Recognition Based on Communication and Stacking Ensemble Learning



Introduction
Nowadays, the Internet is full of various kinds of information, such as Internet rumors, malware, and fake news, and it is difficult for people to tell whether such information is true. With the development of cyberspace, research on Internet misinformation has gradually come to focus on Internet rumor recognition, Internet rumor propagation, malware and virus propagation, fake news detection, and water army detection.
The research on malware and virus propagation focuses on modeling the spread of malicious software to predict its spreading behaviour [1], while the research on water armies is mainly about detecting water army accounts among a large number of social media users in an online topic and preventing their negative effect on the development of public opinion [2].
The spread of rumors may damage personal reputations, invade privacy, disrupt public order, lead to mass incidents, and endanger the stability of the country. Therefore, modeling the propagation of Internet rumors in social networks to help control the spread of misinformation is of great importance [3]. However, it is also vital to determine what Internet rumors are and how to recognize them more effectively.
In paper [4], Peterson et al. proposed that a rumor, in general, refers to an unverified account or explanation of events, circulating from person to person and pertaining to an object, event, or issue of public concern. Since then, rumors have been attributed specific characteristics, such as ambiguity, transmissibility, and timeliness. With the development of the Internet, the spread of information has accelerated, giving rise to Internet rumors. In paper [5], Chao et al. considered Internet rumors to be unconfirmed information transmitted by Internet users in specific ways. Most scholars believe that Internet rumors spread over the Internet, whose wide and arbitrary connections make them spread faster, reach a wider audience, and cause greater damage [6]. However, the existing definitions cannot accurately describe the technical characteristics and other elements of Internet rumors from the perspective of computability.
At present, research on Internet rumor recognition follows two lines. On the one hand, it focuses on extracting feature sets that can be used to detect rumors. On the other hand, classification modeling of Internet rumors, which needs a large amount of data rather than many hand-crafted features, has become a research hotspot. However, feature-based approaches are better suited to small datasets.
The feature set used in current Internet rumor recognition is based on the features proposed by Castillo et al. [7] and Qazvinian et al. [8]. Generally, the feature set is divided into three types: content features, user features, and propagation features; sometimes these are further subdivided into time features, network features, and combinations of the two. These features are usually simple statistics, and the deep semantics of the text are not mined. Therefore, recognition accuracy suffers from the lack of key features. In paper [9], Kwon et al. applied an RNN to learn the deep meaning between messages. Based on the protection mechanism, Chen et al. [10] proposed a new RNN model that recognizes Internet rumors by using time sequences to capture latent contextual changes. Although neural network models can overcome the problem of sparse features by representing text as continuous vectors, they have too many parameters, converge slowly, and need a large corpus.
The classification algorithms often used in rumor recognition research include SVM, Decision Tree, Naive Bayes, and Neural Networks. For example, in paper [11], Duan et al. used SVM to detect false information from the perspective of the comments on the source Weibo post. In paper [12], Chen et al. used a regression method to recognize online food safety rumors; however, this method is limited by the topic type of the rumor and can only determine whether an article is related to a rumor. In paper [13], Lu et al. proposed an improved method based on the Co-Forest algorithm to address data imbalance, which improved prediction accuracy on unlabelled samples.
Because datasets are difficult to obtain, current research tends to extract statistical characteristics of text information. Moreover, new methods that add features on top of previous work keep increasing the feature dimension, which makes the estimates of model parameters inaccurate. Besides, classical algorithms such as SVM, Decision Tree, and Naive Bayes are no longer well suited to recognizing Internet information with complex content. In specific problems and scenarios, each model has its own advantages and disadvantages.
Combining the advantages of multiple models may therefore give better recognition results [14]. For example, in paper [15], Xie et al. proposed a high-precision EEG emotion recognition model by integrating LightGBM, XGBoost, and Random Forest. In paper [16], Duan et al. classified the sentiment of Weibo text using the Stacking ensemble learning method, with an accuracy as high as 93%.
In this paper, we define Internet rumors based on the 5W formula of communication and construct three features: User Credibility, Emotional Consistency, and Regional Correlation. Then, we verify the validity of these features with the Chi-square test to better filter the feature set. Next, after analysing existing classification algorithms, we adopt Stacking ensemble learning and propose a rumor recognition method that combines different models and is optimized using cross validation. Finally, experiments are conducted across different methods and datasets. The structure of this paper is as follows: the related work is introduced in the first part. The design of the Stacking model, including feature construction and selection and the construction of the model, is described in the second part. In the third part, we conduct the experiments and analyse the results. The conclusion and future work are given in the last part.

In Stacking, the original training dataset is fed into multiple primary learners in the first layer, and the first layer's predictions are used as the training set for the next layer of learners. Finally, the predictions of the previous layer are fed into the final metalearner to obtain the final prediction (see Figure 1).

Other Methods.
SVM [17] is a binary classification model that uses the margin maximization strategy. It minimizes the structural risk, that is, the sum of the empirical risk and the confidence range, to improve generalization ability. Therefore, it can obtain good statistical regularity even on a small dataset.
Decision Tree J48 [8] is based on C4.5. It uses a divide-and-conquer strategy, has high credibility, and produces results that are easy to understand.
Random Forest [18] is an ensemble learning method. It combines multiple weak classifiers and obtains the final result by voting or by averaging. The model has high accuracy and good generalization performance.
Logistic regression [19] is a generalized linear model and a classical classification method, trained by solving an optimization problem with the likelihood function as the objective.
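For reference, the likelihood objective mentioned above can be written out for the binary case. The notation below ($\mathbf{x}_j$ for a feature vector, $y_j \in \{0, 1\}$ for its label, $\mathbf{w}$ and $b$ for the parameters, $N$ for the number of samples) is ours rather than the paper's:

```latex
p(y_j = 1 \mid \mathbf{x}_j) = \sigma(\mathbf{w}^{\top}\mathbf{x}_j + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

\ell(\mathbf{w}, b) = \sum_{j=1}^{N} \Big[ y_j \log \sigma(\mathbf{w}^{\top}\mathbf{x}_j + b) + (1 - y_j) \log\big(1 - \sigma(\mathbf{w}^{\top}\mathbf{x}_j + b)\big) \Big]
```

Training maximizes $\ell$, which is equivalent to minimizing the cross-entropy loss $-\ell$. Because $\sigma$ maps into $(0, 1)$, a logistic regression metalearner naturally produces a confidence value of the kind used for $y_i$ in Definition 2.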

Weibo Rumor Recognition
3.1. Rumor Definition.
Based on existing rumor definitions, this paper uses the 5W's of communication to divide the elements of rumor transmission into communicator, content, object, effect, and channel. The communicator can be an individual or an organization. The content is the information that the communicator wants to pass to the audience. The object is the receiver of the information, or the communicator who processes it further. The effect is the way the audience is affected by the message sent by the communicator, causing changes in their ideas, behaviours, and so forth. The channel is the means by which communication is achieved. Since the object of an Internet rumor is sometimes also its communicator, we combine them into one element for analysis.
Therefore, the qualitative definition of Internet rumors given in this paper is shown in Definition 1.
Definition 1. Internet rumors: Internet rumors refer to information published by Internet users through online media platforms that is ambiguous in content, unconfirmed by official sources, and to some extent harmful to society. Their forms of expression include text, pictures, audio, and video.
The research object of this paper is Weibo. Weibo rumors, a type of Internet rumor, can be divided into pure text, pictures that do not match the text, and fake images. Since most Weibo posts contain text, current recognition of Weibo rumors mainly focuses on text. To facilitate the recognition of Internet rumors, this paper gives a formal definition from the perspective of computability, as shown in Definition 2.

Definition 2. Weibo rumors: the object of rumor recognition is a
represents the attributes of the i-th Weibo's text. The propagation feature set $P_i = \{f_{p_1}, f_{p_2}, f_{p_3}, \ldots, f_{p_n}\}$ represents the propagation attributes of the i-th microblog. $y_i$ is the confidence value of whether $m_i$ is a rumor, with $y_i = f(X_i)$ and $0 \le y_i \le 1$. The closer $y_i$ is to 1, the more likely $m_i$ is a rumor, and vice versa.

Rumor Recognition Process.
At present, research on rumor recognition mainly focuses on feature construction, and adding new features on top of previous work increases the feature dimension and makes the estimates of model parameters inaccurate. Therefore, we use the Chi-square test to check the validity of new features and obtain a feature set better suited to Internet rumor recognition. Internet rumor recognition is treated as a binary classification problem. Because each algorithm has its own advantages, we use the idea of Stacking ensemble learning to build a new classification model (see Figure 2).

Feature Construction.
Combining the studies in paper [8] and paper [20], we select 24 basic features, divided into content features (CONT), user features (USER), and propagation features (TRAN).
The content features include the length of the text, the number of @ symbols, the number of # symbols, the number of question/exclamation marks, whether the post contains pictures or URLs, and the number of positive/negative words. The user features include the length of the username, gender, the number of friends, the number of followers, the number of mutual followers, the number of microblog posts, the number of favourited microblogs, certification information, personal description, and the user's own influence. The propagation features include the number of reposts, the number of comments, the number of likes, the interval between the user's registration time and the microblog's posting time, and the microblog's attention.
Most of the above features are statistical. To recognize Weibo rumors more effectively, this paper constructs new features in three aspects (user, content, and propagation) to mine the hidden meaning behind the text.

Definition 3. User Credibility (UCRE): a user's credibility is determined by many factors. By integrating information such as the number of the user's friends, followers, and mutual followers, the number of microblogs posted, and certification information, we construct the user's influence and activity and use them to calculate the user's credibility.
The more credible a user is, the more credible the information he/she posts. The user's credibility is computed from three terms: $f_{influence}(u_i)$, the user's influence; $f_{verified}$, the user's certification information; and $f_{InfoIntegrity}$, which indicates whether the user's information is complete, where the user's information includes the username, gender, personal description, registration place, and profile photo. The greater the user's influence, the greater the impact of the microblogs he/she posts within a certain time and space. The user's influence is mainly determined by the number of the user's followers and mutual followers; it is computed from $C_{bifollower}$, the number of mutual followers of $u_i$, and $C_{follower}$, the number of followers of $u_i$, where $u_i$ is the user who posts the i-th microblog.
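The exact formulas for $f_{influence}$ and for combining the three credibility terms did not survive extraction, so the sketch below covers only the two inputs the text fully specifies: a 0/1 completeness flag over the five listed profile fields and a 0/1 certification flag. The `UserProfile` class and its field names are hypothetical and do not reflect any Weibo API schema.

```python
from dataclasses import dataclass

# The five profile fields that the text lists for f_InfoIntegrity.
REQUIRED_FIELDS = ("username", "gender", "description", "registration_place", "profile_photo")


@dataclass
class UserProfile:
    """Hypothetical container for a user's profile; field names are illustrative."""
    username: str = ""
    gender: str = ""
    description: str = ""
    registration_place: str = ""
    profile_photo: str = ""
    verified: bool = False


def f_info_integrity(user: UserProfile) -> int:
    """1 if every listed profile field is non-empty, else 0."""
    return int(all(getattr(user, name).strip() for name in REQUIRED_FIELDS))


def f_verified(user: UserProfile) -> int:
    """Certification information encoded as a 0/1 indicator (an assumption)."""
    return int(user.verified)


complete = UserProfile("alice", "F", "hello", "Wuhan", "a.jpg", verified=True)
partial = UserProfile(username="bob")
flags = (f_info_integrity(complete), f_verified(complete), f_info_integrity(partial))
```

How these flags are weighted against the influence term is left open here, since the original equation is not recoverable.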
Definition 4. Emotional Consistency (ECON): emotional consistency indicates whether the sentiment of a microblog is consistent with the sentiment of its comments.
When a microblog shows strong emotion, it may incite others' emotions, and the microblog is then more likely to be a rumor. By segmenting the text and comments of $m_i$, we obtain the text's term vector set $\{v_1, v_2, v_3, \ldots, v_n\}$, where $v_i$ is a processed word, and the term vector set of the j-th comment of the i-th microblog, $C_i^j = \{c_1, c_2, c_3, \ldots, c_n\}$, where $c_i$ is a processed word.
The numbers of positive and negative words are counted using the Affective Lexical Ontology [21]. The emotion $S$ of a term vector set is computed from $C_{pos}$, the number of positive words, and $C_{neg}$, the number of negative words. From $S$ we obtain the final emotion $SO$, where 1 represents positive, −1 represents negative, and 0 represents neutral. After the emotion of each comment is calculated, the overall emotion of the comments is derived from them, and comparing the emotion of $m_i$ with that of its comments gives the emotional consistency of $m_i$.

Definition 5. Regional Correlation (RCO): regional correlation refers to the distance between the place mentioned in the microblog and the user's registration place. The longer the distance, the less credible the microblog. This paper uses the Euclidean distance:

$$\mathrm{dist}(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$$

where $\mathrm{dist}(x, y)$ is the distance between city $x$ and city $y$, the coordinates of city $x$ are $(x_1, x_2)$, and the coordinates of city $y$ are $(y_1, y_2)$. Calculating this distance for every pair of cities in China yields a distance matrix. Depending on where the user's registration place and the place mentioned in the microblog are located, there are four cases:
① Both the user registration place and the place mentioned in the microblog are in China.
② The user registration place is in China, but the place mentioned in the microblog is not.
③ The user registration place is not in China, but the place mentioned in the microblog is.
④ Neither the user registration place nor the place mentioned in the microblog is in China.
Since most Weibo rumors occur in China, the current research mainly focuses on case ①. In cases ②, ③, and ④, the distance is set to 10000, which serves as the maximum threshold.
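The emotion formulas themselves were lost in extraction, so the following is only a sketch under explicit assumptions: $S = C_{pos} - C_{neg}$, $SO$ is the sign of $S$, the overall comment emotion is the sign of the summed per-comment $SO$ values, and consistency is 1 when the post and comment polarities match. Tokens are assumed to be already segmented, and the tiny `POS`/`NEG` word sets are hypothetical stand-ins for the Affective Lexical Ontology.

```python
def emotion_score(tokens, pos_words, neg_words):
    """Assumed form S = C_pos - C_neg for one segmented term vector set."""
    c_pos = sum(1 for t in tokens if t in pos_words)
    c_neg = sum(1 for t in tokens if t in neg_words)
    return c_pos - c_neg


def polarity(score):
    """SO: 1 positive, -1 negative, 0 neutral (sign of the score)."""
    return (score > 0) - (score < 0)


def emotion_consistency(post_tokens, comment_token_lists, pos_words, neg_words):
    """1 if the post's polarity equals the overall comment polarity, else 0.

    The overall comment polarity is assumed to be the sign of the summed
    per-comment polarities; the paper's exact aggregation is not recoverable.
    """
    so_post = polarity(emotion_score(post_tokens, pos_words, neg_words))
    so_comments = polarity(sum(polarity(emotion_score(c, pos_words, neg_words))
                               for c in comment_token_lists))
    return int(so_post == so_comments)


# Hypothetical toy lexicon and pre-segmented tokens.
POS = {"good", "great"}
NEG = {"fake", "terrible"}
inconsistent = emotion_consistency(["great", "news"],
                                   [["fake"], ["terrible", "fake"], ["good"]], POS, NEG)
consistent = emotion_consistency(["great"], [["good"], ["news"]], POS, NEG)
```

In the first toy case a positive post attracts mostly negative comments, so the feature is 0; in the second, post and comments agree, so it is 1.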

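The Regional Correlation rule can be sketched directly from Definition 5: a Euclidean distance when both places are Chinese cities (case ①), and the threshold 10000 otherwise. Here a place counts as "in China" exactly when it appears in the coordinate table, which is an assumption of this sketch; the city names and planar coordinates are hypothetical toy values, not real geodata.

```python
import math

MAX_DISTANCE = 10000.0  # threshold the text assigns to cases 2, 3, and 4


def city_distance(a, b):
    """Euclidean distance between coordinate pairs (x1, x2) and (y1, y2)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def regional_correlation(reg_place, mentioned_place, china_coords):
    """RCO: distance for case 1 (both places in the table), else the threshold."""
    if reg_place in china_coords and mentioned_place in china_coords:
        return city_distance(china_coords[reg_place], china_coords[mentioned_place])
    return MAX_DISTANCE


# Hypothetical toy coordinates.
coords = {"city_a": (0.0, 0.0), "city_b": (3.0, 4.0)}
case_1 = regional_correlation("city_a", "city_b", coords)
case_2 = regional_correlation("city_a", "Paris", coords)
```

In practice the pairwise distances could be precomputed once into the distance matrix the text mentions, turning each lookup into a table access.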
Feature Selection.
To test the validity of the basic and new features, we use the Chi-square test to rank the features; the ranking results are shown in Table 1.
As shown in Table 1, Regional Correlation, Emotional Consistency, and User Credibility rank 3rd, 5th, and 9th, respectively, which indicates that the three new features we constructed are valid.
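A Chi-square ranking like the one behind Table 1 can be sketched with scikit-learn's `chi2` scorer. The paper does not say how continuous features were prepared for the test; because `chi2` rejects negative inputs, this sketch rescales each column to [0, 1] first, which is our assumption. The toy data and feature names are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import chi2
from sklearn.preprocessing import MinMaxScaler


def chi2_ranking(X, y, feature_names):
    """Return feature names sorted from highest to lowest chi-square score."""
    # chi2 requires non-negative inputs, so rescale every column to [0, 1].
    X_nonneg = MinMaxScaler().fit_transform(X)
    scores, _p_values = chi2(X_nonneg, y)
    order = np.argsort(scores)[::-1]  # indices of the largest scores first
    return [feature_names[i] for i in order]


# Toy data: one column tracks the label, the other is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = np.column_stack([y + 0.1 * rng.random(200), rng.random(200)])
ranking = chi2_ranking(X, y, ["informative", "noise"])
```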
Two sets of control experiments are conducted on different models. In the first, we start from the new features and add features one by one according to the feature ranking. In the second, features are added one by one according to the feature ranking alone.
The experimental results are shown in Figures 3 and 4.
As illustrated in Figures 3 and 4, as the number of features in the feature set increases, the models' recognition accuracy gradually increases, but once the number of features exceeds a certain value, the accuracy tends to decrease.
In Figure 3, Naive Bayes reaches its highest accuracy when the number of features increases to 15, and when the number of features increases from 3 to 12, the accuracy of SVM is significantly higher than that of Decision Tree. When more than 12 features are added, the accuracy of Decision Tree begins to exceed that of SVM. As features continue to be added, the accuracy of Random Forest keeps increasing, but it decreases once the number of features exceeds 21. In general, each model performs best with around 13-14 features.
In Figure 4, the results are similar to those in Figure 3; the feature sets with better results mostly contain around 16 features. In summary, we use the first 16 features in Table 1 as the feature set. The final feature set used in rumor recognition is shown in Figure 5.
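The incremental experiments behind Figures 3 and 4 amount to a sweep over feature-set size: take the top-k ranked features, score a model, and repeat for every k. The sketch below uses 5-fold cross-validated accuracy and Gaussian Naive Bayes as defaults, both our assumptions; for the first experiment, the new features would simply be placed at the front of `ranked_idx`. The toy data stands in for the Weibo features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB


def accuracy_by_prefix(X, y, ranked_idx, model=None, cv=5):
    """Mean cross-validated accuracy using the top-k ranked features, k = 1..n."""
    model = GaussianNB() if model is None else model
    results = {}
    for k in range(1, len(ranked_idx) + 1):
        cols = list(ranked_idx[:k])  # the k highest-ranked feature columns
        scores = cross_val_score(model, X[:, cols], y, cv=cv, scoring="accuracy")
        results[k] = float(np.mean(scores))
    return results


# Toy data: 6 features whose column order stands in for a chi-square ranking.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
results = accuracy_by_prefix(X, y, ranked_idx=list(range(6)))
best_k = max(results, key=results.get)  # feature-set size with the best accuracy
```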

Classification Algorithm.
The Stacking method is adopted as the ensemble learning combination strategy in this paper. We select SVM, Random Forest, and Naive Bayes as the primary learners and logistic regression as the metalearner. SVM uses the hinge loss as its surrogate loss, which makes its solution sparse. At the same time, it considers the minimization of both empirical risk and structural risk, which makes it stable [22]; it therefore generalizes well and has a relatively small computational cost when a kernel function is used [17]. Random Forest can estimate missing data and balance errors on imbalanced data [18]. Naive Bayes performs better when the correlation between attributes is small. The model construction is shown in Figure 6, and the specific algorithm is described in Algorithm 1. The time complexities of Random Forest and logistic regression are $O(n \log n)$ and $O(nk + k)$, respectively, where $k$ is the number of features, whereas that of SVM is determined by the kernel function and can reach $O(n^3)$. Under the Stacking strategy, the overall time complexity is dominated by the largest complexity among the primary learners and the metalearner. Therefore, the time complexity of Algorithm 1 is $O(n^3)$.
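The two-layer model described above maps naturally onto scikit-learn's `StackingClassifier`. This is a sketch, not the authors' code: the RBF kernel, 200 trees, Gaussian Naive Bayes, and 5-fold internal cross validation are our choices, since the paper does not report hyperparameters. With `cv=5`, the metalearner is trained on out-of-fold predictions of the primary learners, which is one reasonable reading of "optimized using cross validation". Synthetic data stands in for the 16 selected features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacking_model(random_state=0):
    """Two-layer Stacking: SVM, RF, and NB as primary learners, LR as metalearner."""
    primary_learners = [
        # SVM is scale-sensitive, so it gets its own standardization step.
        ("svm", make_pipeline(StandardScaler(),
                              SVC(kernel="rbf", probability=True, random_state=random_state))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=random_state)),
        ("nb", GaussianNB()),
    ]
    return StackingClassifier(
        estimators=primary_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,  # metalearner sees out-of-fold predictions of the primary learners
        stack_method="predict_proba",
    )


# Synthetic stand-in for the 16-feature Weibo data, split 8:2 as in the Dataset section.
X, y = make_classification(n_samples=500, n_features=16, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

model = build_stacking_model().fit(X_tr, y_tr)
confidence = model.predict_proba(X_te)[:, 1]  # per-post confidence in [0, 1]
```

Using `predict_proba` for both layers keeps every intermediate value in [0, 1], so the final output can be read as the confidence $y_i$.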

Experiment and Analysis
4.1. Dataset.
We use the data from Ma et al. [23], which contain 2313 rumor events and 2351 nonrumor events, about 3.8 million microblog posts, and 2.7 million pieces of user information. In the experiment, we split the dataset into a training set and a test set in the proportion 8:2. In addition, to verify the validity of the method on actual network data, we collected data on the Weibo platform and built an empirical database. The datasets used for the empirical study are shown in Table 2.

Algorithm for Comparison.
To verify the validity of the proposed method, we compare our model with the following methods: tanh-RNN [23], the method used in the paper from which the data are taken; SVM [20], the first method used in Weibo rumor recognition; Decision Tree J48 [8], the first method used in Twitter fake information recognition; and AdaBoost and Random Forest, representative ensemble learning methods. SVM and Decision Tree J48 are used as benchmarks for Internet rumor recognition in most research works.

[Figure labels recovered: the number of mutual followers, the number of followers, time span, regional correlation, user influence, the number of reposts, the number of likes.]

The feature set comparison results are shown in Table 3 ($F_{CONT}$ is the content feature set, $F_{USER}$ the user feature set, $F_{PROP}$ the propagation feature set, UCRE User Credibility, ECON Emotional Consistency, RCO Regional Correlation, and $F_{SIFT}$ the feature set shown in Figure 5). Table 3 shows that the accuracy of Weibo rumor recognition using $F_{CONT}$ alone is as low as 70%, which indicates that it is difficult to detect rumors from the more complicated content alone. Compared with relying only on $F_{CONT}$, recognition with $F_{USER}$ and $F_{PROP}$ is better; in particular, the accuracy improves by 20%. The accuracies of $F_{CONT+TRAN}$, $F_{USER+TRAN}$, and $F_{CONT+USER}$ are 0.2%, 1.9%, and 2.6% lower than that of $F_{SIFT}$, respectively. The accuracy of $F_{CONF+SENT+LOC}$, which consists of only the three new features constructed in this paper, is as high as 90.8%, which shows that User Credibility, Emotional Consistency, and Regional Correlation are effective for Weibo rumor recognition. Moreover, the results of $F_{SIFT-CONF-SENT-LOC}$, that is, $F_{SIFT}$ without the three new features, are lower than those of $F_{SIFT}$, whose accuracy is 93%. With $F_{SIFT}$, the accuracy and recall for both rumor and nonrumor recognition exceed 90%, and the F1-score is stable at 93%. The feature set proposed in this paper yields higher values than the other feature sets, which indicates that it is more effective for detecting Weibo rumors.
To validate the effectiveness of each feature in $F_{SIFT}$, we apply the Stacking-based rumor recognition method and conduct 16 experiments on $F_{SIFT}$, removing one feature each time. The results are shown in Table 4.
As shown in Tables 3 and 4, the accuracy in each experiment is lower than that of $F_{SIFT}$, and the accuracy is lowest for the feature set without Regional Correlation, which indicates that Regional Correlation has the biggest impact on the recognition results. In conclusion, each of the 16 selected features contributes positively to rumor recognition.

Algorithm Comparison.
We compare different algorithms with the proposed rumor recognition model to illustrate its accuracy and generalization ability. The results are shown in Table 5.
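A cross-validated comparison of the classical baselines can be sketched as follows. Two caveats apply: scikit-learn's `DecisionTreeClassifier` implements CART, so an entropy-split CART tree only approximates Weka's J48 (C4.5), and tanh-RNN is not reproduced. Default hyperparameters and the synthetic data are our assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

BASELINES = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    # CART with entropy splits: an approximation of J48/C4.5, not the same algorithm.
    "DecisionTree": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}


def compare_baselines(X, y, cv=5):
    """Mean cross-validated accuracy, precision, recall, and F1 for each baseline."""
    scoring = ["accuracy", "precision", "recall", "f1"]
    summary = {}
    for name, model in BASELINES.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        summary[name] = {m: float(scores[f"test_{m}"].mean()) for m in scoring}
    return summary


X, y = make_classification(n_samples=300, n_features=16, random_state=0)
summary = compare_baselines(X, y)
```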

Empirical Analysis.
To verify the practicability of the proposed Weibo rumor recognition method, we experiment on three events (see Table 2); the results are shown in Figure 7. Figure 7 shows that the performance of each model differs slightly across events. RF has 98% recall but 77% accuracy, which indicates that the RF model may recognize many nonrumor posts as rumors. The Stacking model has 93% recall and 80% accuracy, and its accuracy is higher than that of the other models. Therefore, the proposed model is relatively effective in actual recognition.
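Why high recall can coexist with mediocre accuracy is easiest to see from the confusion-matrix definitions. The toy labels below are hypothetical and are not the RF results from Figure 7: a classifier that misses only 1 of 50 rumors but flags 20 of 50 nonrumors as rumors gets recall 0.98 and accuracy 0.79, with precision pulled down by the false positives.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 with rumor as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels: 50 rumors (1) followed by 50 nonrumors (0).
y_true = [1] * 50 + [0] * 50
# Misses 1 rumor, and flags 20 nonrumors as rumors.
y_pred = [1] * 49 + [0] + [1] * 20 + [0] * 30
metrics = binary_metrics(y_true, y_pred)
```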
In addition, to verify the effectiveness of the new features in practice, we compare the feature set with UCRE, ECON, and RCO against the feature set without them. The results are shown in Figure 8.
As illustrated in Figure 8, in terms of accuracy, precision, recall, and F1-score, the Stacking model performs better on the feature set that includes UCRE, ECON, and RCO. In conclusion, the new features proposed in this paper are suitable for actual rumor recognition.

Conclusion
In this paper, building on previous studies, we construct three new features by applying the 5W formula of communication, and we obtain the optimal feature set for Weibo rumor recognition through the Chi-square test and other methods. Then, based on the idea of Stacking, we select SVM, Random Forest, and Naive Bayes as the primary learners and logistic regression as the metalearner to model and analyse Weibo rumor recognition. The experimental results show that the constructed features and the proposed recognition method can effectively detect rumors on Weibo.
However, this paper still has some shortcomings. For example, in the 2019-nCoV event, where the content is more complex and the amount of information is larger, the proposed algorithm performs worse than on the other two events. Therefore, deeper semantic mining of microblog text is necessary; for example, the content and emotion of text written with phonetic transcription abbreviations need special processing. In addition, designing a more suitable and effective integration strategy of classification algorithms for Weibo rumor recognition is left for future work. Besides, detecting rumors with Weibo text information deserves further study.

Discrete Dynamics in Nature and Society

Figure 3: Based on the new features.

Figure 4: According to the features ranking.

Table 1: Features ranking results.

As shown in Table 5, compared with tanh-RNN, SVM, and Decision Tree J48, the Stacking model has the highest accuracy, 93.5%. The Stacking model recognizes rumors with a 96.5% recall rate and 91.4% precision, which shows that it can recognize more rumors; and the Stacking model can recognize nonrumor events with 93%

Table 2: Empirical study data.

Algorithm 1: Weibo rumor recognition method based on Stacking.
Input: Weibo set $W = \{m_1, m_2, m_3, \ldots, m_n\}$
Output: the confidence value $y_i$ of $m_i$
Step 1: extract the features of $m_i$ shown in Figure 5.
Step 2: calculate the user credibility of $m_i$.
Step 3: segment $m_i$ and its comments, and calculate the emotional consistency of $m_i$.
Step 4: calculate the regional correlation of $m_i$.
Step 5: standardize each feature of $m_i$.
Step 6: split the preprocessed dataset into a training set, a test set, and a validation set, and input them into the SVM, RF, and Naive Bayes models.
Step 7: input the new feature set from Step 6 into the logistic regression model.
Step 8: calculate the accuracy, precision, recall, and F1-score of the Stacking model.

Table 3: Feature set comparison results.

Table 4: The effectiveness of each feature in $F_{SIFT}$.

We calculate the training time and test time of each algorithm separately; the results are shown in Table 6. Table 6 shows that Naive Bayes takes the shortest training time, whereas the proposed Stacking model takes the longest because it ensembles multiple algorithms. In the testing phase, the Stacking model takes the shortest time, only 7.8% of that of Naive Bayes, and the second fastest, logistic regression, takes 2.7 times as long as the Stacking model.