An Adaptive Deep Transfer Learning Model for Rumor Detection without Sufficient Identified Rumors

With the extensive use of social media platforms, spam information, especially rumors, has become a serious problem on social network platforms. Rumors make it difficult for people to get credible information from the Internet and can cause social panic. Existing detection methods rely on a large amount of training data. However, the number of identified rumors is usually insufficient for developing a stable detection model. To handle this problem, we propose a deep transfer model to achieve accurate rumor detection on social media platforms. In detail, an adaptive parameter tuning method is proposed to solve the negative transferring problem in the parameter transferring process. Experiments based on real-world datasets demonstrate that the proposed model achieves more accurate rumor detection and significantly outperforms state-of-the-art rumor detection models.


Introduction
With the rapid development of mobile Internet technology, online social networking (OSN), a novel information publishing and sharing platform, has become an essential part of our daily life. OSN platforms such as Facebook, Twitter, Weibo, and WeChat have triggered a media revolution with their interactivity, immediacy, and diversity, which has profoundly affected all aspects of our society and economy. The existence of false information makes it difficult for OSN users to obtain credible information on OSN platforms. Rumors are the most common false information: they are false messages that spread among a large number of people and have misled these people [1]. Due to easy access to social media, rumors can spread extensively, bringing huge harm to society and causing substantial economic losses. For example, if there is a rumor about a bomb event in a hotel, the income of the hotel will suffer from the propagation of this rumor on social media platforms. Even worse, malicious rumors may seriously distort the opinions of OSN users, cause social panic, and even lead to a crisis of confidence. Rumors on OSN have become a serious social problem. However, it is unrealistic to rely on manual methods to identify and filter rumors: the average accuracy of three human judges is only 57.33% [2]. Therefore, effective detection of rumors on OSN platforms is highly desired, and research on automatic rumor detection has received increasing attention.
Most existing rumor detection methods employ learning algorithms that treat rumor detection as a binary classification task [3]. These rumor detection models incorporate a wide variety of features of the text content [4], user characteristics [5], and diffusion patterns of the OSN messages [6], or simply exploit the patterns used in regular messages to discover rumors [7]. These approaches aim at extracting distinctive features to describe rumors faithfully. However, these traditional machine learning methods fail to obtain an effective classification model when the features are sparse. Ma et al. [8] proposed an adaptive model and for the first time used recurrent neural networks to achieve microblog rumor detection. Recurrent neural network (RNN) models achieve significant improvements over state-of-the-art learning algorithms that rely on hand-crafted features. Chen et al. [3] introduced CallAtRumors, a novel recurrent neural network model based on a soft attention mechanism that automatically carries out early rumor detection by learning latent representations from the sequential messages in OSN. Singh et al. [9] used a convolutional neural network (CNN) model to mine the semantic features of review texts, which were then used to identify false reviews. However, these neural network-based models require a large amount of training data, and their accuracy degrades when the training data are insufficient.
Transfer learning (TL) is a branch of machine learning (ML) that leverages the knowledge stored within a source domain and provides a method to transfer this knowledge to a target domain [10]. Transfer learning has benefited many real-world applications where labelled data are abundant in source domains but scarce in the target domain [11]. Existing studies have already provided evidence for applying TL to neural features. In image processing, for instance, deep neural networks exhibit an interesting phenomenon: the model tends to learn first-layer features that resemble either Gabor filters or color blobs [12,13]. Donahue et al. [14] suggested that high-level layers are also transferable in general visual recognition. Mou et al. [15] further studied the transferability of neural layers. Semwal et al. [10] reported the results and conclusions obtained from extensive empirical experiments using a CNN and tried to uncover primary rules to ensure a meaningful transfer operation. The application of transfer learning in text classification provides new ideas for solving the problems that arise when labelled texts are insufficient to support the training processes of models. However, without knowledge about the difference between the source and the target domains, negative transferring [16] occurs when knowledge is transferred from different domains. Negative transferring refers to the phenomenon that, instead of improving the classification accuracy of the models, transfer learning from other domains degrades the classification accuracy on the target domain. Although avoiding negative transferring is a very important issue, little research has been done on it. A deep neural network incorporates the domain knowledge into the parameters of its nodes during the training process.
We can transfer the related knowledge embedded in neural networks to the rumor detection domain by reusing the parameters of the neural networks. This paper proposes a deep transfer model based on CNN to approach an accurate rumor detection scheme. In detail, we propose a learning rate adaptive update method to solve the negative transferring problem in the transfer process.
The main contributions are listed as follows:
(1) A novel deep transfer model based on CNN for rumor detection is proposed, which can effectively identify rumors without sufficient training data. We find that the knowledge contained in large-scale e-commerce review datasets shares features with the knowledge about the characteristics of rumors; it is therefore used to train a model whose parameters are transferred to the rumor detection model.
(2) We propose a learning rate adaptive update method to solve the negative transferring problem during the parameter transfer process. In detail, based on the stochastic gradient descent algorithm, we achieve an adaptive learning rate updating method for fine-tuning the rumor detection model obtained in the transfer process.
(3) We implement the proposed detection scheme on an open source deep learning platform, TensorFlow [17]. The experiments based on real-world datasets demonstrate that, under the interference of the common expressions frequently appearing both in rumors and regular messages, the proposed scheme achieves more accurate rumor detection compared with the existing rumor detection approaches.
The rest of the paper is organized as follows. In Section 2, we analyze the related work. Section 3 gives details of our proposed rumor detection model, and Section 4 provides the performance evaluation based on TensorFlow. Section 5 concludes our paper.

Related Work
In this section, we focus on providing a brief review of the work most closely related to effective and efficient rumor detection. We outline related research approaches in three fields: rumor detection, deep learning, and transfer learning.

Rumor Detection.
Rumor is a powerful, pervasive, and persistent force that misleads people and groups [1]. Rumor detection has been a popular research topic in recent years. Rumor has long been a research subject in psychology and social cognition [18]. It is often viewed as an unverified account or explanation of events circulating from person to person and pertaining to an object, event, or issue of public concern [19]. The challenges of rumor detection, such as the veracity and accuracy of rumor with small data size, are discussed in [20][21][22]. Derczynski et al. [20] used a journalism use case dataset [23] to accomplish support/rumor stance classification and veracity prediction. The dataset in [23] contains 330 conversational threads (297 in English for 8 events in total, and 33 in German), which include 4,222 reply tweets. Ma et al. [21] created a tweet dataset and a microblog dataset by crawlers and other means. The Twitter data contain 498 rumors and 494 nonrumors. For the Weibo data, they collected nonrumor and rumor data in Chinese. However, these datasets are small. In 2019, interest in automated claim validation greatly increased. Gorrell et al. [22] substantially expanded the dataset of [23] to include Reddit as well as Twitter data, together with additional languages. In addition, we consider fake news [24,25], for which there are no agreed-upon benchmark datasets. The datasets mentioned in [26] cannot provide all possible features of interest, and they also have specific limitations that make them challenging to use for fake news detection. BuzzFeedNews only contains headlines and text for each news piece and covers news articles from very few news agencies. LIAR includes mostly short statements, rather than the entire news content. BS Detector data are collected and annotated by using a developed news veracity checking tool, and the labels have not been properly validated by human experts. The tweets in CREDBANK are not really the social engagements for specific news articles. To address the disadvantages of the above fake news detection datasets, Shu et al. [26] have an ongoing project to develop a usable dataset, called FakeNewsNet, for fake news detection on social media. It includes all mentioned news content and social context features with reliable ground truth fake news labels. However, the free data site is no longer available or the original fake news is not public. Therefore, the datasets available for rumor detection are insufficient.
Early exploration started from two special studies on rumor propagation during natural disasters like earthquakes and hurricanes [19,27]. Castillo et al. [28] selected four types of features, namely, message-based features, user-based features, topic-based features, and propagation pattern-based features, and then used the J48 decision tree to detect rumors on Twitter. Zhang et al. [29] considered heterogeneous networks and analyzed the structure of the information diffusion graph of a mobile social network (MSN) to learn the latent factors of each piece of information, and proposed a diffusion model to explain the spread of information in MSN. In [30], the major difference between rumors and nonrumors was discussed. The existing rumor detection methods mainly include methods based on traditional machine learning [4,5,7] and the more accurate methods based on neural network models [3,6,8,31]. Tian et al. [32] proposed to learn the user attitude distribution for Twitter posts from their comments and then combined it with content analysis for early detection of rumors based on huge models when data for information sources or propagation are scarce. However, these existing rumor detection methods rely on a large amount of labelled training data or on huge models. The number of labelled rumors is usually insufficient to train any of the existing neural network-based detection models to cover the characteristics of various rumors, so these models cannot accurately detect rumors.

Deep Learning.
Deep learning models simulate the human brain's thinking patterns to discover various characteristics of texts. Therefore, the accuracy of deep learning models is often higher than that of traditional rumor detection technology. Recently, deep neural networks have emerged as the prevailing technical solution in almost all fields of natural language processing (NLP). Word embedding is the basis for deep learning to solve many natural language processing problems [33]. Liu et al. [31] used a CNN model to abstract textual and temporal information in social media and exploited post-level textual information to generate group embedding for further analysis. Kim [34] applied CNN to text categorization. Experiments have shown that the CNN text categorization model can obtain higher accuracy than other machine learning models. However, these deep learning models do not converge easily due to the huge number of their parameters.

Transfer Learning.
Transfer learning can alleviate the lack of labelled training data for training a deep learning model [10]. As illustrated in Figure 1, if the training data are insufficient, the characteristics of the features in the source domain cannot be identified by the deep layers, and transfer learning can be used to improve the accuracy of models in a domain by transferring knowledge from the related domains [10]. The evidence reported in [15] shows that TL in NLP applications is more sensitive to the text semantics. While TL has produced positive results within the domain of image processing, its usage in NLP applications remains a fairly unexplored research area. Yang and Zhang [33] proposed a transfer learning algorithm called automatic transfer learning (AutoTL) for short text mining. Johnson and Zhang [35] built a semisupervised framework to improve the text classification accuracy by integrating knowledge from word vectors learned on unlabelled data. Do and Gaspers [36] achieved a considerable improvement in the accuracy of a language understanding task by initializing the parameters with an additional unlabelled dataset.
In view of the excellent performance of transfer learning for constructing a deep learning model without sufficient training data, this paper proposes a scheme for rumor detection based on transfer learning in the next section.

Rumor Detection Model Based on Deep Transfer Learning
After the deep learning model completes its training process, the domain knowledge is fixed into the model parameters. When the training data are insufficient, an effective training model cannot be obtained, as shown in Figure 2. In view of this essential characteristic, we propose a rumor detection scheme based on parameter transferring. In this section, we propose a deep transfer model, namely, TL-CNN. It achieves accurate rumor detection by using review evaluation knowledge in the e-commerce domain. In detail, based on the stochastic gradient descent algorithm, we propose an adaptive learning rate updating method for fine-tuning the model obtained in the transfer process. The overall framework of the model is illustrated in Figure 3. A basic detection model has the same structure as the model used in the rumor detection process. Firstly, the basic detection model transfers its model parameters obtained in the training process on the polarity review data to the rumor detection model. The basic convolutional neural network-based detection model is proposed in Section 3.1, which is illustrated in Figure 4. In addition, we adapt the parameters of the basic detection model to the rumor detection process by performing a fine-tuning operation in Section 3.1. In this way, we can obtain an effective rumor detection model based on transfer deep learning.

Basic Detection Model.
The convolutional neural network (CNN) model was originally proposed in computer vision and has proven to be effective in natural language processing, semantic analysis, and other traditional NLP tasks [34]. It has been extensively applied in text classification. As a common detection model, a basic CNN-based detection model is presented in this section. It is feasible to collect review texts with polarity data on e-commerce platforms, and the polarity data provide a guideline for us to evaluate whether the corresponding texts are rational. Therefore, we can use review texts with polarity data to train the basic detection model, and the parameters of this model can be transferred to the rumor detection model to handle the problem of training data insufficiency. The basic detection model consists of five components: an embedding layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer. Among them, the convolutional layer, the pooling layer, and the fully connected layer are used to collect and mine features in the training data, and their parameters can be used in the transfer learning process in order to construct an accurate rumor detection model. The basic detection model is illustrated in Figure 4, and the configuration of this model is depicted in Table 1.

Embedding Layer.
The embedding layer is the first layer of the basic detection model, which is used to preprocess the raw data. In detail, the embedding layer formulates the original input data as a matrix. For a text classification task, a sentence is represented by a vector of the identifiers of its words, which is called a word vector. All the input data of the model form an n × m matrix, namely, the input matrix, where n is the number of sentences and m is the dimension of the word vector. The process of text preprocessing can be formulated with formula (1), where X represents the input matrix.
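The construction of the input matrix can be sketched as follows. This is only an illustration under stated assumptions: the whitespace tokenizer, the on-the-fly vocabulary, the padding identifier 0, and the function name `build_input_matrix` are hypothetical choices that the paper does not specify.

```python
def build_input_matrix(sentences, m, pad_id=0):
    """Map each sentence to a fixed-length row of word identifiers (n x m)."""
    vocab = {}   # word -> identifier; identifiers start at 1, 0 is padding (assumption)
    matrix = []
    for sentence in sentences:
        row = []
        for word in sentence.lower().split():   # naive whitespace tokenizer (assumption)
            if word not in vocab:
                vocab[word] = len(vocab) + 1
            row.append(vocab[word])
        # Truncate or pad so that every row has exactly m entries.
        row = row[:m] + [pad_id] * max(0, m - len(row))
        matrix.append(row)
    return matrix, vocab


# Two toy sentences, word-vector dimension m = 5.
X, vocab = build_input_matrix(["the hotel is safe", "bomb at the hotel"], m=5)
```

Each row of `X` corresponds to one sentence, so `X` has n rows and m columns, matching the description of the input matrix above.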

Convolutional Layer.
The convolutional layer is the elemental layer of a convolutional neural network. The convolutional layer consists of several convolutional units, and the parameters of each convolutional unit are optimized by a backpropagation algorithm. Each convolutional unit can cover a part of the input matrix. The difference between our convolutional layer and the existing convolutional layer is illustrated in Figure 5. It is worth noting that the existing convolutional neural network model traverses all types of convolutional units for mining the features of the inputted texts (Figure 5(a)). In our convolutional layer (Figure 5(b)), since each line of the input matrix represents a sentence of a text, the width of the convolutional units is configured according to the width of the input matrix. The purpose of the convolution operation is to extract different local features of the inputted sentences. To obtain the processing result of each convolutional unit, we use $x_{i,j}$ to represent the word at row i and column j, $w_{m,n}$ to denote the weight of the word at row m and column n, and b to denote the bias of this convolutional unit. The convolution results of each convolutional unit form a matrix, namely, the feature map. The element at row i and column j of the feature map, $a_{i,j}$, is obtained with an activation function f. Ultimately, the corresponding eigenvalue of the convolutional unit as well as the corresponding eigenvector is obtained. In detail, we use formula (2) to calculate the convolutional units, where the results for the convolutional units of size 3, 4, and 5 are indicated by $a^{(3)}_{i,j}$, $a^{(4)}_{i,j}$, and $a^{(5)}_{i,j}$, respectively.
The feature values obtained on the convolutional units of a specific size comprise the corresponding feature matrix.
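A minimal sketch of one full-width convolutional unit is given below. It follows the verbal description above (weights $w_{m,n}$, bias b, activation f, unit width equal to the width of the input matrix), but it is one plausible reading rather than the paper's exact formula (2). The ReLU activation and the function names are assumptions made for illustration.

```python
def relu(z):
    """Assumed activation function f; the paper does not name a specific one."""
    return max(0.0, z)


def convolve_full_width(X, W, b, activation=relu):
    """Slide a unit of height len(W) and full input width down the rows of X.

    Computes a_i = f(sum_p sum_q W[p][q] * X[i + p][q] + b) for every valid i.
    """
    h = len(W)       # size of the convolutional unit (3, 4, or 5 in the paper)
    m = len(X[0])    # the unit spans the full width of the input matrix
    feature_map = []
    for i in range(len(X) - h + 1):
        z = b
        for p in range(h):
            for q in range(m):
                z += W[p][q] * X[i + p][q]
        feature_map.append(activation(z))
    return feature_map


# Toy input with 4 rows and 2 columns, and one unit of size 3.
X_toy = [[1, 0], [0, 1], [1, 1], [2, 0]]
W_toy = [[1, 0], [0, 1], [1, -1]]
fmap = convolve_full_width(X_toy, W_toy, b=0.0)
```

Because the unit is as wide as the input matrix, it only slides vertically, so each unit of size h yields a feature map with (number of rows − h + 1) entries.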

Pooling Layer.
To remove trivial eigenvalues from the feature map, a max pooling-based layer is used to reduce the number of features, which reduces the computational complexity of the CNN and the risk of overfitting. Max pooling is the most popular pooling operation; it takes the maximum values in the feature map after performing a dot product on the weight matrix and the feature map. The weight matrix is valuable for obtaining the most important features included in the feature map. This paper uses the max pooling operation to process the results of the convolutional layer and obtains a brief semantic representation of the inputted texts, A, as shown in the following formula:
\[ A = \left[ a^{(3)}, a^{(4)}, a^{(5)} \right], \qquad a^{(k)} = \max\left\{ a^{(k)}_{i,j}, \cdots \right\}, \quad k \in \{3, 4, 5\}. \]
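As a small illustration, the pooling step can be sketched as taking the maximum of each feature map and concatenating the results. The sketch assumes one feature map per unit size and omits the weight-matrix dot product mentioned above, since its exact form is not given; the function name is hypothetical.

```python
def max_pool(feature_maps):
    """Keep only the largest activation of every feature map."""
    return [max(feature_map) for feature_map in feature_maps]


# Hypothetical feature maps produced by units of size 3, 4, and 5.
feature_maps = [[0.2, 0.9, 0.1], [0.4, 0.3], [0.7]]
A = max_pool(feature_maps)
```

The resulting vector has one entry per feature map, regardless of how long the input text was.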

Fully Connected Layer.
The fully connected layer is a regular hidden layer of a multilayer neural network which makes higher order decisions. It receives inputs from the pooling layer. To avoid overfitting, a dropout-based fully connected layer is included in the basic detection model; the dropout operation randomly discards some inputs from the pooling layer.
The layer retains the most important features that contribute to the classification process, connecting the overall features of the inputted texts.

Output Layer.
The output layer is responsible for obtaining the final detection result. The weight matrix of this layer is highly related to the characteristics of the detection targets and is invaluable for transfer learning. The output y of this layer is normalized using the Softmax function to obtain the probability that a text D belongs to a specific category.
Here, g is equal to 0 or 1, $y_0$ means D does not belong to this category, and $y_1$ means D belongs to this category.
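The Softmax normalization can be illustrated with the short sketch below. The two raw scores stand for $y_0$ and $y_1$; their values are made up for the example, and subtracting the maximum score is a standard numerical-stability step rather than something the paper states.

```python
import math


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    shift = max(scores)   # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores (y_0, y_1) for a text D.
probs = softmax([1.0, 3.0])
predicted_g = probs.index(max(probs))   # category with the highest probability
```

With two categories, the larger raw score always receives the larger probability, so the predicted label g is simply the index of the maximum.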

Rumor Detection Model Based on Parameter Transferring.
Transfer learning algorithms reuse existing knowledge in the target domain in order to solve the problem of training data (i.e., labelled data) shortage. The hierarchical architecture of the deep neural network model is very suitable for transfer learning [15]. We use the parameter transfer method to solve the problem of training data shortage in the rumor detection domain. In detail, we construct an original rumor detection model by reusing the parameters of the aforementioned basic detection model (Figure 6(a)). In order to avoid negative transferring under a limited amount of labelled rumors, we further fine-tune the original rumor detection model with a layerwise scheme (Figure 6(c)).

Original Rumor Detection Model Based on Parameter Transferring.
As illustrated in Figure 6, we initialize an original rumor detection model (TL-CNN-Strawman) by reusing the parameters of the basic detection model, where the rumor detection model has the same structure as the basic detection model proposed in Section 3.1. The accuracy rates of the basic detection model and TL-CNN-Strawman are listed in Table 2, where accuracy rate is equal to the ratio of the messages correctly classified to the total number of messages. In a straightforward way, the basic detection model is trained on a review dataset, YELP-2, and TL-CNN-Strawman is initialized by reusing the parameters of the basic detection model. After cloning the parameters of the basic detection model, TL-CNN-Strawman is additionally trained on a small rumor dataset (FBN).
As depicted in Table 2, the accuracy of the basic detection model on the review dataset YELP-2 is 89.71%, which demonstrates that the basic detection model can effectively detect reviews with different polarities. However, negative transferring occurs when the parameters of the basic detection model are reused in TL-CNN-Strawman: its accuracy is only 67.85%, since the labelled rumors of FBN are insufficient for fine-tuning TL-CNN-Strawman. Without an effective fine-tuning method, the rumor detection accuracy of TL-CNN-Strawman remains low.
In order to solve this problem, we need to fine-tune the hyperparameters in the model training process to avoid negative transferring, instead of reusing the parameters of the basic detection model in a straightforward way. As depicted in Figure 6(b), the parameters of TL-CNN-Strawman could be left unstable if the labelled data are insufficient for fine-tuning the model. In Figure 6(c), the parameters of a frozen layer are skipped during the learning process, and we focus on training the other parameters. Since fewer parameters are learned, the fine-tuning process converges in a short time. By applying this layerwise fine-tuning mechanism in the training process of our rumor detection model, we can tune different layers more efficiently and handle the negative transferring problem.
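The parameter cloning and layer freezing described above can be sketched as follows. This is a simplified illustration, not the TensorFlow implementation used in the paper: parameters are stored as plain lists in a dictionary keyed by hypothetical layer names, and the update is a plain gradient step with made-up gradients.

```python
import copy


def transfer_parameters(source_params):
    """Clone the parameters of the basic detection model for the rumor model."""
    return copy.deepcopy(source_params)


def fine_tune_step(params, grads, frozen, learning_rate):
    """Apply one gradient step, skipping every layer listed in `frozen`."""
    for layer in list(params):
        if layer in frozen:
            continue   # the parameters of a frozen layer are left untouched
        params[layer] = [w - learning_rate * g
                         for w, g in zip(params[layer], grads[layer])]
    return params


# Hypothetical parameters of a trained basic detection model.
basic = {"conv": [0.5, -0.2], "fc": [0.1, 0.3], "output": [0.0, 1.0]}
rumor = transfer_parameters(basic)

# Hypothetical gradients computed on a small batch of labelled rumors.
grads = {"conv": [1.0, 1.0], "fc": [1.0, -1.0], "output": [0.5, 0.5]}
rumor = fine_tune_step(rumor, grads, frozen={"conv"}, learning_rate=0.1)
```

Freezing the convolutional layer here means that only the fully connected and output layers move, which mirrors the idea that fewer parameters need to be learned from the limited rumor data.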

Adaptive Layerwise Fine-Tuning of Learning Rate.
To achieve an effective layerwise tuning scheme, we analyze the effect of the hyperparameters in the training process. Among them, the learning rate is the most important hyperparameter, since it determines both the efficiency of the model training process and the accuracy of the training results. With an appropriate learning rate, we can obtain an accurate model as soon as possible. If the learning rate is too high, the model will miss the optimal point and need multiple iterations to reach convergence. On the other hand, a low learning rate leads to a longer training process and may cause the model to fall into a local optimal point. Different layers in the neural network acquire different types of features, and thus the parameters of these layers should be tuned with different learning rates. With the limited amount of training data in the rumor detection domain, we apply discriminative fine-tuning to configure each layer with a different learning rate, instead of using the same learning rate for all layers of the rumor detection model.
To discover the optimal learning rate for each layer, we propose a learning rate updating scheme to update the learning rate in a reasonable way. In detail, stochastic gradient descent (SGD) [37] is applied to adapt the learning rate to the training process of a specific layer l. In this way, the loss function L(w) decreases more rapidly, and the layerwise training can converge with only a limited number of labelled rumors. The adaptive learning rate updating rule is as follows:
(1) Updating the learning rate μ for the t-th iteration.
Here, $\nu_t$ is the moving average of the uncentered variance over the past first-order gradient of the loss function $\nabla L(w_{t-1}; x_{t-1})$, β is the decay rate for computing $\nu_t$, $w_{t-1}$ is the parameter vector at iteration t − 1, and $x_{t-1}$ is the input from the last layer in the (t − 1)-th iteration.
Furthermore, $\mu_t$ is the learning rate in the t-th iteration and ε is a small hyperparameter for obtaining stable convergence. The learning rate, $\mu_t$, is divided by the magnitude $\sqrt{\nu_t}$ of the past first-order gradient of the loss function. Intuitively, if the parameter vector has had a large value of $\nabla L(w_{t-1}; x_{t-1})$ in terms of magnitude in the past, the next iteration yields a small learning rate because $\sqrt{\nu_{i,t}}$ in equation (8) is large.
(2) Updating the weight of this layer l.
The detailed updating process is introduced in Algorithm 1. We first initialize the batch size b, decay rate β, weight w, gradient ν, learning rate μ, and hyperparameter ε. The loop from line 2 to line 9 is performed repeatedly until convergence. In line 3, the output of the last layer, $x = \{x_1, x_2, \ldots, x_b\}$, is obtained. In line 4, the loss function for this layer, $L_i(w) = L_i(y_i, f(x_i, w))$, is calculated. In line 5, the moving average of the uncentered variance over the past first-order gradient, $\nabla L_i(w_{i,t-1}; x_{t-1})$, is obtained. As the iteration process continues, the value of the loss function continuously decreases, and the learning rate μ is updated in line 6. Additionally, we update the weights according to the gradient in lines 7 and 8. By applying the aforementioned fine-tuning method to the original rumor detection model (TL-CNN-Strawman), an effective rumor detection model (TL-CNN) is obtained.
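One way to read the updating rule in code is sketched below. It is an assumption-laden illustration: the update equations are reconstructed from the verbal description (a moving average of squared gradients with decay rate β, and a learning rate divided by $\sqrt{\nu_t}$ plus ε), which gives an RMSProp-style rule. The default values, the toy quadratic loss, and the function name are hypothetical and not taken from the paper.

```python
import math


def adaptive_update(w, v, grad, base_lr, beta=0.9, eps=1e-8):
    """One adaptive step for the weights of a single layer.

    v is the moving average of squared gradients; each weight gets its own
    effective learning rate base_lr / (sqrt(v_i) + eps).
    """
    new_w, new_v = [], []
    for w_i, v_i, g_i in zip(w, v, grad):
        v_i = beta * v_i + (1.0 - beta) * g_i * g_i   # moving average of g^2
        lr_i = base_lr / (math.sqrt(v_i) + eps)       # large past gradients -> small rate
        new_w.append(w_i - lr_i * g_i)                # weight update w_t = w_{t-1} + delta
        new_v.append(v_i)
    return new_w, new_v


def toy_loss(w):
    """Hypothetical loss L(w) = 0.5 * sum(w_i^2), whose gradient is w itself."""
    return 0.5 * sum(w_i * w_i for w_i in w)


w = [1.0, -2.0]
v = [0.0, 0.0]
loss_before = toy_loss(w)
for _ in range(20):
    grad = list(w)   # gradient of the toy loss at the current weights
    w, v = adaptive_update(w, v, grad, base_lr=0.01)
loss_after = toy_loss(w)
```

In a layerwise setting, one such state `v` and one `base_lr` would be kept per layer, so that each layer is tuned with its own learning rate.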

Experiments
In order to evaluate the proposed rumor detection scheme, we implemented it and the baseline schemes on TensorFlow. TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. It is the second generation of artificial intelligence learning system developed by Google. On the real-world datasets, we comprehensively compare the proposed scheme and the baseline schemes in terms of different accuracy metrics.

Yelp Polarity (YELP-2).
The Yelp review dataset was obtained from the 2015 Yelp dataset challenge [38]. The dataset contains 1,569,264 samples with review texts. Two tasks are performed on this dataset. The first one predicts the total number of stars given by the user, and the other predicts the polarity label by considering stars, 1 and 2 being negative and 3, 4, and 5 being positive. This dataset includes 130,000 training samples and 10,000 test samples for each star rating and 280,000 training samples and 19,000 test samples for each polarity.

Five Breaking News (FBN).
The FBN dataset is a small rumor dataset that covers five events and includes 5,802 labelled tweets [39]. The five events are the Ferguson unrest, the Ottawa shooting, the Sydney siege, the Charlie Hebdo shooting, and the Germanwings plane crash. These two datasets are used in the basic detection model and in the rumor detection model obtained in the transfer learning process, and are denoted as $D_s$ and $D_t$, respectively. The detailed information of these two datasets is listed in Table 3.

Implementation.
We provide the relevant parameters used in the proposed model in Table 1. The sizes of the convolutional units are configured to be 3, 4, and 5. Each convolutional unit relates to a feature map. The dropout rate of the fully connected layer is 0.5. An L2 regularization term with a coefficient of 3 is used in the Softmax. During the training process, the batch size is configured as 50.
Figure 6: Transfer learning process. We initially train the basic detection model on the YELP-2 dataset. After the transfer learning process, we fine-tune the obtained rumor detection model on the FBN dataset.
To extensively evaluate the performance of the rumor detection model, we implement the proposed scheme (TL-CNN) on TensorFlow as well as three state-of-the-art baseline schemes. The detailed information of these three baseline schemes is as follows:
(i) VDCNN-based model: VDCNN is a convolutional neural network-based model proposed by Gereme and Zhu [40]. As the depth of the model increases, the accuracy of the solution can also increase.
(ii) Char-CNN-based model: Based on a character-level convolutional network, Char-CNN was proposed by Joo and Hwang [41] to perform classification tasks, such as those on the Yelp polarity dataset and the Amazon review dataset.
(iii) RCNN-based model: RCNN is a model proposed by Fang et al. [42], which essentially incorporates RNN and CNN into text categorization tasks. First, it applies an RNN model to capture context information as much as possible while learning word representations. To capture key features of the text, this model additionally uses a max pooling layer to automatically determine which words play a key role in text categorization.

Evaluation Metrics.
We use accuracy rate, precision rate, recall rate, F1-measure, and accuracy gain to evaluate the effectiveness of the proposed scheme. Accuracy rate is the ratio of correctly classified messages (rumors and nonrumors) to the total number of messages:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN},$$
where TP, FP, FN, and TN are the abbreviations of true positives, false positives, false negatives, and true negatives, respectively. Precision rate is the ratio of all messages correctly classified as rumors (TP) to all messages classified as rumors (TP + FP):
$$\mathrm{Precision} = \frac{TP}{TP + FP}.$$
Recall rate is the ratio of all messages correctly classified as rumors (TP) to all messages that should be classified as rumors (TP + FN):
$$\mathrm{Recall} = \frac{TP}{TP + FN}.$$
F1-measure is the harmonic mean of precision and recall:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
Accuracy gain (α) is the ratio of A_e to A_c and is used to evaluate the accuracy increment of the proposed scheme relative to the baselines:
$$\alpha = \frac{A_e}{A_c},$$
where A_e represents the accuracy rate of our model and A_c represents the accuracy rate of a baseline.
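The metrics above can be computed directly from the four confusion-matrix counts. The following is a minimal sketch; the function names and the counts in the usage lines are hypothetical and are not results from this paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (e.g., no message flagged as a rumor).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def accuracy_gain(a_e, a_c):
    """Accuracy gain: accuracy of the evaluated model over a baseline's."""
    return a_e / a_c


# Hypothetical counts for illustration only.
m = classification_metrics(tp=80, fp=20, fn=10, tn=90)
gain = accuracy_gain(m["accuracy"], 0.80)  # 0.80 is a made-up baseline accuracy
```

An accuracy gain above 1 means the evaluated model is more accurate than the baseline it is compared with.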

ALGORITHM 1: Adaptive learning rate update algorithm.
Require: Batch size b; decay rate β; weight parameter w; gradient parameter ν; learning rate μ; hyperparameter ε.
(1) Initialize parameters w, ν, μ;
(2) while no convergence do
(3)   Obtain x = {x_1, x_2, ..., x_b} from the last layer;
(4)   Calculate the loss function L(w_{t-1}; x_{t-1});
(5)   Calculate the moving average of the uncentered variance over the past first-order gradients of the loss function;
      Calculate the past first-order gradient of the weights;
      Update the weights of this layer: w_t = w_{t-1} + Δw_{t-1};
(9) end while

In Figure 8, we use accuracy gains to evaluate the improvement of accuracy rate compared with baselines. When the number of epochs is greater than 12, the accuracy gain values of TL-CNN with respect to Char-CNN and RCNN tend to be stable. From then on, although the accuracy gain value of TL-CNN relative to VDCNN fluctuates within a large range, all accuracy gains are higher than 1.10. Therefore, TL-CNN improves the accuracy of rumor detection.
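As a rough illustration of the kind of update that Algorithm 1 describes, the sketch below uses an RMSProp-style rule: a moving average of squared gradients with decay rate β scales the learning rate μ, and ε avoids division by zero. This is a stand-in under stated assumptions, not the exact update of the proposed method, since the update equations of Algorithm 1 are not reproduced in the text. The function names, default values, and the quadratic toy loss are hypothetical.

```python
import math


def adaptive_step(w, grad, v, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSProp-style update (assumed form, not the paper's exact rule).

    v is the moving average of the squared (uncentered) gradient.
    """
    v = beta * v + (1.0 - beta) * grad * grad
    delta_w = -lr * grad / (math.sqrt(v) + eps)
    return w + delta_w, v  # w_t = w_{t-1} + delta_w


def minimize(grad_fn, w0, steps=2000, **kwargs):
    """Repeat the adaptive update for a fixed number of steps."""
    w, v = w0, 0.0
    for _ in range(steps):
        w, v = adaptive_step(w, grad_fn(w), v, **kwargs)
    return w


# Toy loss L(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_final = minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

Because the step is normalized by the recent gradient magnitude, the effective step size stays bounded regardless of how large the raw gradient is, which is the property that makes such rules useful when fine-tuning transferred parameters.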
The detailed results are listed in Table 4. The accuracy rate of VDCNN is only 71.88%. RCNN achieves a better result with an accuracy rate of 79.53%. Char-CNN has the highest accuracy rate among the three baselines, 83.21%. Our model, TL-CNN, achieves the best result with an accuracy rate of 87.28%, which is about 4 percentage points higher than that of Char-CNN. Similarly, the precision rates of VDCNN, RCNN, and Char-CNN are 60.59%, 76.92%, and 78.80%, respectively, and the precision rate of our model is 79.12%.

In terms of recall rate, Char-CNN achieves the best result with a recall rate of 85.47%. The recall rate of our model is higher than those of VDCNN and RCNN but lower than that of Char-CNN. The reason is that our model treats a low number of false positives as the most important guiding principle during the rumor detection process. As a result, its recall rate is slightly lower than that of Char-CNN. TL-CNN achieves the best result in the F1-measure, which is the comprehensive metric for accuracy evaluation. The F1 value of our model is 0.825, while the F1 values of VDCNN, RCNN, and Char-CNN are 0.597, 0.799, and 0.819, respectively.

The proposed model was trained in 529.57 minutes on Yelp and FBN. On average, it processes 6.6963 samples per second during training and 2.4361 samples per second during testing. The detailed results are listed in Table 5. Although the training time of TL-CNN is longer than that of the baselines, it achieves more accurate detection results within a similar testing time, and its testing is even faster than that of VDCNN and RCNN.
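As a worked example, the accuracy gain of TL-CNN relative to each baseline can be computed from the final accuracy rates reported above. The values are rounded to three decimals and are derived only from the reported accuracy rates, not read from Figure 8.

$$\alpha_{\mathrm{VDCNN}} = \frac{87.28}{71.88} \approx 1.214, \qquad \alpha_{\mathrm{RCNN}} = \frac{87.28}{79.53} \approx 1.097, \qquad \alpha_{\mathrm{Char\text{-}CNN}} = \frac{87.28}{83.21} \approx 1.049.$$

All three ratios are greater than 1, which is consistent with TL-CNN having the highest accuracy rate among the compared models.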

Conclusion
In this paper, we present TL-CNN, an effective deep transfer model based on a convolutional neural network, to detect rumors with a limited amount of training data. Because negative transfer can occur during the transfer learning process, we propose an adaptive learning rate tuning method to avoid it. Extensive experiments on real-world datasets demonstrate that the proposed rumor detection model can significantly improve the accuracy of rumor detection and that it can be applied to social media, e-commerce, and other fields.

Data Availability
Previously reported text data, Yelp and FBN, were used to support this study and are available at WOS: 000450913101042 and ArXiv:1610.07363v1. These prior studies (and datasets) are cited at relevant places within this paper as references [38,39]. The Yelp review dataset contains 1,569,264 items with review texts. The FBN dataset is a rumor dataset that contains 5,802 labelled tweet messages covering five events: the Ferguson unrest, the Ottawa shooting, the Sydney siege, the Charlie Hebdo shooting, and the Germanwings plane crash.

Conflicts of Interest
The authors declare that they have no conflicts of interest.