Fake News Detection on Social Media: A Temporal-Based Approach

Abstract: Following the development of communication techniques and smart devices, the era of Artificial Intelligence (AI) and big data has arrived. The increased connectivity, referred to as hyper-connectivity, has led to the development of smart cities. People in these smart cities can access numerous online contents and are always connected. These developments, however, also lead to a lack of standardization and consistency in the propagation of information throughout communities due to the consumption of information through social media channels. Information often cannot be verified, which can confuse users. The increasing influence of social media has thus led to the emergence and increasing prevalence of fake news. In this study, we propose a methodology to classify and identify fake news emanating from social channels. We collected content from Twitter to detect fake news and statistically verified that the temporal propagation pattern of quote retweets is effective for the classification of fake news. To verify this, we trained a two-phase deep learning model based on convolutional neural networks and long short-term memory on the temporal propagation pattern. The resulting fake news classifier demonstrates the ability to detect fake news early. Moreover, it was verified that the temporal propagation pattern was the most influential feature group among those discussed in this paper.

Social media is now the preferred avenue for common people to exchange information and share images, videos, etc. [9][10][11][12][13][14][15]. The speed and reach of the propagation of information through social media often overwhelm traditional news media. For instance, real-time information on unexpected events (e.g., natural disasters) is usually captured first on social media [16][17][18][19][20]. Flanagin et al. [21] show that social media is a highly influential news medium for modern people. Posts by popular and influential personalities (i.e., influencers described in Section 1.2) are usually considered highly reliable [22].
Unfortunately, not all information on social media is real news. Social media generates a massive amount of information and users are exposed to it continuously and unconditionally. Information on social media can be unreliable, and it is difficult to judge its authenticity [23]. Most information on social media is shared to make news headlines [24]. Thus, there is a high likelihood of social media being abused politically and economically. Lately, false information shared in the garb of actual news has been confusing and misleading many users. In the past, false information was slowly propagated, and its range of influence was local. However, in today's hyper-connected society, individuals are connected across geographical boundaries. Because the speed of propagation is also very fast, it is important to identify false information early [25,26].
Many social media providers have made efforts to identify false information. However, most of them rely on user reports or experiments [27]. Even to use fact-checking websites such as Snopes, FactCheck, PolitiFact, etc., extensive manual effort is required. Thus, it has now become essential to provide automated content reliability evaluation services to social media users [28,29].

The Powerful Spreaders of False Information
Globally, Twitter and Facebook are the most popular social media platforms. People are attracted to Twitter because they can communicate informally with popular people such as movie actors or sports players. Twitter provides a subscription-like function, the follow button. Twitter users can obtain information from other users and can check their awareness, reputation, and influence based on the number of followers. People with many followers, called influencers, have a strong impact on their field of expertise. Because a post by the influencers is shared with their followers, the propagation power of information is stronger than that of common users. Twitter users share their opinions with followers by posting tweets, including texts, pictures, and videos. The followers express their interest in the tweet using buttons such as reply, like, and retweet. The retweet function is the most influential function because, on retweeting, a person's follower shares the tweet with his or her followers. This can translate into a long cycle, which, in turn, popularizes the original tweet manifold.
Retweets may have a significant influence in certain fields because of their strong propagation power. However, retweets also can cause social upheavals. Retweets are highly likely to be abused by individuals or groups with malicious purposes. During the 2016 US presidential election, fake news spread indiscriminately through retweets, confusing many voters [30]. Moreover, because retweets exhibit a powerful real-time propagation of information, they have a strong influence even in emergencies such as natural disasters. During the 2010 earthquake in Chile, many rumors spread through retweets immediately after the accident, which aggravated confusion and anxiety among the locals [31].

Contribution
In 2015, the function of the retweet was updated, and the new form was called a quote retweet by some researchers [32][33][34][35][36]. Because the original retweet did not contain the user's opinion, it could only express reactions such as interest and agreement. Therefore, users did not retweet if they were not interested in a particular tweet or if they had a feeling of rejection. In contrast, a quote retweet can include the user's opinion. Thus, users can express a positive, negative, or neutral opinion on the quoted tweet. According to the study by Garimella et al. [32], the number of cases where users use quote retweets instead of retweets is increasing. In addition, because a quote retweet not only spreads information but also includes the user's official stance, it can provide a new pattern of spreading information. Therefore, quote retweets can potentially be more effective than original retweets in detecting false information. In this study, we analyze this new pattern of information diffusion using quote retweets and propose an effective methodology to detect fake news. The methodology uses the temporal propagation patterns of quote retweets to protect people who are exposed to false information as they are bombarded with online information in the hyper-connected world of smart cities. First, we collect news from Kaggle and analyze Twitter content that mentions each news item. Then, the temporal news propagation pattern based on quote retweets is extracted from the collected content and analyzed using visualization and statistics. We also identify the features of the content that spreads fake news and classify them into four groups. We then process a time-series dataset for learning by using the temporal information of the identified features. Finally, to verify our methodology, we create a fake news classifier using two-phase deep learning based on a convolutional neural network (CNN) and long short-term memory (LSTM) and evaluate its performance. The contributions of this study are summarized as follows:
• To detect fake news more efficiently, we introduce a methodology for detecting fake news based on the temporal propagation pattern of quote retweets.
• We identify and define new features of fake news using visualization and statistics from the temporal propagation patterns of quote retweets.
• We define a time-series dataset to train a deep learning-based fake news classifier that combines CNN and LSTM.
• We verify the effectiveness of our methodology by comparing it with existing content-based fake news detection techniques.
The rest of this paper is organized as follows: Section 2 introduces related works; the propagation graph-based fake news detection method is presented in Section 3; we evaluate the performance of the proposed method in Section 4; and finally, Section 5 concludes the paper.

Content Feature-Based Reliability Verification
To identify fake news or rumors, many researchers have analyzed the features of content that spread fake news from Twitter. Studies on content feature-based reliability verification use the information posted on Twitter to spread the news. Features used in content-based systems are mainly defined as basic, linguistic, user-based, and propagation and network-based groups [33,34].
The basic group includes intuitive information such as hashtags (#) and mentions (@) in tweets, retweets, likes, and replies, and can be collected using Selenium or the Tweepy application programming interface (API). The linguistic group includes information such as the tone of speech (positive or negative) extracted from the text in the tweet. To extract linguistic information from text, additional processing using natural language processing (NLP) techniques is necessary. The user-based group contains information on the influence of the user who wrote the tweet, such as the number of followers, the number of accounts they follow, and the number of tweets. The propagation and network-based group contains information such as temporal propagation features and propagation depth. Propagation and network-based features are quite complicated, and it is difficult to collect and process the data needed to trace the network between users [34].
Castillo et al. [33] defined message-based, user-based, topic-based, and propagation-based groups to verify the reliability of online content. In addition, the differences between rumors and non-rumors were analyzed by extracting group-specific features from the tweets. Then, the extracted features were used to train a machine learning model, and the performance of the resulting rumor classifier was verified. Kwon et al. [34,35] identified rumors and found that rumors spread for a short time in a low-density network. User-based, linguistic, temporal, and network-based feature groups were defined, and through rumor classification experiments, it was observed that the temporal group has a great influence on rumor classification. Yang et al. investigated rumors from the Sina Weibo platform. They argued that the type of client that created the content and the location from which the content is uploaded are key features for rumor detection. They further defined client-based and location-based features from the content obtained from Sina Weibo for further evaluation. The experiment showed that the rumor classifier trained with client-based and location-based features slightly (5%) improved the performance compared to classifiers trained without them. Jang et al. analyzed the features of quote retweets that spread news using various visualizations and machine learning and showed that quote retweets were effective in propagating fake news [36].

User Stance-Based Reliability Verification
Social media users tend to adopt a stance on the information they access. A user's stance is generally expressed as agreement, disagreement, neutrality, and others, and researchers have categorized it in various forms. Maddock et al. [37] defined the stance of a user who heard a specific rumor as misreport, guess, modification, question, neutral, disagree, and others. Procter et al. [38] classified user stances as agree, disagree, objection, and comment. Zubiaga et al. [39] divided it into support, comment, and mock. Mendoza et al. [40] highlighted that there is a strong correlation between user stance and reliability and found that many users refute a rumor uncovered as fake with negative opinions. A study by Jin et al. [41] showed that users who hear news that seems important immediately express their stance and that the stance is effective in detecting fake news. They note that users who come across a tweet suspected to be fake news are highly likely to post negative opinions. Thus, they proposed building a reliability network based on stances about Twitter news. They created <topic, viewpoint> pairs using latent Dirichlet allocation (LDA) and then classified the tweets related to specific news into supporting and objecting tweets using k-means clustering. After the classification, news showing a high ratio of objecting tweets in the news-propagation network is classified as fake news. Jin et al.'s method shows a slight improvement in classification accuracy (5%-9%) compared to Kwon et al.'s method.
Stance detection of social media users is of interest to many researchers, and hence it is brought up in competitions such as SemEval and FNC-1 [42,43]. SemEval is an international NLP workshop for developing semantic analysis techniques. In SemEval, various challenges, including the detection of user sentiment and the stance, are performed. Various teams participate in this annual event that began in 2007 [44][45][46].
Organized in 2017, the Fake News Challenge stage 1 (FNC-1) was adopted as the first stage to detect fake news. In this competition, the classification accuracy of various stances for news such as "Agree," "Disagree," "Discuss," and "Unrelated" is evaluated. Tab. 1 presents the confusion matrix of the FNC-1 winner [47]. In Tab. 1, the classification accuracy is not satisfactory compared to other classes because the ratio of the class is extremely biased to "Unrelated." Thus, we assume that it is difficult to identify a small number of agreement and objection classes. Although there are studies for improving their performance, the detection of user stances remains a very difficult challenge [48,49].

Methodology
This section introduces a fake news detection technique using the temporal propagation pattern of quote retweets. Fig. 1 gives an overview of the proposed methodology. It is composed of time series dataset processing using the propagation pattern of quote retweets and a fake news classifier based on deep learning. Fig. 1a summarizes the process of constructing a training dataset based on the propagation tree of quote retweets for fake news detection. The data collected from Twitter undergo feature analysis and are then processed into a training dataset for fake news classification. Then, the deep learning-based fake news classifier learns the processed training dataset, as shown in Fig. 1b.

Dataset
The training dataset for the fake news classifier was processed as shown in Fig. 1a. The headlines, content, and reliability of the news were collected from Kaggle. Selenium and the Tweepy API were used as data collection tools, and the Twitter contents were stored in a database using the method of Jang et al. [36]. The Twitter contents stored in the database undergo pre-processing, including natural language processing, to extract additional features. The resulting tables are merged into one table through a join operation, and the schema of the merged table is shown in Tab. 2. Tab. 2 shows the features of the collected content, grouped into four types: basic, linguistic, user-based, and propagation pattern-based features of quote retweets. The propagation feature group of quote retweets is composed of the usage ratio and propagation depth of quote retweets denoted as HtoL (high to low), HtoH, LtoH, and LtoL. Fig. 2 shows the spreading tree of tweets and quote retweets that spread the news over time. The root of the tree (blue, depth 0) refers to the news article, and the child nodes of the root (depth 1) are tweets that spread this news article. The children of depth 1 nodes, and all nodes at greater depths, are quote retweets. In addition, the red nodes are tweets written by influencers. As shown in Fig. 2, there are several differences between the spreading patterns of fake news and real news. As time passes, the propagation tree of real news becomes deeper than that of fake news. This means that the frequency of quote retweets is higher in the propagation tree of real news. In addition, the relatively low frequency of quote retweets in fake news propagation patterns suggests that users who encounter fake news are very cautious about using quote retweets. This is because the quote retweet projects an official stance of the user.
Therefore, it is suspected that the main spreading method of fake news is through tweets or normal retweets that directly mention news headlines or links. In addition, there is a difference in the ratio of influencers between the two propagation patterns. Generally, the spread of information from influencers is much stronger than that of general users. As time passes, relatively more influencers are included in the propagation tree of real news. The reason is that if influencers spread fake news, their reputation could be adversely affected. Therefore, it is suspected that the more popular the users, the more cautious they are about spreading information.
Fig. 3 depicts box plots comparing, for each news item, the average count of retweets, the ratio of quote retweets, the count of followers, and the average age (days) of the authors' accounts. As shown in Fig. 3a, there is no noticeable difference in the frequency of retweets between real and fake news. Because spreading news by a retweet is simpler than by a quote retweet, a retweet can be easily exploited to propagate fake news. In contrast, Fig. 3b shows a significant difference between the two boxes depicting the frequency of use of quote retweets. In the case of real news, the frequency of quote retweets is high, and it can be assumed that the official reaction of users to real news is more active. Fig. 3c shows that there is a very large difference between the ranges of the average number of followers for fake news and real news. For real news, a diverse user base, including influencers, is included, which is consistent with the observation that the more popular the users, the more cautious they are about suspicious information. Fig. 3d shows that the average age (days) of user accounts is lower in the case of fake news. It is suspected that these are accounts that were created suddenly to spread fake news.
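The tree features discussed in this section (propagation depth and the usage ratio of quote retweets) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the child-to-parent mapping `parent`, the node names, and the helper `depth` are hypothetical, and the tree follows the convention of Fig. 2 (news article at depth 0, tweets at depth 1, quote retweets at depth 2 or deeper).

```python
# Hypothetical propagation tree stored as child -> parent links.
# "news" is the root (depth 0); t* are tweets; q* are quote retweets.
parent = {"t1": "news", "t2": "news", "q1": "t1", "q2": "q1", "q3": "t2"}


def depth(node):
    """Number of edges between `node` and the root of the tree."""
    d = 0
    while node in parent:  # the root has no parent entry
        node = parent[node]
        d += 1
    return d


# Propagation depth of the whole tree.
max_depth = max(depth(node) for node in parent)

# Nodes at depth 2 or deeper are quote retweets in this convention.
quote_nodes = [node for node in parent if depth(node) >= 2]
quote_ratio = len(quote_nodes) / len(parent)
```

In this toy tree, three of the five non-root nodes are quote retweets and the deepest chain is news, t1, q1, q2.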

Training Dataset Processing and Fake News Classification Based on Deep Learning
We processed the time series dataset for training the classifier using the registration date information of the contents in the collected data. The procedure is shown in Fig. 4. Each step is described below.

Step 1. Aggregation by News and Day
From the tweets previously grouped by news, the tweets posted within n days after the news was generated are selected by filtering on the difference between the news registration date and the tweet registration date. Using this filter, the features of the tweets (all elements of the four feature groups) are aggregated for each news item from day 1 to day n. For example, if the features from day 1 to day 3 are aggregated, three results are generated for each news item: the aggregate up to day 1, the aggregate up to day 2, and the aggregate up to day 3.
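The cumulative filtering in Step 1 can be sketched as follows. This is a minimal illustration rather than the authors' code: the record fields (`news_id`, `news_date`, `tweet_date`, `is_quote`) and the helper `aggregate_up_to_day` are hypothetical, only two toy features stand in for the four feature groups, and it is assumed that day 1 is the day on which the news was registered.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy records: one dict per tweet that mentions a news item.
tweets = [
    {"news_id": "n1", "news_date": date(2020, 1, 1),
     "tweet_date": date(2020, 1, 1), "is_quote": False},
    {"news_id": "n1", "news_date": date(2020, 1, 1),
     "tweet_date": date(2020, 1, 2), "is_quote": True},
    {"news_id": "n1", "news_date": date(2020, 1, 1),
     "tweet_date": date(2020, 1, 3), "is_quote": True},
    {"news_id": "n2", "news_date": date(2020, 1, 5),
     "tweet_date": date(2020, 1, 5), "is_quote": False},
]


def aggregate_up_to_day(records, day):
    """Aggregate features per news item over tweets posted up to `day`."""
    counts = defaultdict(lambda: {"tweets": 0, "quotes": 0})
    for r in records:
        # Assumption: day 1 is the registration day, so the offset is 0-based.
        offset = (r["tweet_date"] - r["news_date"]).days
        if 0 <= offset < day:
            counts[r["news_id"]]["tweets"] += 1
            counts[r["news_id"]]["quotes"] += int(r["is_quote"])
    return {
        news_id: {"tweets": c["tweets"],
                  "quote_ratio": c["quotes"] / c["tweets"]}
        for news_id, c in counts.items()
    }


# One cumulative aggregate per day, from day 1 to day n (here n = 3).
per_day = {day: aggregate_up_to_day(tweets, day) for day in range(1, 4)}
```

Because each aggregate covers all tweets up to the given day, the values for a news item can only accumulate as the day increases, which is what lets the later steps read them as a time series.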

Step 2. Integration and Sort
The aggregate results from day 1 to day n created in Step 1 are integrated into one table, whose schema is given in Tab. 3. Then, the integrated table is sorted by news and day in ascending order. In the integrated table, one record is the aggregation of the features of tweets registered within a given number of days after a specific news item occurs, so the table can be used to analyze how the aggregated features of each news item change over time. As a result, the integrated table expresses the temporal propagation features of each news item as a time series.
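A minimal sketch of the integration and sorting in Step 2 is given below. The input layout `per_day` and the column names `news_id`, `day`, and `tweets` are hypothetical placeholders; the actual columns are those of Tab. 3.

```python
# Hypothetical per-day aggregates, shaped like the output of Step 1:
# {day: {news_id: {feature_name: value}}}, deliberately out of order.
per_day = {
    2: {"n1": {"tweets": 2}, "n2": {"tweets": 1}},
    1: {"n1": {"tweets": 1}, "n2": {"tweets": 1}},
    3: {"n2": {"tweets": 4}, "n1": {"tweets": 3}},
}

# Integrate every per-day result into one flat table (a list of rows).
table = [
    {"news_id": news_id, "day": day, **features}
    for day, by_news in per_day.items()
    for news_id, features in by_news.items()
]

# Sort by news, then by day, both ascending, so that the rows of each
# news item form one contiguous time series.
table.sort(key=lambda row: (row["news_id"], row["day"]))
```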

Step 3. Linearization by News
To be used as a dataset for training the fake news classifier, the time series of each news item in the integrated table from Step 2 must be processed into one row. Therefore, the schema of the training dataset was defined as shown in Tab. 4. The training dataset consists of a news title, a time series-based feature, and a fake-or-not label (0 or 1). Among these, the time series-based feature column holds the features for learning: a two-dimensional array that represents the propagation pattern of the news over time. To implement this multi-dimensional dataset, Python's NumPy module was used. As a result, each news item in the training dataset has a two-dimensional time series feature and a label for the reliability of the news.
The time series-based training dataset is used to train the two-phase deep learning-based fake news classifier shown in Fig. 1b. The classifier is composed of a CNN, which maintains the spatial and regional information of multidimensional data and is strong in feature abstraction, and an LSTM, which is suitable for processing time-series sequences. When the dataset is input, the CNN phase extracts the features of fake news from the two-dimensional temporal propagation pattern of the news through a convolution operation, and a max pooling operation reduces and simplifies the extracted features. Then, the extracted features are processed into a one-dimensional vector and transferred to the LSTM phase. In the LSTM phase, the reliability of the news is finally determined by extracting sequence information from the compressed one-dimensional time series information.
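Returning to Step 3, the linearization of the integrated table into one row per news item can be sketched as follows. This is a sketch under stated assumptions: the row fields, the feature list `FEATURES`, and the helper `linearize` are hypothetical, and plain nested lists are used here in place of the NumPy arrays mentioned in the text.

```python
from itertools import groupby

# Hypothetical integrated table from Step 2, already sorted by news and day.
table = [
    {"news_id": "n1", "day": 1, "tweets": 1, "quote_ratio": 0.0, "is_fake": 1},
    {"news_id": "n1", "day": 2, "tweets": 2, "quote_ratio": 0.5, "is_fake": 1},
    {"news_id": "n1", "day": 3, "tweets": 3, "quote_ratio": 0.5, "is_fake": 1},
    {"news_id": "n2", "day": 1, "tweets": 4, "quote_ratio": 0.25, "is_fake": 0},
    {"news_id": "n2", "day": 2, "tweets": 6, "quote_ratio": 0.5, "is_fake": 0},
    {"news_id": "n2", "day": 3, "tweets": 8, "quote_ratio": 0.5, "is_fake": 0},
]
FEATURES = ["tweets", "quote_ratio"]  # stand-ins for the real feature columns


def linearize(rows, feature_names):
    """Turn the sorted table into one (title, 2D feature, label) per news."""
    titles, features, labels = [], [], []
    # groupby relies on the rows of each news item being contiguous.
    for news_id, group in groupby(rows, key=lambda r: r["news_id"]):
        group = list(group)
        titles.append(news_id)
        # 2D feature: one row per day, one column per feature.
        features.append(
            [[float(r[name]) for name in feature_names] for r in group]
        )
        labels.append(group[0]["is_fake"])
    return titles, features, labels


titles, X, y = linearize(table, FEATURES)
```

Here `X` has the shape (news items, days, features). If every news item covers the same number of days, `numpy.asarray(X)` converts the nested list into a three-dimensional array that a CNN phase can take as input.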

Evaluation of the Proposed Technique
Using the time series-based training dataset described in Section 3 and the two-phase deep learning-based fake news classifier, we evaluate the performance of the proposed technique. For the experiment, 16,453 tweets and quote retweets related to 1,149 fake news items and 56,651 tweets and quote retweets related to 2,278 real news items were collected. The two-phase deep learning model for the fake news classifier was implemented using Keras 2.0. The performance of the fake news classifier was evaluated through five-fold cross-validation. The evaluation criteria were accuracy, recall, precision, F1 score, and macro F1 score. Fig. 5 shows the recall, precision, and F1 scores for fake news and real news over time. The x-axis represents days and the y-axis represents the score. Fig. 5a shows the performance of the fake news classification over time. The recall of the classifier that learned the time series pattern up to day 3 was approximately 0.896 and then decreased slightly to approximately 0.889 by day 28. In contrast, the precision reached approximately 0.656 on day 3 and rose to about 0.69 by day 28. The F1 score, which balances the two, reached 0.756 on day 3 and rose to 0.772 by day 28. Overall, the recall decreased slightly over time, but the precision increased, and as a result the F1 score improved. Fig. 5b shows the classification performance for real news, which is better than that for fake news. The precision shows a decreasing trend, but little change is observed. In contrast, recall started at 0.819 on day 3 and rose to 0.849 on day 28. The F1 score also rose slightly, reaching 0.89 on day 28. Fig. 5c shows the macro F1 score and accuracy. The macro F1 score is the average of the F1 scores for fake news and real news. The macro F1 score is lower than the accuracy by about 0.02. Accuracy rose from 0.840 (day 3) to 0.856 (day 28), and the macro F1 score rose from 0.818 to 0.833.
As shown in Fig. 5, the classification performance for real news was better than that for fake news. This is attributed to the imbalance in the number of samples between the two classes: there are about twice as many real news items as fake news items. Moreover, the overall performance tends to improve as time passes, but even when only the time series data up to day 3 are learned, the performance is close to that of the classifier on day 28.
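For reference, the per-class and macro-averaged metrics used above follow the standard definitions, which can be sketched as below. The labels and predictions are hypothetical toy data (1 = fake, 0 = real) and are unrelated to the reported results; the helper `precision_recall_f1` is introduced only for this illustration.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 score for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical imbalanced toy data: 3 fake items (1) and 7 real items (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

_, _, f1_fake = precision_recall_f1(y_true, y_pred, positive=1)
_, _, f1_real = precision_recall_f1(y_true, y_pred, positive=0)

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = (f1_fake + f1_real) / 2
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy example the macro F1 score (about 0.76) is below the accuracy (0.80), because the smaller fake class is classified less well and the macro average weights both classes equally.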

Performance Evaluation by Dataset
In this experiment, we compare the performance between non-time series data and time series data. Tab. 5 defines the datasets used in the experiment. Each news feature in a non-time series dataset is one-dimensional (the last row of the 2D features of each news item). For example, in the dataset for day 3, the information from day 1 to day 3 of each news item is aggregated and expressed in one row. Since the non-time series dataset has one-dimensional features, the convolution operation in the CNN stage was modified to be one-dimensional.
Fig. 6 shows the performance of the classifier learning each dataset in terms of F1 score, macro F1 score, and accuracy. Fig. 6a shows the F1 score for fake news classification. The classifier that learned the baseline achieved 0.6667 on day 3 and rose to 0.7169 on day 28. The classifier that learned our method-1 achieved 0.6930 on day 3 and rose to 0.7430 on day 28. Overall, the classifier learning our method-1 showed about 2% higher performance than the classifier learning the baseline. This indicates that the classifier that learned the features of quote retweets performs better in classifying fake news. The classifier learning our method-2, which is a time series, shows about 6% to 9% better performance than the classifiers learning the two non-time series datasets. Fig. 6b shows the F1 score for the real news classification. It shows a pattern that is almost the same as the F1 score for fake news classification. In addition, the same pattern is shown in the macro F1 score and accuracy in Figs. 6c and 6d. The analysis in Fig. 6 shows that the time series dataset reflecting the temporal propagation pattern has a great influence on the classification of fake news.
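The relation between the time series and non-time series inputs can be sketched as follows. The feature values and the helper names `time_series_view` and `non_time_series_view` are hypothetical; the sketch only assumes, as stated above, that the non-time series feature for a given day is the last row of the cumulative 2D feature up to that day.

```python
# Hypothetical cumulative 2D feature of one news item:
# one row per day (day 1 to day 4), one column per feature.
series = [
    [1.0, 0.00],
    [2.0, 0.50],
    [3.0, 0.50],
    [5.0, 0.60],
]


def time_series_view(series, day):
    """2D input: the rows for day 1 through `day`."""
    return series[:day]


def non_time_series_view(series, day):
    """1D input: only the cumulative row for `day`."""
    return series[:day][-1]


ts_day3 = time_series_view(series, 3)
flat_day3 = non_time_series_view(series, 3)
```

The one-dimensional view keeps the cumulative totals up to the chosen day but drops the order in which they accumulated, which is the information a time series classifier can additionally exploit.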

Performance Evaluation by Feature Group
In this experiment, we trained the classifier on each feature group separately and evaluated the performance to determine which feature group had a significant influence on the classification performance. Fig. 7 shows the performance of each feature group using the F1 score, the macro F1 score, and the accuracy. Fig. 7a shows the F1 score for fake news classification by feature group. On day 3, except for the total group, the Prop group showed the highest performance at 0.7345, followed in decreasing order by the user, linguistic, and basic groups. The performance showed an increasing trend until day 28, when the Prop group demonstrated the highest performance at 0.769. The Prop group shows a slightly lower performance than the total group, which learns all groups, but is still superior to the other groups. The performance of the user group and the linguistic group is generally similar, but the linguistic group performs slightly better. Fig. 7b shows the F1 score for the real news classification by feature group. Except for the linguistic group, all groups show an upward trend. The order of overall performance is similar to Fig. 7a, and the Prop group demonstrates the best performance except for the total group. Fig. 7c shows the macro F1 scores by feature group. Overall, it shows a pattern similar to the previous figures (Figs. 7a and 7b), and the accuracy in Fig. 7d also shows a similar pattern. From Fig. 7, we can conclude that the Prop group demonstrates the best performance among the individual groups, close to that of the total of all features. This indicates that the propagation pattern of quote retweets is an effective and highly influential feature in classifying fake news.

Conclusion
In this study, we proposed a fake news detection method using the temporal propagation pattern of quote retweets to protect people from false information in the fast-developing hyper-connected smart cities of today. To detect fake news, we collected content that spreads information from Twitter using Selenium and the Tweepy API. Furthermore, we defined four feature groups (temporal propagation of quote retweets, user-based, linguistic, and basic) and processed the collected data to extract the features of each group. Then, from the extracted data, we analyzed the features of Twitter content that spreads fake news and expressed them visually. The results showed that social media influencers who encountered suspicious news showed a cautious attitude toward the use of quote retweets. In addition, the spread of quote retweets in fake news was weak.
We processed the time-series training dataset based on the propagation pattern to verify that the temporal propagation pattern of quote retweets is an effective means of detecting fake news. Then, we trained a fake news classifier using two-phase deep learning based on CNN and LSTM on the time-series training dataset and verified its performance. The experiment showed that the proposed fake news detection methodology achieved superior performance compared to the existing techniques and was effective in the early detection of fake news. Finally, through a performance experiment based on feature groups, it was verified that the temporal propagation pattern of quote retweets is a very influential feature in detecting fake news.
In conclusion, we identified a new propagation pattern in which information is spread on Twitter. However, the proposed technique did not consider the user's stance, which has been found to be valid for the classification of fake news. Identifying a Twitter user's stance is an important but difficult issue. It is expected that considering the user's stance will further improve the performance of the proposed fake news classifier.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.