A Survey on Various Methods to Detect Rumors on Social Media

Social media platforms have been used for information and news gathering, and they are highly significant in numerous applications. However, they also lead to the spread of rumors and fake news. Many efforts have been made to detect and debunk rumors on social media by analyzing their content and social context using ML (Machine Learning) techniques. This paper gives an overview of recent studies in rumor detection. The task of rumor detection aims to identify a rumor and classify it as true (factual), false (nonfactual), or unverified. This can greatly benefit society by proactively preventing the spread of such incorrect and inaccurate information. This paper is an introduction to rumor detection on social media which presents the essential terminology, the types of rumors, and the generic process of rumor detection. A state-of-the-art review of the use of supervised ML algorithms for rumor detection on social media is presented.


Introduction
With the inception of Web 2.0 and increasingly easy access methods and devices, more and more people are getting online, making the Web indispensable for everyone. Social media is the focal point of Web 2.0 innovation, and active participation is the key element that builds it. Numerous social networking platforms, such as Twitter, YouTube, and Facebook, have become popular among the masses. They allow people to build connection networks with others and share various kinds of information in a simple and timely manner. Today, anyone, anywhere, with an internet connection can post information on the Web. But just as every coin has two sides, this technological innovation has both good and bad aspects.
We benefit greatly from social media, but we cannot overlook its negative effects on society. Most people regard it as an innovative discovery, while a few see it as a blow to civilization. On the positive side, these online communities facilitate communication with people around the globe regardless of their physical location. The perks include building social connections, eliminating communication barriers, and serving as effective promotional tools; on the flip side, privacy is no longer private once information is shared on social media.
Due to the ubiquity of social media and users' overdependence on it for information, the recent trend is to seek and gather information from online social media rather than from traditional sources. However, there is no means to verify the authenticity of the information available and spreading on these platforms, which makes them breeding grounds for rumors. The standard definition of a rumor is any piece of information publicized in a community without adequate facts and/or confirmation to support it, which calls its legitimacy into question. It may be true, false, or indeterminate, and it may be generated purposely (attention seeking, personal objectives, blaming someone, hoaxes, spreading terror and disgust) or accidentally (by mistake). Further, rumors may be personal or commercial in nature. Knapp [1] classified intentional rumors into three categories: pipe dream, bogy, and wedge driving.
Rumors are circulated and believed openly. Given people's increasing reliance on social media, it is imperative to detect rumors and stop them from spreading to reduce their impact. It takes only a few minutes for a single tweet or post to go viral and affect millions. Thus, rumor detection and mitigation have emerged as a recent research area in which a rumor has to be recognized and its source identified to limit its diffusion. It is essential not just to detect and deter a rumor but also to trace it back to its source of origin. Various primary studies with promising results, as well as secondary studies [2,3], have been reported in this direction. The work presented in this paper is a primer on rumor detection on social media that explains the what, why, and how of rumor detection on online social media. The intent is to give novice researchers a preliminary introduction to the area while also offering background material to experts. The types of rumors and the typical process of rumor detection are discussed, followed by a state-of-the-art review of supervised ML-based rumor detection on online social media. Research gaps are identified as issues and challenges within the domain, which make it an active and dynamic area of research.
The rest of the paper is organized as follows: Section 2 explains rumor detection on social media and the types of rumors; Section 3 describes various methods of rumor detection, such as machine learning and deep learning methods; Section 4 describes the challenges and issues in rumor detection; Section 5 explains open research directions that are currently being pursued or can be pursued in the near future; finally, Section 6 concludes the work, followed by the references used.

Rumor Detection on Social Media
Social media has the power to make any information, be it true or false, go viral and reach and affect millions. Because information spreads so quickly, rumors spread just as fast. Hence, it is necessary to detect and restrain these rumors before they have a serious impact on people's lives.

Types of Rumors
A rumor is defined as information whose veracity is doubtful. Some rumors may turn out to be true, some false, and others may remain unverified. Not all false information can be classified as a rumor. Some of it consists of honest mistakes, referred to as misinformation. On the other hand, there may be intentional rumors put out to mislead people into believing them. These are labeled as disinformation and are further classified based on the intent of the originator. Fig. 1 depicts the classification of rumors. We define a rumor as any information put out in public without sufficient knowledge and/or evidence to support it. It may mislead, either intentionally or unintentionally.
If information has been put out in public erroneously, without authentic or complete knowledge and with no ulterior motive of hurting or disturbing anyone whatsoever, it is called misinformation. It is an honest mistake. Disinformation, on the other hand, is information that is intentionally put out in public view to mislead people and start a false rumor. Depending on the motive of the writer and the nature of the post, disinformation can be classified as humorous, hoax, finger pointing, tabloids, and yellow press. The most harmless type is humorous disinformation.
Sources spreading this type of information fabricate news and stories to give them an amusing side. The motive is usually to entertain people. The information is declared false in advance and is intended only for comic purposes.
The best examples of such sources include news satires and news game shows. The next form of disinformation is a hoax. A hoax is intentionally fake news spread to cause panic among people and trouble for those at whom it is aimed. A hoax may also involve an imposter. Examples include fabricated stories, false threats, etc. In 2013, a hoax claiming that Hollywood actor Tom Cruise was dead started doing the rounds. Messaging apps like WhatsApp worsen the situation when it comes to hoaxes. The Indian 500 and 1000 rupee notes were demonetized in November 2016. Soon after, a hoax message went viral on WhatsApp stating that the government would release a new 2000 rupee note containing a GPS-trackable nano chip that would allow the notes to be located even when buried 390 feet underground. Government and bank spokespersons finally had to issue an official statement declaring it false. Still, many people found the official statement hard to believe because they had been so thoroughly convinced by the hoax message.

Fig. 1 Classification of rumors
Another form of disinformation is finger pointing. Finger pointing always has an associated malicious intent and a personal vested interest. It blames a person or an organization for some bad event that is happening or has happened in the past. It aims at political or financial gain by tarnishing the image of the target person, organization, party, group, etc. Tabloids have had a reputation for spreading rumors since their inception. Tabloid journalism accentuates sensational stories and celebrity gossip, the kind that makes for spicy "page 3" stories. Yellow press journalism is a degraded form of journalism which reports news with little or no research at all. Its journalists' only aim is to catch attention with catchy headlines, with no regard whatsoever for the authenticity of the news. They do not bother to delve deep into a story but simply publish it to sell as many copies as possible and make money. It is the most unprofessional and unethical form of journalism.

Rumor Detection Approaches
There have been various efforts in the field of rumor detection and mitigation. Many authors have used simple cue-based, network-based, and psychological and social theory-based approaches, whereas many others have used machine learning approaches. Other studies have combined several aspects, so that their methodology is an amalgamation of various techniques. There has also been a debate about which features are most important in detecting a rumor. This has led to deep learning approaches, in which explicit feature selection is not required for the framework to perform efficiently. Here, we discuss various supervised, unsupervised, and other machine learning approaches, as well as deep learning-based approaches to rumor detection.

Machine Learning Based Approaches
There have been various efforts in information credibility analysis in online social networks. Since the dataset and its features are central to any problem solved with machine learning, the early works focused mainly on feature engineering. In one of the early machine learning works, Castillo et al. [4] used decision-tree-based algorithms such as J48 and Random Forest, Support Vector Machines (SVM), and Bayesian networks to evaluate the credibility of tweets. The input features to these algorithms were based on characteristics of the users, the messages, the propagation dynamics, and the topic in question. On the basis of their results, they concluded that credible news-related topics (chats and opinions excluded) are mainly single-sourced or few-sourced and are propagated by authors who have a long history of propagating similar messages [4].
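To make the four feature families concrete, the following is a minimal sketch, assuming a simplified, hypothetical `Tweet` record; the function `credibility_features` and all field and feature names are illustrative choices and are not the exact feature set used by Castillo et al. [4].

```python
import re
from dataclasses import dataclass

@dataclass
class Tweet:
    # Hypothetical, simplified record; real collections carry many more fields.
    text: str
    author_followers: int
    author_statuses: int   # number of messages the author has posted before
    retweet_count: int

URL_RE = re.compile(r"https?://\S+")

def credibility_features(tweet: Tweet) -> dict:
    """Return a small, illustrative feature vector for one tweet."""
    text = tweet.text
    return {
        # message-based features
        "length_chars": float(len(text)),
        "num_exclamations": float(text.count("!")),
        "has_question": float("?" in text),
        "has_url": float(bool(URL_RE.search(text))),
        "num_hashtags": float(sum(1 for w in text.split() if w.startswith("#"))),
        # user-based features
        "author_followers": float(tweet.author_followers),
        "author_statuses": float(tweet.author_statuses),
        # propagation-based feature
        "retweet_count": float(tweet.retweet_count),
    }
```

Vectors like these, one per tweet (or aggregated per topic), are what a J48, Random Forest, SVM, or Bayesian network implementation in a toolkit such as Weka would be trained on; topic-level features would additionally aggregate over all tweets in a topic.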
In another work on rumor analysis and detection, Yang et al. proposed two additional features: "client program used" and "event location" [5]. They performed two experiments on Sina Weibo data to study whether the introduced features improved effectiveness. One experiment used the existing set of four features and the other used that set augmented with the two new features. The study concluded that adding the proposed features improved the overall accuracy of SVM from 72.5% to 77%. As the study by Yang et al. [5] was carried out on Weibo data, its validity on Twitter data remained an open question because of the different nature of the two platforms. The need for a standard benchmark dataset for rumor detection was felt, and many researchers devoted their studies to creating one.
In this pursuit, Qazvinian et al. [6] released an annotated Twitter dataset for rumor detection. This dataset contains tweets pertaining to five established rumors. Among many other researchers, Hamidian and Diab [7] used this dataset for rumor detection, employing a multi-stage strategy (a 3-class classification followed by a 4-class classification) with varying feature sets and different pre-processing tasks. They added two Twitter and network features: reply time (network-based) and the time the tweet was posted (regular day or busy day). They also added three pragmatic features: named entity recognition, emoticons, and sentiment. They used the J48 decision tree algorithm in Weka to carry out the experiments. Their method differed in that the common 6-class classification performs the detection and classification of rumors in a single step, whereas in the 3:4-class classification, detection is followed by classification. They reported that their two-stage strategy (one stage each for detection and classification) outperformed the single-stage strategy, with a 14% increase in F1 score on the Obama dataset.
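The difference between single-step and two-stage classification can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: `two_stage_predict`, the keyword-based `toy_detector` and `toy_classifier`, and the labels "endorse", "deny", "question", and "not_rumor" are hypothetical placeholders standing in for trained models; they are not the features, models, or label set of Hamidian and Diab [7].

```python
from typing import Callable, List

# A classifier maps a post's text to a label string.
Classifier = Callable[[str], str]

NOT_RUMOR = "not_rumor"  # placeholder label for stage-1 negatives

def two_stage_predict(texts: List[str], detector: Classifier,
                      classifier: Classifier) -> List[str]:
    """Stage 1 (detector) filters out non-rumor posts; stage 2 (classifier) labels the rest."""
    labels = []
    for text in texts:
        if detector(text) == NOT_RUMOR:
            labels.append(NOT_RUMOR)        # stop early: stage 2 never sees this post
        else:
            labels.append(classifier(text))
    return labels

# Toy stand-ins for trained models (hypothetical keyword rules, for illustration only).
def toy_detector(text: str) -> str:
    return "rumor" if "obama" in text.lower() else NOT_RUMOR

def toy_classifier(text: str) -> str:
    lowered = text.lower()
    if "fake" in lowered or "false" in lowered:
        return "deny"
    if "?" in text:
        return "question"
    return "endorse"
```

For example, `two_stage_predict(["Obama was born abroad", "Nice weather today"], toy_detector, toy_classifier)` yields `["endorse", "not_rumor"]`. The design point is that the second-stage model is trained and applied only on rumor-related posts, so it does not have to separate the fine-grained classes from unrelated content at the same time.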
Some studies have reached contradictory conclusions. Sahana et al. [8] state that user-based features have very little significance for, or no correlation with, rumor detection, whereas studies such as that of Castillo et al. [4] show that user-based features enhance the performance of rumor detection systems. Castillo et al. [4] also identified word frequencies as an important feature for rumor detection, while Sahana et al. [8] stressed the importance of content-based features. Sahana et al. reported an accuracy of 87.9% using the J48 algorithm with 10-fold cross-validation repeated 10 times. Their dataset consisted of tweets and retweets about the London riots. They also concluded that the most active users are prone to propagating rumors, as they retweet without establishing the credibility of a tweet. Another study on rumor detection was carried out by Kwon et al. [9], who examined rumor characteristics over varying time windows. They employed the Random Forest-based variable selection process proposed by Genuer et al. [10] to select temporal, linguistic, user, and network-based features. The time windows were 3, 7, 14, 28, and 56 days from the onset of a rumor. The authors proposed two algorithmic approaches, one using user and linguistic features and the other using all of the features. They observed that user and linguistic features perform better for detecting rumors at the onset, whereas structural and temporal features are beneficial for telling rumors from non-rumors over longer periods.
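Evaluating features over growing time windows, as in the study by Kwon et al. [9], starts by slicing a rumor's posts by their distance from its onset. The sketch below assumes posts are dictionaries with hypothetical `time` and `user` keys and computes only two simple statistics per window; it illustrates the windowing step, not the features or the variable selection procedure used in [9].

```python
from datetime import datetime, timedelta

WINDOWS_DAYS = (3, 7, 14, 28, 56)  # window lengths reported for Kwon et al. [9]

def posts_in_window(posts, onset: datetime, days: int):
    """Posts published within the first `days` days after the rumor's onset."""
    end = onset + timedelta(days=days)
    return [p for p in posts if onset <= p["time"] < end]

def windowed_counts(posts, onset: datetime):
    """For each window, count the posts and distinct users seen since onset."""
    out = {}
    for days in WINDOWS_DAYS:
        subset = posts_in_window(posts, onset, days)
        out[days] = {
            "num_posts": len(subset),
            "num_users": len({p["user"] for p in subset}),
        }
    return out
```

Each window's statistics (in practice, the full temporal, linguistic, user, and network feature set) would then be fed to a separate classifier, which makes it possible to compare how well each feature family performs early versus late in a rumor's life.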
Takahashi and Igata [11] explore, among many features, the usefulness of the "retweet ratio" feature for rumor detection. They conclude that although its value remains inconclusive in the investigated sample, it may be beneficial for larger samples. Another study, by Jain et al. [12], detects misinformation on Twitter by using a mismatch ratio threshold to decide whether a topic constitutes a rumor. The basic assumption of their study is that verified news channels on Twitter are much less prone to spreading rumors than other users. Based on this assumption, they create two sets of tweets relating to a topic and calculate the sentiment and contextual mismatch between them. If the calculated mismatch function (a ratio) exceeds a threshold, the topic is labeled a "rumor"; otherwise, it is labeled a "non-rumor". The authors concluded that the results were better when the tweets were more objective and less subjective. Chang et al. [13] used a cluster-based approach for political rumor detection on a dataset consisting of two sets of tweets. One set consisted of tweets about Barack Obama from September 2015 and the other contained tweets about Hillary Clinton posted in August 2015. They identified "extreme users", those who tend to tweet false news and rumors. These users were identified by features such as "high tweeting frequency", "huge number of followers", "use of extreme keywords in tweets", and "over-enthusiasm" about the topic [13]. After clustering tweets that contain the same URL, they used cosine similarity to merge clusters discussing the same news. They reported that the best rule derivations are subjective and thus differ from case to case, depending on the dataset.
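The two clustering steps described for Chang et al. [13], grouping tweets by shared URL and then merging groups whose content is similar, can be sketched with plain bag-of-words vectors. This is a minimal sketch, assuming raw tweet strings as input; the tokenizer, the word-count weighting, the greedy union-find merge, and the 0.5 threshold are all illustrative assumptions rather than details reported in [13].

```python
import math
import re
from collections import Counter, defaultdict

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9']+")

def group_by_url(tweets):
    """Step 1: tweets sharing the same (first) URL form one cluster; URL-less tweets are skipped."""
    groups = defaultdict(list)
    for t in tweets:
        urls = URL_RE.findall(t)
        if urls:
            groups[urls[0]].append(t)
    return list(groups.values())

def bag_of_words(texts):
    """Word counts over a cluster's tweets, with URLs stripped so they do not inflate similarity."""
    counts = Counter()
    for t in texts:
        counts.update(TOKEN_RE.findall(URL_RE.sub(" ", t.lower())))
    return counts

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def merge_clusters(clusters, threshold=0.5):
    """Step 2: merge clusters whose word vectors have cosine similarity >= threshold."""
    parent = list(range(len(clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    vectors = [bag_of_words(c) for c in clusters]
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)
    merged = defaultdict(list)
    for i, c in enumerate(clusters):
        merged[find(i)].extend(c)
    return list(merged.values())
```

Because merging is transitive here, two clusters can end up together through a third one they are both similar to; a stricter variant would compare each candidate against the merged cluster's combined vector instead.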

Deep Learning Based Approaches
Deep learning has proven advantageous over traditional machine learning for various problems, largely because it sidesteps the manual feature selection problem. Deep neural networks do not need a hand-picked feature set to work efficiently; rather, they can perform well on unsifted (raw) features.
Ruchansky et al. [14] propose a hybrid model with three modules for fake news detection. The model focuses on textual, user-response-based, and source-based features. The first module, named "Capture", leverages Long Short-Term Memory (LSTM) networks to capture the temporal pattern of the text and of user activity pertaining to a given article. The second module, "Score", focuses on source characteristics derived from user behavior: it assigns each user a score based on their tendency to participate in a particular source-promotion group. The third module, named "Integrate", combines the outputs of the first two modules into a vector for classifying an article as fake or not fake. Ma et al. [15], in one of the earliest works on rumor detection with neural networks, apply recurrent neural networks (RNNs) to detect rumors. Based on their observation that a rumor is initiated by an original (source) post followed by a series of reposts, related posts, and comments, they model rumor data as a time series. They treat the batch of posts falling in the same time interval as a single unit of the time series and model the resulting sequence with an RNN. For each interval, the tf-idf (term frequency-inverse document frequency) values of the top-K vocabulary terms were taken as input. Their model outperforms contemporary methods based on manually selected features.
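The input representation described for Ma et al. [15], time-interval batches encoded by tf-idf over a top-K vocabulary, can be sketched as follows. This is a minimal sketch under stated assumptions: posts are hypothetical `(timestamp_seconds, text)` pairs, intervals have equal width, each interval is treated as one document when computing idf, a smoothed idf is used, and the vocabulary is chosen by total tf-idf weight. The original work's exact interval construction and weighting may differ, and the RNN that consumes the sequence is omitted.

```python
import math
from collections import Counter

def split_into_intervals(posts, num_intervals):
    """posts: list of (timestamp_seconds, text); returns num_intervals lists of texts."""
    times = [t for t, _ in posts]
    start, end = min(times), max(times)
    width = (end - start) / num_intervals or 1.0  # avoid division by zero if all times are equal
    buckets = [[] for _ in range(num_intervals)]
    for t, text in posts:
        idx = min(int((t - start) / width), num_intervals - 1)  # last post falls in last bucket
        buckets[idx].append(text)
    return buckets

def tfidf_series(posts, num_intervals, k):
    """Return (vocab, series): one K-dimensional tf-idf vector per time interval."""
    buckets = split_into_intervals(posts, num_intervals)
    # Each interval's concatenated posts are treated as one "document".
    tfs = [Counter(w for text in b for w in text.lower().split()) for b in buckets]
    n_docs = len(tfs)
    df = Counter(w for tf in tfs for w in tf)
    idf = {w: math.log(n_docs / df[w]) + 1.0 for w in df}  # smoothed so shared terms are not zeroed
    weights = [{w: c * idf[w] for w, c in tf.items()} for tf in tfs]
    totals = Counter()
    for wt in weights:
        totals.update(wt)
    vocab = [w for w, _ in totals.most_common(k)]  # top-K terms by total tf-idf weight
    return vocab, [[wt.get(w, 0.0) for w in vocab] for wt in weights]
```

The resulting list of vectors, ordered by time, is the sequence an RNN would read one interval at a time.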
Chen et al. [16] also use recurrent neural networks for the early detection of rumors. They incorporate an attention mechanism into their models to learn which words are important for a particular rumor category. They create batches of posts according to time intervals and use tf-idf as the input representation. They conclude that the attention mechanism is effective in detecting rumors: it learns to ignore unrelated words, giving less weight to event-related words and more weight to words expressing users' doubt and anger about the rumor. Yu et al. [17] propose a convolutional neural network (CNN) for misinformation detection. Based on the observation that an RNN is unsuited to detecting rumors at an early stage because of its bias towards the temporal order of the input, they split a rumor into different phases. They then use doc2vec to generate vector representations, which are used as input to a two-layer CNN. Nguyen et al. [18] propose a model based on a CNN and an RNN for the early detection of rumors. Apart from a time-series-based classification model, they use event credits for predicting rumors. In the proposed model, the CNN learns hidden representations of individual tweets by extracting a sequence of high-level phrase representations, which are fed into an LSTM that outputs the tweet representation. The output of this CNN+RNN model is then combined with a dynamic time-series-based rumor discrimination model to produce the final output. The authors report improved effectiveness in classifying rumors during the early hours of their spread.
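The idea of an attention mechanism that weights some inputs more than others can be illustrated with generic dot-product attention pooling. This is a minimal sketch, assuming each word or time step already has a hidden-state vector and that a context vector is given; in a trained model the context vector (and the hidden states) are learned, and the formulation used by Chen et al. [16] may differ. `softmax` and `attention_pool` are hypothetical helper names.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, context):
    """Score each hidden state against the context vector, normalize the scores,
    and return (attention weights, weighted sum of hidden states)."""
    scores = [sum(h_j * c_j for h_j, c_j in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, pooled
```

Inputs whose hidden states align with the context vector receive larger weights and dominate the pooled representation passed to the classifier, which is how words expressing doubt could come to outweigh neutral event words; inspecting the weights is also what makes such models partially interpretable.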

Issues and Challenges
Rumor detection comes with its share of issues and challenges. The main challenge in carrying out the rumor detection task is data collection. Even the most popular social media sites, namely Twitter and Facebook, do not give users full freedom to extract data. Most of the data posted on Facebook is private and hence inaccessible; only data posted on Facebook pages can be collected. Twitter, on the other hand, currently does not allow data older than seven days to be fetched. Another issue faced by researchers is detecting new rumors in real-time data. It is easier to detect posts about a rumor we already know of, because we know the keywords. With emerging rumors, however, we are at a loss, as we do not know what to look out for. Also, some rumors remain unverified, with no confirmation or debunking available. Hence, determining rumor veracity is very challenging. Another aspect that needs attention is detecting the origin of a rumor, as it is difficult to identify the user who started it. These issues need to be addressed to improve the quality and speed of rumor detection.

Future Research Scope
Even though noteworthy advances have been made in debunking rumors on social media, many challenges remain to be overcome. Based on our review of past studies and our experience in both research and practical system implementation of rumor detection, we present several directions for future rumor detection research.
• Knowledge Base: A knowledge base (KB) is useful for fake news detection [19]. There have been a few studies on using KBs for fake news detection, but few or none on rumor detection on social media. One reason is that for rumors on social media we already have a great deal of data, especially social context data, to exploit and research. Another reason is that, compared with fake news detection, which mainly deals with news articles, rumors on social media concern diverse topics, and it is difficult to build appropriate KBs that cover them. Consequently, most previous studies on rumor detection have not focused on exploiting KBs for debunking rumors.
• Target of User Response: User responses are very informative for rumor detection. Typically, false rumors attract more negative and questioning responses, which can be used for rumor detection. Each source message (rumor claim) has many replies, which are either direct replies to it or replies to other messages in the conversation thread. The structure of the conversation thread is important for understanding the true stance of a reply's author. For example, given the message "This is fake" and a reply to it, "I totally agree", if we do not consider that the reply is directed at "This is fake", we will assign the wrong stance label, "support", to this reply, whereas the reply actually denies the rumor claim. Although neural network models based on propagation analysis may partially learn this information, we think that handling this situation explicitly would improve rumor detection performance.
• Cross-domain and Cross-language: Most previous studies focus on distinguishing false rumors from true ones, with experimental settings that are generally limited to a specific social media platform or to certain topic areas, such as politics. Analyzing rumors across topics or platforms would give us a deeper understanding of rumors and reveal their distinctive characteristics, which could further help to debunk them across domains (topics and platforms).
• Explanatory Detection: Most rumor detection approaches only predict the veracity of a rumor, revealing very little about why it is false. Finding the evidence supporting the prediction and presenting it to users would be highly beneficial, since it helps users debunk rumors themselves. Making results explainable has attracted research in other areas, such as explainable recommendation, but it is still a new topic in the rumor detection field. This may become harder as more models use deep learning techniques nowadays. However, as AI techniques are used in more applications, users' demands for explanations of results are also increasing.
• Multi-task Learning: Studies have already shown that jointly learning stance detection and rumor detection improves rumor detection performance [20,21]. Depending on the algorithm, the rumor detection workflow may include tasks such as user credibility evaluation, source credibility evaluation, and knowledge extraction. If suitable datasets with annotations for these information types become available, one research direction is to explore multi-task learning for these tasks in addition to stance detection and rumor detection. We expect this to benefit the rumor prediction task.
• Rumor Early Detection: Early rumor detection aims to identify a rumor at an early stage, before it spreads widely on social media, so that appropriate action can be taken sooner. Early detection is particularly important for a real-time system, since the more a rumor spreads, the more damage it causes and the more likely people are to trust it. This is a very challenging task because, at an early stage, a rumor has little propagation information and very few user responses. The algorithm must rely mainly on content and external knowledge, such as a KB. A few studies have tested their algorithms on the early stage of rumors [9,22]; these investigated feature stability over time and reported that user and linguistic features are better than structural and propagation features for determining the veracity of a rumor at its early stage. Although there are already some studies in this direction, more research effort is still needed, given its importance in real-world systems.
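The Target of User Response direction can be made concrete with a small sketch that resolves each reply's stance toward the source claim by walking up the conversation thread. Everything here is a hypothetical heuristic for illustration: the stance labels, the `compose` rule (agreeing with a denial counts as a denial, disagreeing with a denial counts as support, and replies to questions or comments are treated as comments), and the input format are assumptions, not a method from the cited studies.

```python
def compose(parent_stance: str, local_stance: str) -> str:
    """Hypothetical rule: combine the parent's stance toward the claim with
    the reply's stance toward its parent."""
    if local_stance in ("query", "comment"):
        return local_stance
    if parent_stance in ("query", "comment"):
        return "comment"  # agreeing/disagreeing with a question or side remark is ambiguous
    agrees_with_parent = local_stance == "support"
    parent_supports_claim = parent_stance == "support"
    return "support" if agrees_with_parent == parent_supports_claim else "deny"

def resolve_stances(replies: dict, source_id: str) -> dict:
    """replies: {post_id: (parent_id, stance_toward_parent)}, assumed to form a tree
    rooted at source_id. Returns each reply's stance toward the source claim."""
    resolved = {source_id: "support"}  # the source post asserts the claim

    def stance_of(pid):
        if pid not in resolved:
            parent_id, local = replies[pid]
            resolved[pid] = compose(stance_of(parent_id), local)
        return resolved[pid]

    for pid in replies:
        stance_of(pid)
    return {pid: s for pid, s in resolved.items() if pid != source_id}
```

With `"r1": ("s0", "deny")` for a reply such as "This is fake" and `"r2": ("r1", "support")` for "I totally agree" beneath it, `r2` resolves to "deny" rather than "support". A learned model would replace the hand-written rule, but the sketch shows why thread structure must be part of the input.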

Conclusion
This paper presented the primary concepts of rumor detection. As much as social media has become an invaluable source for sharing real-time and crucial information, it is also a breeding ground for rumors. Timely rumor detection is essential to prevent panic and maintain peace in society. This paper explains the rumor detection process and reviews research on rumor detection using various ML techniques. The scope of this review is limited to a single-level classification task in which we predict whether given online information is a rumor or not. This task can be extended to a multi-level, fine-grained classification in which rumors are further classified as misinformation or disinformation, hoaxes, etc. Various novel and hybrid machine learning techniques, such as fuzzy and neuro-fuzzy methods, can also be used for detecting rumors.