Election Prediction on Twitter: A Systematic Mapping Study



Introduction
The relation between social media platforms, as the new way of linking the parts of the world, and politics is no secret.
This relation has attracted researchers seeking to exploit this era's abundance of useful information to perform different tasks, such as information extraction and sentiment analysis, among others. One of the platforms most widely used by researchers is Twitter. Apart from dictionary and statistical approaches, machine learning has been effectively applied in several other domains for different purposes, for instance, [1][2][3]. Machine learning has improved prediction in terms of accuracy and precision.
As of October 2020, Twitter had over 300 million users worldwide; 91% of them are over the age of 18. This platform attracts many politicians and enables them to interact and to use it as a tool in their campaigns [4]. Offering an API that allows extracting public tweets and users' public information and interconnections, Twitter is considered a treasure for researchers aiming at election prediction.
Many researchers have analyzed and predicted different countries' elections on social media platforms such as Facebook and Twitter [4][5][6][7][8]. Few studies have surveyed this topic [9][10][11]. To the best of our knowledge, no study has ever reported a systematic mapping study (SMS) or systematic literature review (SLR) on election prediction on Twitter. This research systematically identifies, gathers, and provides the available empirical evidence in this area.
This study provides a comprehensive overview and deeper knowledge of election prediction on Twitter, thus helping to (i) identify research gaps (research opportunities) and (ii) aid researchers (decision-making) when selecting approaches or tools.
The main contributions of this research work are as follows: (1) identify and classify the main approaches (RQ1) used to predict elections, their techniques (RQ1(a)), and the tools (RQ1(b) and RQ1(c)); (2) identify the research works that have reported manual/automatic labelling of political data (RQ2); (3) identify and list the countries whose elections are analyzed (RQ3); (4) identify and list the tweet languages used for predicting elections on Twitter (RQ4); (5) identify the main topics discussed in the studies using machine learning techniques (RQ5); (6) identify demographic data in the field of election prediction on Twitter, such as the most frequent publication venues and the most active countries, organizations, and researchers (DQs); (7) provide a centralized source for researchers and practitioners by gathering dispersed pieces of evidence (studies). The remainder of this paper is organized as follows: Section 2 provides an overview of the most related work, and Section 3 presents the detailed methodology, followed by Results and Discussion in Section 4. Section 5 deals with Validity Threats, followed by the Conclusion and Future Work in Section 6.

Related Work
This section presents the work most closely related to an SMS on election prediction on Twitter.
Chauhan et al. [9] in 2020 surveyed election prediction on online platforms such as Twitter and Facebook. Their study presents an in-depth analysis of the evaluation of SA techniques used in election prediction. They reviewed nearly 48 studies, including 10 studies that tried to infer users' political stance.
In May 2019, Bilal et al. [10] presented a short overview of election prediction on Facebook and Twitter. They gave an overview of 13 studies, which they categorized into two approaches, sentiment analysis and others, and into two outcomes: "can predict elections" and "cannot predict elections." Singh and Sawhney [11] conducted a review of 16 papers in December 2017 related to forecasting elections on Twitter.
They listed the countries whose elections were analyzed and provided the tweet statistics used in the selected studies. Furthermore, they presented the methods used for prediction and classified the studies into those that successfully and unsuccessfully predicted elections.
All these studies presented short reviews except for [9]. Moreover, all the aforementioned studies performed ad hoc literature surveys, and none of them followed a detailed systematic protocol.
This study is the first systematic mapping study that focuses mainly on election prediction on Twitter and thoroughly overviews and analyzes the 98 selected primary studies.

Methodology
A systematic mapping study (SMS) is an effective way of gaining knowledge about the state of the art of a research field. This study conducts an SMS of election prediction on Twitter. Figure 1 shows the detailed flow of this SMS.

Approaches for Predicting Election on Twitter.
Various approaches can be employed to predict elections on Twitter. Researchers and practitioners mainly use three: sentiment analysis (SA), volume-based analysis (Vol.), and social network analysis (SNA). Figure 2 shows a generalized framework of election prediction on Twitter. A Twitter API is used to collect tweets about the election (candidates, the election itself, political parties, and trends). The data are then preprocessed (cleaned and filtered) as needed, for example, by removing unnecessary characters and whitespace and applying stemming for sentiment analysis. Afterwards, an approach or technique is employed to perform the election prediction task.
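As an illustration, the preprocessing step can be sketched as follows. This is a minimal example with illustrative cleaning rules only; the exact rules (and whether stemming or stop-word removal is applied) vary from study to study.

```python
import re

def preprocess_tweet(text: str) -> str:
    """Clean a raw tweet for sentiment analysis (illustrative rules only)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)   # drop URLs
    text = re.sub(r"@\w+", "", text)           # drop user mentions
    text = re.sub(r"#", "", text)              # keep hashtag words, drop the '#'
    text = re.sub(r"[^a-z\s]", " ", text)      # drop remaining punctuation/digits
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text

print(preprocess_tweet("I will VOTE for @CandidateA!! #Election2020 https://t.co/xyz"))
# "i will vote for election"
```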

Aim and Research Questions.
This study aims to identify and categorize the methods used for predicting elections on Twitter. Given its broadness, this aim is divided into a set of research questions (RQs). We also gathered and investigated some interesting information by defining and answering demographic questions (DQs) about the most active countries, organizations, and authors.
This information helps practitioners, researchers, and organizations in certain ways [12][13][14][15]. The set of DQs is as follows: DQ1: who are the most active researchers in the field of analyzing election prediction on Twitter? DQ2: which are the most active organizations? DQ3: which are the most active publication venues?

Search Strategy.
The search consisted of two operations: selecting keywords and selecting digital libraries. We performed the former after analyzing the research field to which this study applies, "Election Prediction on Twitter." Table 2 shows the whole set of selected keywords for this study. In the latter operation, we selected a list of digital libraries on which to execute the search strings. Five digital libraries were selected to carry out this research: IEEE Xplore, Web of Science (WoS), Scopus, ACM, and ScienceDirect. The keywords were used to create the final queries. We executed the search queries on the title, abstract, and keywords of the articles; where a digital library does not support such a search, the search was performed on the entire text. Table 3 shows the list of digital libraries and the search queries that were executed to obtain potential primary papers. We performed the search in three different periods (phases): I. E1: searching and selection of papers from January 1, 2010, to January 14, 2020; II. E2: searching and selection of papers from January 15, 2020, to January 7, 2021; III. E3: searching and selection of papers from January 1, 2010, to January 7, 2021. The logic behind the three extraction phases is that we started this research before the second phase, and the work was delayed due to COVID-19. Note that E2 was not performed on Scopus because, in mid-2020, Scopus discontinued the search in its library; we used the ScienceDirect library as an alternative to Scopus.
Almost every digital library allows users to export the search results in formats that include the title of the paper, metadata (venue, year of publication, authors' names, authors' affiliations, and much more), the abstract (some digital libraries do not provide it), and keywords. After executing the first search, we obtained 787 potential papers.
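The shape of such queries (OR-groups of synonyms joined by AND, as commonly accepted by digital libraries) can be sketched as below. The keyword groups here are illustrative placeholders; the actual keywords are those in Table 2.

```python
# Illustrative synonym groups; the study's real keywords are listed in Table 2.
keywords = [["election", "electoral"],
            ["predict", "prediction", "forecast"],
            ["twitter"]]

def build_query(groups):
    """AND together OR-groups of quoted synonyms, the usual library query shape."""
    return " AND ".join(
        "(" + " OR ".join(f'"{k}"' for k in group) + ")" for group in groups
    )

print(build_query(keywords))
# ("election" OR "electoral") AND ("predict" OR "prediction" OR "forecast") AND ("twitter")
```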

Selection of Study and Quality Assessment.
Mainly two tasks are included in the process of selecting relevant papers: (1) defining the criteria for including/excluding a paper and (2) applying the defined criteria to choose the relevant papers [16][17][18]. The following inclusion criteria were applied to the abstract of each paper:
IC1: the study is related to election prediction (or forecasting) on Twitter
IC2: research published in the field of "Computer Science"
IC3: research published online between January 2010 and January 2021
IC4: the study's abstract must fit the topic
The following criteria were applied to exclude papers:
EC1: research papers written in languages other than English
EC2: papers that are not accessible in full text
EC3: research published in non-peer-reviewed venues
EC4: grey literature and books
EC5: short papers (less than four pages)
EC6: duplicate papers (only the most recent and detailed one was selected)
EC7: studies that present summaries of editorials/conferences
A top-down approach was followed to meet the quality criteria for the selection of relevant papers. Initially, papers were excluded after considering their metadata, such as title, abstract, and keywords. Furthermore, studies were excluded after reading the entire paper if they were not in the scope of the topic "Election Prediction on Twitter" or were of low quality, for example, when the paper's methodology did not satisfy the reviewing author.
All the papers were equally distributed among the authors, who selected the relevant papers by applying the inclusion and exclusion criteria. The authors held a meeting to ensure that no relevant paper was excluded and no irrelevant paper was included. The authors applied the criteria defined in [16,17] to deal with disagreements. The details are given in Table 4. A paper is excluded if it falls in category "F" (exclude) or category "E" (considered doubtful). Figure 3 shows the full flow of the search in the five digital libraries and the selection process using the inclusion/exclusion criteria. Table 5 shows the list of the 98 primary papers selected for this SMS with their bibliographic references.

The remaining research questions target the following data:
RQ1(c): list the techniques used for collecting tweets.
RQ2: identify and list the studies that manually/automatically labelled the data to support their experiments (training and testing data).
RQ3: list the countries whose elections are analyzed on Twitter in the selected papers.
RQ4: list the tweet languages analyzed in the selected papers.

Data Extraction.
Data extraction is the process of extracting relevant information from the primary selected papers according to the defined research and demographic questions. Initially, we agreed upon a Data Extraction Form (DEF) after a thorough review; then we started the actual extraction from the papers. A Data Extraction Form provides a reliable and precise approach to extracting data in systematic mapping studies [16,19]. We inspected and thoroughly read the full text of nearly all papers.

Results and Discussion
In this section, we briefly discuss the results of this SMS. A summary of the most notable results for each research and demographic question is discussed separately. Figure 4(a) shows the number of studies published in different venues (conference or journal). Figure 4(b) shows the distribution of studies across the years. It is noteworthy that the topic of "Election Prediction on Twitter" has been attracting researchers' attention over the last decade. Figure 5 shows the number of studies that use different approaches for election prediction on Twitter: sentiment analysis (SA); sentiment analysis (orientation); volumetric (Vol.); social network analysis (SNA); topic modelling using LDA (in this study, the algorithm name LDA is used instead of topic modelling in the approaches); and combinations of these approaches, such as SA & Vol.; SA, Vol., & SNA; and SA (orientation), SNA, & LDA.

RQ1: What Are the Approaches Used in Predicting Elections on Twitter?
In this SMS, we have treated SA and SA (orientation) separately to help researchers quickly locate the specific type of study. The SA approach includes studies that used polarity detection (positive, negative, and neutral), emotion detection (tense, angry, sad, happy, relaxed, exhausted, calm, excited, and nervous), or both. SA (orientation) studies the political orientation of voters by analyzing tweets that show voting behaviour explicitly, such as "I will vote for candidate A" and "I will not vote for candidate A." We use the following notation in the rest of the paper: the subscript i denotes an approach used alone in a paper, and the subscript j denotes an approach used along with other approaches in a paper. Figure 6 presents the approaches along with the primary selected study(s). Figure 5 shows that 64 studies used the sentiment analysis approach alone (SA_i), nearly 65% of all the primary papers in this study. Only 3 papers used SA (orientation).
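Polarity detection is often implemented with a sentiment lexicon. The sketch below uses a toy hand-made lexicon purely for illustration; the surveyed studies rely on real resources such as SentiWordNet or tools such as TextBlob.

```python
# Toy sentiment lexicon (illustrative only); real studies use curated resources.
LEXICON = {"great": 1, "win": 1, "hope": 1,
           "corrupt": -1, "fail": -1, "scandal": -1}

def polarity(tweet: str) -> str:
    """Classify a tweet as positive/negative/neutral by summing word scores."""
    score = sum(LEXICON.get(tok, 0) for tok in tweet.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("A great win for the country"))  # positive
```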
It is interesting to note that only 9 papers employed the volume-based approach alone, almost 9%. A hybrid "SA and Vol." approach was used by 16% of the selected studies. One study used SNA alone, and 3 papers used the combination of SA and SNA, together almost 5% of the total studies. Only two studies used LDA along with other approaches; S-17 used LDA for topic modelling and categorized the resulting topics into positive and negative.
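The volume-based idea reduces to counting mentions: the candidate mentioned in the most tweets is predicted to win. A minimal sketch, with made-up candidate names and tweets:

```python
from collections import Counter

def predict_by_volume(tweets, candidates):
    """Predict the winner as the candidate mentioned in the most tweets.

    Assumes at least one candidate is mentioned somewhere in the tweets.
    """
    counts = Counter()
    for tweet in tweets:
        text = tweet.lower()
        for candidate in candidates:
            if candidate.lower() in text:
                counts[candidate] += 1
    return counts.most_common(1)[0][0]

tweets = ["Vote Alice!", "Alice for president", "Bob rally today"]
print(predict_by_volume(tweets, ["Alice", "Bob"]))  # Alice
```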
It is worth noting that most of the studies applied an SA approach (SA_i + SA_j), 89% in total, followed by the volume-based approach at 26% (Vol_i + Vol_j). Very few studies employed a social network analysis approach. Opinion mining gives a better understanding of a political user's behaviour: a user's expression in words is more informative than the communication connections alone. For example, 100 citizens commenting negatively on a political leader's post would count positively toward that leader in a volumetric or SNA prediction, even though the activity is clearly against the leader (opposing in context). Thus, many researchers tend to use the SA approach.

RQ1(a): Which Techniques Are Employed?
Figure 7 shows the number of studies reporting these techniques. Numerous studies employed supervised (S) learning techniques: 34 studies (S_i), almost 35% of the selected studies. Looking more closely, some studies used other techniques along with supervised learning, such as S-41, S-51, and S-92. In total, 51 studies used supervised learning (S_i + S_j), making it the most used technique (52%) in this SMS.
Several studies used the lexicon-based approach (LA) for sentiment analysis, especially for tweets in languages other than English: 25 studies employed LA_i, and a few papers reported LA_j, making LA (LA_i + LA_j) 39% of the selected studies in this SMS. 18% of the selected studies used count (C_i + C_j) techniques. A few papers employed unsupervised (US) techniques, 9% in total (US_i + US_j). Only 5% of the selected studies used deep learning (DL_i + DL_j) techniques. Some studies used a tool/library for sentiment analysis without mentioning any algorithm; for example, S-77 used TextBlob. Figure 8 shows the techniques along with the study(s).

RQ1(b): Which Tools Are Utilized?
This section gives an overview of the tools, libraries, and dictionaries (TLD) used to assist election prediction on Twitter. The list of TLD, together with the corresponding primary studies, is given in Table 6. NLTK is used the most. Some tools provide a graphical user interface (GUI), such as WEKA, RapidMiner, and Gephi; nearly 13% of the studies used such GUI tools. Almost 18 types of dictionaries are employed in the primary studies. Only one study reported Hadoop. The rest of the details can be seen in Table 6.

RQ1(c): Which Techniques/Tools Are Employed for Tweet Collection?
Data can be collected from Twitter either through an API or by crawling. Twitter provides two types of APIs: REST and Streaming. A few of the selected studies did not explicitly report any technique for collecting Twitter data, such as S-22, S-28, S-31, S-35, and S-95. Some studies reported "Twitter API" only. S-57 used a dataset from Data World [66]. Figure 9 shows the number of studies that use the different techniques and tools for collecting tweets. In this SMS, we used the technique and tool names as reported in the primary studies; for example, Tweepy and twitter4j access the Streaming API but are counted separately from "Twitter Streaming API."

RQ2: Which Studies Reported Manually/Automatically Annotated Data?
An annotated (or labelled) corpus assists in training supervised and semisupervised techniques [67]. Large and unambiguous annotated data can lead to better predictions by improving an algorithm's results. Data can be annotated manually, automatically, or both [68].
There are few annotated political datasets available, and languages other than English lack such datasets.
This RQ aims to identify and list the studies that used manual or automatic data labelling. Some studies worked in languages other than English; for example, S-48 annotated tweets in Bulgarian. A few studies employed automatic data-labelling techniques; for example, S-79 used deep neural networks to label the data. Figure 10 shows the list of studies that used manual or automatic political data labelling.
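One common automatic-labelling heuristic (distant supervision) assigns a label when a tweet carries an unambiguous partisan marker, leaving the rest for manual annotation. The hashtags and labels below are hypothetical, not taken from any of the surveyed studies.

```python
# Hypothetical partisan hashtags mapped to stance labels (illustration only).
SUPPORT_TAGS = {"#votealice": "pro_alice", "#votebob": "pro_bob"}

def auto_label(tweet: str):
    """Return a stance label if an unambiguous hashtag is present, else None."""
    text = tweet.lower()
    for tag, label in SUPPORT_TAGS.items():
        if tag in text:
            return label
    return None  # no marker found: fall back to manual annotation

labels = [auto_label(t) for t in ["Big day! #VoteAlice", "Undecided still..."]]
print(labels)  # ['pro_alice', None]
```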

RQ3: Which Countries Are Reported for Election Prediction on Twitter?
This RQ aims to identify and list the countries whose elections are analyzed in the primary studies. Figure 11 shows the list of 28 countries and the number of studies that analyzed each country's elections. It can be seen that 27 studies analyzed USA elections and 24 studies addressed the prediction of Indian elections (both national and regional). Elections in Indonesia, the Netherlands, and Spain are each reported in 7 studies, followed by Pakistan in 5 and the UK in 4; the rest can be observed in Figure 11.

RQ4: What Are the Languages of Tweets Used for Predicting Elections on Twitter?
The objective of this RQ is to classify and list the tweet languages used in the primary studies. Roughly 45% of the primary studies used English tweets, followed by 7% that analyzed tweets in Indonesian and 7% in Spanish. Figure 12 presents the list of languages and the number of studies that investigated them. Some studies translated tweets from other languages into English for further investigation because those languages lack resources (annotated data and dictionaries); S-20, S-41, S-61, and S-76 are examples. S-17 used Chinese candidates' names for tweet collection and a volumetric approach for predicting the election. Almost 16% of the studies did not report any language; most of these used the volumetric approach.

RQ5: What Are the Most Frequent Topics Discussed?
The goal of this question is to extract information from the selected studies automatically; such an approach can give researchers insight into the topics discussed. We classified the implementation and representation into two parts: (1) topic modelling (correlation) and (2) word clouds. LDA [69] is an example of topic modelling. We applied the topic modelling technique on two levels of the primary studies:
1. Abstract level
2. Full-text level
We further generated word clouds from the selected papers at the following levels:
1. Titles
2. Author keywords
3. Abstracts
4. Full text
We converted all the papers from PDF to text. For topic modelling, the data were preprocessed to clean the extracted text.
The steps include converting all text to lower case, stemming and lemmatization, and removing English stop words. Furthermore, sections such as "Acknowledgement" and "References" were excluded when performing topic modelling at the full-text level. For the word clouds, all the text at the different levels (title, keywords, abstract, and full text) was tokenized into single words, unnecessary words were removed using English stop words, and the word frequencies were then computed to generate a word cloud for each level. Figure 13 shows the 25 topics generated at the abstract level and illustrates the correlations between them. Blue circles represent correlated topics, while red shows anticorrelation (inverse correlation). It reveals interesting findings; for example, "sentiment analysis polarity" is highly correlated with "presidential predict win." Another topic, "social media popularity," is highly correlated with "presidential predict win," "outcome account expects," and "election poll outcome." The rest of the correlations and inverse correlations can be explored in Figure 13. Figure 14 represents the correlation between the 25 topics generated from the primary selected papers' full text. It is interesting to note that nearly all the topics are anticorrelated.

Complexity
Analysis, Political, Predict, and Opinion." Figure 16(b) illustrates nearly the same themes from the full text as discussed for the other word clouds. Some of the most frequent words are "Twitter, Election, Social, Prediction, Media, Users, Presidential, Opinion, and India." This shows that most of the studies applied sentiment analysis to predict elections on Twitter and that several studies analyzed presidential and Indian elections. Comparing the findings from the word clouds with the outcomes of RQ1, the results are nearly the same. As discussed in Section 4.1, approximately 89% of the studies applied sentiment analysis (SA_i + SA_j). RQ1(a) shows that machine learning techniques are employed the most, and RQ3 shows that the majority of the studies analyzed USA and Indian elections. The outcomes from the word clouds reflect almost the same information.

DQ1: Which Are the Most Active Researchers in the Field of Analyzing Election Prediction on Twitter?
A total of 284 researchers contributed as authors to the 98 selected primary studies. We selected the researchers who appeared in two or more of the selected papers. Figure 17 shows the most active researchers along with the studies they contributed to.
Almost 100% of the active researchers are affiliated with academic organizations. These data identify some research groups in which researchers collaborated, such as Brian Heredia, Joseph D. Prusa, and Taghi M. Khoshgoftaar. They also show that the researcher Malhar Anjaria has not been active since 2014 and that the research group of Rincy Jose and Varghese S Chooralil has not been active since 2016. This finding also suggests that more academic and industrial collaboration is needed.

DQ2: Which Are the Most Active Organizations?
This DQ aims to identify and list the most active organizations appearing in the selected studies. A total of 158 organization names were listed, out of which 13 organizations contributed to more than one study. The list of the organizations and their support level (contribution) is given in Table 7.
In this SMS, we divided organizations into two categories: industry and academic (universities, research institutes, and government research organizations). Notably, academia is far more active than industry: only 7 industrial organizations appeared in the selected studies. In S-82, one researcher, Nathaniel Poor, was affiliated with no organization. There is a need for more industrial and academic collaboration, which could improve this domain. Figure 18 shows the distribution of organizations.

DQ3: Which Are the Most Active Publication Venues?
This DQ aims to identify and list the most active publication venues in the selected studies. Table 8 shows the venue names along with the support level (>1). The most active conference venue is "Lecture Notes in Computer Science," with a support level of 5, followed by "Communications in Computer and Information Science." Only two journals, "PLOS ONE" and "Social Network Analysis and Mining," have a support level of 2. Most of the research is published at conference venues; the trend should move toward publishing in prestigious peer-reviewed journals.

Validity Threats
We have followed some protocols to avoid or mitigate the validity threats (VTs) in this study. These VTs are descriptive validity, interpretive validity, theoretical validity, generalizability, and reliability. Each of them is discussed separately in the subsequent sections.

Descriptive Validity.
Descriptive validity (DV) deals with the accuracy and objectivity of the extracted information. DV ensures that no imperative information is skipped or ignored during the extraction process. To address DV, we arranged regular sessions to discuss and build agreement on the extraction process, such as what information needed to be collected and stored, and we designed the Data Extraction Form (DEF) collectively. To maintain unbiasedness and ensure traceability, every entry in the DEF carries a comment linking the extracted value to the researcher who assigned it.

Interpretive Validity.
Interpretive validity (IV) deals with the validity of the conclusions drawn from the extracted information and ensures that the information extracted by a researcher is unbiased. To minimize IV threats, we applied the following mechanisms. Initially, we arranged regular meetings to ensure that all the researchers agreed upon the same interpretation of the results (extracted information), the set of protocols, and their execution. Next, excluding the first author, the researchers were divided into two distinct groups that drew interpretations of the results. The first author compared the drawn conclusions, matched them, and standardized the writing style. Finally, all the authors verified the interpretations and their traceability to the underlying results in the DEF.

Theoretical Validity.
Theoretical validity (TV) is a vital type of threat, as various inaccuracies are possible while selecting relevant papers: researcher bias while extracting the papers, shortcomings of the search and selection process (selecting irrelevant papers, excluding relevant papers, or both), and low quality of the selected papers, all of which can lead to flawed conclusions.

We followed protocols discussed in Sections 3.3 and 3.4 to search the papers in the five databases and select relevant papers to minimize this threat.

Generalizability.
To reduce this threat, we relied upon the impartiality of the data extraction process, the DEF, and the set of rules guiding the investigation and the resulting interpretations. Nevertheless, we assume that the 98 primary selected studies achieve generalization with low risk [70].

Reliability.
To increase this SMS's reliability, we comprehensively reported the complete process, from the start of the protocol to the conclusion. Finally, we described the rubrics used for self-appraisal by implementing the guidelines of Kitchenham and Charters [70] to minimize the threats.

Conclusion and Future Work
This study reports the planning, conducting, and implementation steps of an SMS on "predicting elections on Twitter." We selected 98 studies published from January 2010 to January 2021. This study aims to identify and classify the approaches, techniques, tools, countries, and languages used in election prediction on Twitter.
We defined and implemented a search strategy to achieve our goal. Initially, we found 787 potential studies; after applying the selection (inclusion/exclusion) criteria, we chose 98 primary studies as relevant. The extracted data lead us to the following conclusions. RQ1: approximately 65% of the selected studies reported the sentiment analysis approach alone (SA_i) and 24% reported SA_j, so 89% of the selected studies implemented sentiment analysis in total (SA_i + SA_j), followed by the volume-based approach with 26% in total (Vol_i + Vol_j); 6% of the selected studies employed social network analysis techniques (SNA_i + SNA_j). RQ1(a): 52% of the selected studies (51 papers) used supervised learning in total (S_i + S_j), making it the most used technique in this SMS; the lexicon-based approach accounts for 39% (LA_i + LA_j); 18% employed count techniques (C_i + C_j); only 9% employed unsupervised learning techniques (US_i + US_j); furthermore, 5% of the selected studies implemented deep learning (DL_i + DL_j) techniques. RQ1(b): this SMS listed nearly all the tools used in the primary selected studies; NLTK is used most commonly; 13% of the selected studies reported GUI tools such as WEKA and RapidMiner; almost 18 types of dictionaries are used in the primary studies. RQ1(c): almost 12% used Tweepy, 7% employed twitteR, 5% the Twitter REST API, 12% the Search API, 9% the Streaming API, and 20% of the selected studies mentioned only "Twitter API." RQ2: 44% of the selected studies manually or automatically annotated the data. Demographic data show that 76% of the selected studies are conference papers and 24% are journal papers. Predicting elections on Twitter has become more popular and attracted more researchers over the last decade. 284 researchers contributed to the 98 primary selected papers, out of which 21 authors have a support level of more than 2.
The authors who appeared in the selected studies were affiliated with 158 organizations. 13 organizations contributed to more than one study, out of which two organizations have a support level of 3. The results highlight that 149 are academic organizations and only 7 industrial affiliations appeared. Furthermore, 9 venues are the most active, out of which 7 are conferences.
As future work, we recommend the following: (i) there is a need for in-depth analysis in the field of election prediction on Twitter; (ii) empirical studies on election prediction need to be conducted; (iii) election predictions on platforms other than Twitter should be analyzed; (iv) election predictions should be analyzed and compared across fields, such as computer science and the social sciences.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.