On the use of text augmentation for stance and fake news detection

ABSTRACT Data Augmentation (DA) aims at synthesizing new training instances by applying transformations to available ones. DA has several well-known benefits, such as: (i) increasing generalization ability; (ii) mitigating data scarcity; and (iii) helping resolve class imbalance issues. In this work, we investigate the use of DA for stance and fake news detection. In the first part of our work, we explore the effect of various DA techniques on the performance of common classification algorithms. Our study reveals that the motto 'the more, the better' is the wrong approach regarding text augmentation and that there is no one-size-fits-all text augmentation technique. The second part of our work leverages the results of our study to propose a novel augmentation-based ensemble learning approach. The proposed approach leverages text augmentation to enhance base learners' diversity and accuracy, ergo the predictive performance of the ensemble. The third part of our work experimentally investigates the use of DA to cope with the class imbalance problem. Class imbalance is very common in stance and fake news detection and often results in biased models. In this work we show how and to what extent text augmentation can help resolve moderate and severe imbalance.


Introduction
In the era of the Internet and social media, where a myriad of information of various types is instantly available and where any point of view can find an audience, access to information is no longer an issue, and the key challenges are veracity, credibility, and authenticity. The reason for this is that any user can readily gather, consume, and break news, without verification, fact-checking, or third-party filtering.
By directly influencing public opinions, major political events, and societal debates, fake news has become the scourge of the digital era, and combating it has become a dire need. The identification of fake news is however very challenging, not only from a machine learning and Natural Language Processing (NLP) perspective, but also sometimes for the most experienced journalists (Pomerleau & Rao, 2017). That is why the scientific community approaches the task from a variety of angles and often breaks down the process into independent sub-tasks. A first practical step towards automatic fact-checking and fake news detection is to estimate the opinion or the point of view (i.e. stance) of different news sources regarding the same topic or claim (Pomerleau & Rao, 2017). This (sub-)task, addressed in recent research as stance detection, was popularized by the Fake News Challenge - Stage 1 (or FNC-1) (Pomerleau & Rao, 2017), which compares article bodies to article headlines and determines whether a body agrees with, disagrees with, discusses or is unrelated to the claim of a headline. As aptly stated by Momchil et al. (2022), automated stance detection can help in identifying fake news in two key ways. First, it enables human fact-checkers to quickly and efficiently identify controversial claims, gather relevant opinions about a claim, and evaluate the arguments for and against it (i.e. evidence retrieval). Second, it can be integrated as a component of an automated fact-checking pipeline, which would give a preliminary label to a claim, based on the stances taken by various sources, weighted by their credibility (Guo et al., 2021).
In a previous work (Salah et al., 2022), we proposed a novel augmentation-based ensemble learning approach for stance and fake news detection. Data augmentation aims at synthesizing new training instances that have the same ground-truth labels as the instances they originate from (Xie et al., 2019). Data augmentation has several well-known benefits: (i) preventing overfitting by improving the diversity of training data; (ii) mitigating data scarcity by providing a relatively easy and inexpensive way to collect and label data; (iii) increasing the generalization ability of the obtained models; and (iv) helping resolve class imbalance issues. Data augmentation is extensively used in Computer Vision (CV), where it is considered one of the anchors of good predictive performance. Despite promising advances, data augmentation remains less explored in NLP, where it is still considered the 'cherry on the cake' that provides a steady but limited performance boost (Shorten et al., 2021).
Ensemble learning combines the knowledge acquired by base learners to make a consensus decision which is supposed to be superior to the one attained by each base learner alone (Suting & Ning, 2020). Research on ensemble learning shows that the greater the skills and the diversity of the base learners, the better the accuracy and the generalization ability of the ensemble (Suting & Ning, 2020). In our work we leverage text augmentation to enhance both the diversity and the skills of base learners, ergo the predictive performance of the ensemble.
Class imbalance refers to situations where the distribution of examples across the classes is not equal, i.e. the number of examples available for one or more classes (minority classes) is far less than for other classes (majority classes). Class imbalance appears in many domains, including fraud detection, disease screening and fake news detection. When a dataset is imbalanced, most classifiers have a strong bias toward the majority classes (Fernández et al., 2018). This paper extends our previous work (Salah et al., 2022) by presenting more comprehensive experimental results, additional related work, and deeper insights into our augmentation-based ensemble approach. Furthermore, this paper evaluates the effectiveness of text augmentation in addressing severe and moderate class imbalance, a common issue in stance and fake news detection not explored in (Salah et al., 2022).
The main contributions of our work are therefore: (i) an extensive experimental study on the effect of different text data augmentation techniques on the performance of common classification algorithms; (ii) a novel augmentation-based ensemble learning approach; and (iii) an experimental study on the use of text augmentation to mitigate the effects of class imbalance.
The remainder of this paper is organized as follows. Section 2 outlines the main steps we followed to vectorize text and reduce dimensionality. Section 3 presents the key motivations for data augmentation and the text augmentation techniques adopted in our work. Section 4 details the architecture of our novel augmentation-based ensemble learning approach. Section 5 briefly reviews existing work on stance and fake news detection. Section 6 presents an experimental study on two real-world fake news datasets and discusses the main results and findings. Finally, Section 7 concludes the paper.

Pre-processing and feature extraction
Machine Learning (ML) algorithms operate on numerical features, expecting input in the form of a matrix where rows represent instances and columns represent features. Raw texts therefore have to be transformed into feature vectors before being fed into ML algorithms (Jouini et al., 2021). In our work, we first eliminated stop words and reduced words to their roots (i.e. base words) by stemming them with the Snowball Stemmer from the NLTK library (NLTK.org., n.d.). We next vectorized the corpus with a TF-IDF (Term Frequency - Inverse Document Frequency) weighting scheme and generated a term-document matrix.
TF-IDF is computed on a per-term basis: the relevance of a term to a text is measured by the scaled frequency of the term's appearance in the text, normalized by the inverse of the scaled frequency of the term in the entire corpus. Despite its simplicity and widespread use, the TF-IDF scheme has two severe limitations: (i) TF-IDF does not capture the co-occurrence of terms in the corpus and makes no use of semantic similarities between words; accordingly, TF-IDF fails to capture basic linguistic notions such as synonymy and homonymy; and (ii) the term-document matrix is high-dimensional and often noisy, redundant, and excessively sparse. The matrix is thus subject to the curse of dimensionality: when the number of features is large, poor generalization is to be expected.
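To make the weighting concrete, the following minimal sketch computes raw, unsmoothed TF-IDF weights for a toy corpus in plain Python (a production pipeline would typically rely on a library vectorizer such as sklearn's TfidfVectorizer, which adds smoothing and normalization):

```python
import math
from collections import Counter

def tfidf(corpus):
    """Compute raw TF-IDF weights (one dict of term -> weight per document)."""
    docs = [doc.lower().split() for doc in corpus]
    # document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({term: (count / len(doc)) * math.log(len(docs) / df[term])
                         for term, count in tf.items()})
    return weighted

corpus = ["the claim is false",
          "the claim is true",
          "sources dispute the claim"]
vectors = tfidf(corpus)
# 'false' is rare, so it gets a positive weight; 'the' and 'claim' appear in
# every document, so their weight is zero (log(3/3) = 0)
```

Limitation (ii) is already visible at this scale: the full term-document matrix has one column per vocabulary term, most of which are zero for any given document.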

Dimensionality reduction
Latent Semantic Analysis (LSA) (Deerwester et al., 1990) is an unsupervised statistical topic modelling technique that overcomes some of the limitations of TF-IDF. Like other topic modelling techniques, such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003), LSA is based on the assumptions that: (i) each text consists of a mixture of topics; and (ii) each topic consists of a set of (weighted) terms that regularly co-occur together. Put differently, the basic assumption behind LSA is that words that are close in meaning appear in similar contexts and form a 'hidden topic'. The idea behind LSA is then to represent words that form a topic not as separate dimensions, but by a single dimension. LSA thus represents texts by 'semantic' or 'topic' vectors, based on the words that these texts contain and the set of weighted words that form each of the topics.
To uncover the latent topics that shape the meaning of texts, LSA performs a Singular Value Decomposition (SVD) on the term-document matrix (i.e. decomposes it into a separate text-topic matrix and a topic-term matrix). Formally, SVD decomposes the term-document matrix A of size t×d, with t the number of terms and d the number of documents, into the product of three matrices:

A = T S D^T

where T (of size t×n) is an orthogonal column matrix, D (of size d×n) an orthogonal row matrix, S (of size n×n) a diagonal matrix of singular values, and n = min(t, d) is the rank of A. By restricting the matrices T, S and D to their first k<n columns, we obtain the matrices T (t×k), S (k×k) and D (d×k), and hence k-dimensional text vectors. From a practical perspective, the key task is to determine a value of k that is reasonable for the problem (i.e. without major information loss). In our work we used the TruncatedSVD transformer from sklearn (Pedregosa et al., 2011). As in Li et al. (2019), we set the value of k to 100. The experimental study conducted in Li et al. (2019) showed that using LSA (with k set to 100) instead of TF-IDF allows a substantial performance improvement for the tasks of stance and fake news detection.
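Under the hood, the truncation step amounts to keeping the k largest singular values. The following numpy sketch illustrates the decomposition (our implementation actually uses sklearn's TruncatedSVD, as noted above; the random matrix here is a stand-in for a real term-document matrix):

```python
import numpy as np

def lsa_embed(A, k):
    """Reduce a (terms x documents) matrix A to k-dimensional document vectors
    by truncating the SVD A = T S D^T to its k largest singular values."""
    T, s, Dt = np.linalg.svd(A, full_matrices=False)   # singular values in s
    return np.diag(s[:k]) @ Dt[:k, :]                  # (k x documents) topic vectors

rng = np.random.default_rng(0)
A = rng.random((50, 8))            # toy corpus: 50 terms, 8 documents
docs_k = lsa_embed(A, k=3)         # each document becomes a 3-dimensional vector
```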

Text data augmentation
The success of data augmentation in Computer Vision has been fuelled by the ease of designing semantically invariant (i.e. label-preserving) transformations, such as rotation, flipping, etc. While recent years have witnessed significant advances in the design of transformation techniques, text augmentation remains less explored and adopted in NLP than in CV. This is mainly due to the intrinsic properties of textual data (e.g. polysemy), which make defining label-preserving transformations much harder (Shorten et al., 2021). In the sequel we mainly focus on off-the-shelf text augmentation techniques and less on techniques that are still in the research phase, awaiting large-scale testing and adoption. For a more exhaustive survey of text augmentation techniques, we refer the reader to Karnyoto et al. (2022), Li et al. (2021), and Tesfagergish et al. (2021).

Masked language models
The main idea behind Masked Language Models (MLMs), such as BERT (Devlin et al., 2018), is to mask words in sentences and let the model predict the masked words. BERT, a pretrained multi-layer bidirectional transformer encoder, predicts masked words based on the bidirectional context (i.e. based on the left and right surrounding words). In contrast with context-free models such as GloVe and Word2Vec, BERT alleviates the problem of ambiguity since it considers the whole context of a word.
BERT is considered a breakthrough in the use of ML for NLP and is widely used in a variety of tasks such as classification, question answering, and Named Entity Recognition (Shi et al., 2020). Inspired by the recent work of Li et al. (2021) and Shi et al. (2020), we use BERT as an augmentation technique: new sentences are generated by randomly masking words and replacing them with the words predicted by BERT.
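The mask-and-predict loop can be sketched as follows. The `predict` callable stands in for a real masked language model (in our implementation, BERT via the nlpaug library); the lookup table below is a purely illustrative toy stand-in:

```python
import random

def mlm_augment(sentence, predict, mask_rate=0.15, seed=0):
    """Mask random tokens and replace each with the model's prediction.
    `predict(tokens, i)` stands in for a real masked language model."""
    rng = random.Random(seed)
    tokens = sentence.split()
    for i in range(len(tokens)):
        if rng.random() < mask_rate:
            tokens[i] = predict(tokens, i)     # fill the masked position
    return " ".join(tokens)

# toy stand-in predictor: a lookup table instead of a trained BERT model
toy_model = {"false": "untrue", "claim": "statement"}
predict = lambda tokens, i: toy_model.get(tokens[i], tokens[i])
augmented = mlm_augment("the claim is false", predict, mask_rate=1.0)
# → "the statement is untrue"
```

Because the replacement is conditioned on the surrounding context, a real MLM predictor tends to preserve the sentence's meaning, which is exactly the label-preserving property augmentation requires.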

Back-translation (a.k.a. round-trip translation)
Back-translation is the process of translating a text into another language, then translating the new text back into the original language. It is one of the most popular means of paraphrasing and text augmentation (Marivate & Sefara, 2019). The Google Cloud Translation API, used in our work to translate sentences to French and back, is considered the most common tool for back-translation.
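The round trip itself is a simple two-call pattern. In the sketch below, `translate` abstracts the real service (the Google Cloud Translation API in our case); the phrase table is a toy stand-in so the example is self-contained:

```python
def back_translate(text, translate):
    """Paraphrase a text by round-tripping it through a pivot language.
    `translate(text, src, dst)` abstracts a real translation service."""
    pivot = translate(text, src="en", dst="fr")
    return translate(pivot, src="fr", dst="en")

# toy stand-in: a phrase table instead of a real translation API
table = {("en", "fr"): {"the claim is false": "l'affirmation est fausse"},
         ("fr", "en"): {"l'affirmation est fausse": "the assertion is false"}}
translate = lambda text, src, dst: table[(src, dst)][text]
paraphrase = back_translate("the claim is false", translate)
# → "the assertion is false"
```

The paraphrase differs in wording but not in meaning, which is why the technique is label-preserving in practice.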

Synonym (a.k.a. thesaurus-based augmentation)
The Synonym technique, also called lexical substitution with a dictionary, was until recently the most widely used (and for a long time the only) augmentation technique for textual data classification. As suggested by its name, the Synonym technique replaces randomly selected words with their respective synonyms. The word types that are candidates for lexical substitution are adverbs, adjectives, nouns and verbs.
The synonyms are typically taken from a lexical database (i.e. a dictionary of synonyms). WordNet (Shoemaker, 2019), used in our work for synonym replacement, is considered the most popular open-source lexical database for the English language.

TF-IDF based insertion and substitution
The intuition behind these two noising-based techniques is that uninformative words (i.e. words having low TF-IDF scores) should have little or no impact on classification. Therefore, the insertion of words having low TF-IDF scores (at random positions) should preserve the label associated with a text, even if the semantics are not preserved. An alternative strategy is to replace randomly selected words with words having the same low TF-IDF scores (TF-IDF based substitution).
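Both variants can be sketched with a precomputed score table. The scores below are hypothetical; a real implementation (such as the TF-IDF word augmenter provided by nlpaug) derives them from the corpus:

```python
import random

def tfidf_noise(sentence, scores, n=2, mode="insert", seed=0):
    """Noise a sentence with its least informative words: insert the n words
    with the lowest TF-IDF scores at random positions ("insert"), or use
    them to replace randomly chosen tokens ("substitute")."""
    rng = random.Random(seed)
    tokens = sentence.split()
    fillers = sorted(scores, key=scores.get)[:n]   # lowest-scoring words
    for w in fillers:
        if mode == "insert":
            tokens.insert(rng.randrange(len(tokens) + 1), w)
        else:                                      # "substitute"
            tokens[rng.randrange(len(tokens))] = w
    return " ".join(tokens)

# hypothetical TF-IDF scores for a tiny vocabulary
scores = {"the": 0.01, "of": 0.02, "vaccine": 0.8, "hoax": 0.9}
noised = tfidf_noise("vaccine hoax spreads online", scores, mode="insert")
```

Inserting "the" and "of" perturbs the surface form while leaving the informative words ("vaccine", "hoax") untouched, so the class label is expected to survive.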

Diversity and skilfulness in ensemble learning
Ensemble Learning finds its origins in the 'Wisdom of Crowds' theory (Surowiecki, 2005), which states that the collective opinion of a group of individuals can be better than the opinion of a single expert, provided that the aggregated opinions are diverse (i.e. diversity of opinion) and that each individual in the group has a minimum level of competence (e.g. better than a random guess). Similarly, Ensemble Learning combines the knowledge acquired by a group of base learners to make a consensus decision which is supposed to be superior to the one reached by each of them separately (Suting & Ning, 2020). Research on Ensemble Learning shows that the greater the skills and the diversity of the base models, the better the generalization ability of the ensemble model (Suting & Ning, 2020). Stated differently, to generate a good ensemble model, it is necessary to build base models that are not only skilful, but also skilful in different ways from one another.
Bagging and stacking are among the main classes of parallel ensemble techniques. Bagging (i.e. bootstrap aggregating) involves training multiple instances of the same classification algorithm, then combining the predictions of the obtained models through hard or soft voting. To promote diversity, base learners are trained on different subsets of the original training set. Each subset is typically obtained by drawing random samples with replacement from the original training set (i.e. bootstrap samples). Stacking (a.k.a. stacked generalization) involves training a learning algorithm (i.e. meta-classifier) to combine the predictions of several heterogeneous learning algorithms trained on the same training data. The most common approach to train the meta-model is via k-fold cross-validation: the whole training dataset is randomly split (without replacement) into k independent, equal-sized folds. k−1 folds are then used to train each of the base models, and the k-th fold (holdout fold) is used to collect the predictions of the base models on unseen data. The predictions made by the base models on the holdout fold, along with the expected class labels, provide the input-output pairs used to train the meta-model. This procedure is repeated k times, each time with a different fold acting as the holdout fold while the remaining folds are combined and used for training the base models.
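The out-of-fold construction of the meta-model's training set can be sketched in a library-agnostic way. The toy threshold learners below stand in for real classifiers; `kfold_meta_features` is a hypothetical helper name, not the paper's implementation:

```python
import random

def kfold_meta_features(X, y, learners, k=5, seed=0):
    """Collect each base learner's out-of-fold predictions, paired with the
    true labels: this is the training set of the stacking meta-model."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # k disjoint folds
    meta_X, meta_y = [], []
    for holdout in folds:
        train = [i for i in idx if i not in holdout]
        # fit every base learner on the k-1 training folds
        models = [fit([X[i] for i in train], [y[i] for i in train])
                  for fit in learners]
        for i in holdout:                      # predict on unseen data only
            meta_X.append([m(X[i]) for m in models])
            meta_y.append(y[i])
    return meta_X, meta_y

def make_threshold_learner(feat):
    """Toy base learner: predicts 1 when feature `feat` exceeds its training mean."""
    def fit(X, y):
        thr = sum(x[feat] for x in X) / len(X)
        return lambda x: int(x[feat] > thr)
    return fit

X = [[i, 10 - i] for i in range(10)]
y = [int(i >= 5) for i in range(10)]
meta_X, meta_y = kfold_meta_features(
    X, y, [make_threshold_learner(0), make_threshold_learner(1)])
```

Each training instance thus contributes exactly one meta-example, and every meta-example is built from predictions the base models made on data they never saw during fitting.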

Novel augmentation-based approach
As mentioned earlier, in conventional stacking base learners are trained on the same dataset and diversity is achieved by using heterogeneous classification algorithms. As depicted in Figure 1, the classical approach for combining augmentation and stacking is to: (i) apply one or several augmentation techniques to the original dataset; (ii) fuse the original dataset and the data obtained through augmentation; and (iii) train base learners on the fused dataset. In our work we adopt a different approach and train heterogeneous algorithms on different data to further promote diversity. More specifically, through an extensive experimental study (Section 6), we first identify the most accurate (augmentation technique, classification algorithm) pairs. Our meta-model is then trained on the predictions made by the most accurate pairs, using stratified k-fold cross-validation. Figure 2 depicts the overall architecture of the proposed augmentation-based ensemble learning approach.
Our augmentation-based ensemble learning approach can be seen as a mixture of stacking and bagging. In contrast with bagging and like stacking, we use an ensemble of heterogeneous learning algorithms. In contrast with stacking and like bagging, base learners are trained on different datasets. However, unlike bagging, the considered datasets are not obtained through bootstrap sampling. Instead, they are obtained by combining the original training data with the data obtained by applying one of the aforementioned text augmentation techniques. Finally, as in conventional stacking, the meta-model is trained using stratified k-fold cross-validation.
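Structurally, the proposed scheme differs from Figure 1 only in where the fusion happens: per base learner, rather than once globally. The sketch below uses hypothetical helper names and toy stand-ins for augmenters, classifiers and the meta-learner; for brevity, the meta-learner is fit on in-sample base predictions, whereas our actual implementation uses stratified k-fold cross-validation as described above:

```python
def augmentation_ensemble(train_texts, train_y, pairs, fit_meta):
    """Train each base learner on the original data fused with ONE augmentation
    technique's output, then fit a meta-model on the base predictions."""
    base_models = []
    for augment, fit_base in pairs:            # the best (augmenter, classifier) pairs
        aug_texts = [augment(t) for t in train_texts]
        fused_X = train_texts + aug_texts      # fuse original + augmented data
        fused_y = train_y + train_y            # augmentation is label-preserving
        base_models.append(fit_base(fused_X, fused_y))
    stacked = [[m(t) for m in base_models] for t in train_texts]
    return base_models, fit_meta(stacked, train_y)

# toy stand-ins: string-level augmenters, length-parity "classifiers",
# and a majority-vote meta-learner
pairs = [(lambda t: t + " !", lambda X, y: (lambda t: len(t) % 2)),
         (lambda t: "! " + t, lambda X, y: (lambda t: 1 - len(t) % 2))]
fit_meta = lambda stacked, y: (lambda preds: max(set(preds), key=preds.count))
models, meta = augmentation_ensemble(["a", "bb", "ccc"], [1, 0, 1], pairs, fit_meta)
```

Because each base learner sees a different augmented view of the data, the ensemble gains diversity even when two base learners share the same underlying algorithm.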

Related work
Salient stance and fake news detection approaches adopt a wide range of features (e.g. context-based, content-based), classifiers, and learning tactics (e.g. stacking, bagging, etc.). Due to the lack of space, we mainly focus hereafter on ensemble approaches and on approaches that rely on content-based features. We refer readers to surveys and retrospectives on recent challenges (Hanselowski et al., 2018; Khan et al., 2021) for a more comprehensive overview of the current state of research. In the sequel, we distinguish between approaches dedicated to stance classification (multinomial classification) and those intended for fake news classification (binary classification).

Stance classification
The authors of the Fake News Challenge (FNC-1) released a simple baseline model for the stance detection task (Slovikovskaya, 2019). The proposed model achieves an F1-score of 79.53% and uses a gradient boosting (GradBoost) classifier on global co-occurrence, polarity and refutation features. The three best performing systems in the FNC-1 competition were 'SOLAT in the SWEN' (Pan, 2018), 'Team Athene' (Hanselowski et al., 2018) and 'UCL Machine Reading' (UCLMR) (Riedel et al., 2017). 'SOLAT in the SWEN' won the competition using an ensemble approach based on a 50/50 weighted average between gradient-boosted decision trees and a Convolutional Neural Network (CNN). The proposed system is based on several features: pretrained Word2Vec embeddings, TF-IDF, Singular Value Decomposition and word counts. The convolutional network passes pre-trained Word2Vec embeddings through several convolutional layers followed by three fully-connected layers and a final softmax layer for classification. Hanselowski et al. (2018), the second-place winner, used an ensemble composed of 5 Multi-Layer Perceptrons (MLPs), where labels are predicted through hard voting. The system of UCLMR (Riedel et al., 2017), placed third, used an MLP classifier with one hidden layer of 100 units and a softmax layer for classification.
Recently, other published work has used FNC-1 in its experiments. In particular, several recent approaches (Dulhanty et al., 2019; Sepúlveda-Torres et al., 2021; Slovikovskaya, 2019) construct stance detection language models by performing transfer learning on pre-trained variants of BERT (mainly BERT, RoBERTa and XLNet).
Among these approaches, Sepúlveda-Torres et al. (2021) stands out for its integration of text summarization. Text summarization involves reducing a long text into a shorter version, while preserving its most important information. From an overarching perspective, text summarization and text augmentation can both be seen as techniques that alter the form of a text (making it more concise or more diverse) to better capture its core meaning.
The approach proposed by Sepúlveda-Torres et al. (2021) involves two stages: a Relatedness Stage and a Stance Stage. The Relatedness Stage is in charge of determining whether or not a headline and a news summary are related. This stage uses the TextRank extractive algorithm for body text summarization and a fine-tuned RoBERTa pre-trained model that classifies a headline-summary pair as related or unrelated. Once the related pairs are identified, the Stance Stage determines their type with respect to the remaining stances (agree, disagree or discuss). Similarly to the Relatedness Stage, the Stance Stage uses a fine-tuned RoBERTa pre-trained model to yield predictions. While not presented as such by the authors, we believe that by discarding unrelated pairs (the majority class) at an early stage of the process, the approach of Sepúlveda-Torres et al. (2021) mitigates the effects of class imbalance. A limitation of this approach is that it is not suitable for short text messages, which are prevalent on social media.

Fake news classification
Besides stance detection, several ensemble learning models have been proposed to tackle the binary (True News/False News) content-based classification task. Notably, Jiang et al. (2021) proposed a stacking-based ensemble that uses Random Forest (RF) as meta-learner and Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), k-Nearest Neighbours (KNN), Random Forest (RF), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) as base learners. The approach of Jiang et al. (2021) uses three different text vectorization methods: Word2Vec embedding, TF-IDF and TF. The proposed approach was evaluated on the ISOT fake news (Ahmed et al., 2017) and KDnuggets (McIntire, 2017) datasets. Similarly, Patil (2022) proposed a majority voting ensemble model involving nine base learners, namely: SVM, DT, LR, RF, X-Gradient Boosting (XGBoost), Extra Trees (ET), AdaBoost, Stochastic Gradient Descent (SGD) and Naive Bayes (NB). The proposed approach was evaluated on the Kaggle Fake News dataset (Kaggle.com, n.d.). It is worth noticing that in all the aforementioned studies, ensemble approaches yielded better results than those attained by their contributing base learners. On the other hand, despite the substantial potential improvement that text augmentation can deliver, to the best of our knowledge, there exists no previous work on stance and fake news detection that compares text augmentation techniques, uses text augmentation in conjunction with ensemble learning, or mitigates the effects of class imbalance through text augmentation.

Tools & datasets
Our system was implemented using NLTK (NLTK.org., n.d.) for text preprocessing, nlpaug (Ma, 2019) for text augmentation, SciKit-Learn (version 0.24.2) (Pedregosa et al., 2011) for classification, and Beautiful Soup for web scraping. A stratified 10-fold cross-validation was used for model fusion. The Li et al. approach was implemented as described in Li et al. (2019). The experimental study was conducted without any special tuning. A large number of experiments were performed to show the accuracy and effectiveness of our augmentation-based ensemble learning approach; due to the lack of space, only a few results are presented herein.
As there are no agreed-upon benchmark datasets for stance and fake news detection (Li et al., 2019), we used two publicly available and complementary datasets: FNC-1 (Slovikovskaya, 2019) and FNN (i.e. FakeNewsNet) (Shu, 2019). FNC-1 was released to explore the task of stance detection in the context of fake news detection. As reported in Table 1, the FNC-1 dataset consists of approximately 50k headline-article pairs in the training set and 25k pairs in the test set. Stance detection is a multinomial classification problem, where the relative stance of each headline-article pair has to be classified as either: Agree, if the article agrees with the headline claim; Disagree, if the article disagrees with the claim; Discuss, if the article is related to the claim but takes no position on the subject; or Unrelated, if the content of the article is unrelated to the claim. An illustration of the above classification task is given in Figure 3.
It is worth mentioning that the discovery of a disagreeing headline-article pair does not necessarily correspond to the discovery of a fake article, but is an automated first step which could make human fact-checkers aware of a discrepancy (Momchil et al., 2022). Human fact-checkers or specialized algorithms can then ultimately decide which articles are fake, based on the credibility of agreeing and disagreeing sources.

Results and discussion
We ran our experiments with four objectives in mind: (i) identify the best performing (Augmentation technique, Classifier) pairs; (ii) quantify the actual performance improvement allowed by each text augmentation technique; (iii) evaluate the effectiveness of our augmentation-based ensemble approach; and (iv) evaluate the effectiveness of text augmentation in addressing class imbalance. Tables 3 and 5 (resp. Tables 4 and 6, and Figures 4 and 6) report the F1-scores and Accuracy obtained on FNC (resp. FNN). The results presented in these tables allow us to draw important conclusions regarding text augmentation: (1) Text augmentation does not always improve predictive performance. This can be especially observed for SVM, LightGBM, GradBoost (Tables 3 and 5) and AdaBoost (Tables 4 and 6), where the F1-scores and Accuracy on the original dataset are higher than those obtained on the augmented datasets; (2) There is no one-size-fits-all augmentation technique that performs well in all situations.
As shown in Tables 3 and 4 (resp. Figures 5 and 6), an augmentation technique may perform well when combined with one classification algorithm and poorly when combined with another. This is the case, for example, for the 'Synonym' technique, which yields the highest F1-score when combined with AdaBoost and the lowest score when used with Naive Bayes (Table 3). It is worth noting that even if BERT does not achieve the highest F1-scores, it provides a steady performance improvement for almost all classifiers; (3) The motto 'the more, the better' is the wrong approach regarding text augmentation, and targeted approaches often allow better results. This can be observed in Tables 3 and 4 (resp. Figures 5 and 6), where in almost all cases combining all augmentation techniques does not yield the best F1-scores and Accuracy.
As shown in Tables 3 and 5, the pairs (Back-translation, Bagged RF) and (Back-translation, RF) yield the best performance on FNC and substantially increase the predictive performance (≈ +4.16% in comparison with the highest F1-score that can be achieved without text augmentation). Similarly, as shown in Tables 4 and 6, the pairs (Substitution TF-IDF, RF) and (Insertion TF-IDF, Bagged RF) yield the best performance on the FNN dataset (≈ +5.87%).

Augmentation-based ensemble learning
As previously stated, base learners' diversity and competency are the two key success factors of any ensemble learning approach; our ensemble approach leverages text augmentation to enhance both. Figure 2 depicts our classification model, which is a mixture of stacking and bagging. In our model, we use Bagged RF and Random Forest (RF) as base classifiers and GradBoost as meta-classifier. As depicted in Figure 2, each base classifier is trained on a dataset composed of the original dataset and the data obtained by applying one of the augmentation techniques. The choice of the (classifier, augmentation technique) pairs was driven by the experimental study conducted in Subsection 6.2.1. We compare our model to a more classical stacking approach, where all base classifiers are trained on the same dataset, consisting of the original dataset and the data obtained by applying one of the augmentation techniques (Figure 1). We also compare our model to the approach of Li et al. (2019), one of the state-of-the-art approaches that uses LSA, stacking-based ensemble learning and k-fold cross-validation. Table 7 synthesizes the predictive performance achieved by each approach.
As reported in Table 7, the use of text augmentation allows better performance than that achieved by Li et al. (2019) in almost all situations. On the other hand, except for the Synonym technique over the FNC dataset, our model outperforms the classical approach in all situations. Overall, our stacking approach achieves an increase in F1-score and Accuracy of 7.72% and 7.13% (resp. 7.54% and 2.88%) over FNC (resp. FNN) when compared to Li et al. (2019).

Class balancing
The class imbalance problem arises when data is distributed unevenly among classes, i.e. when one or more of the predicted outputs occur much less frequently than others. As stated in Fernández et al. (2018), when working with an imbalanced classification problem: (i) the minority classes are typically of the most interest, meaning that a model's skill in correctly predicting a minority class is more important than correctly predicting a majority class; and (ii) minority classes are harder to predict, mainly because with few available training examples it is often challenging to identify regularities and learn class characteristics. That is why most classification algorithms tend to be biased towards the majority class(es), causing poor classification of the minority class(es).
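Applied to balancing, text augmentation plays the role that random oversampling plays for tabular data: minority examples are augmented until class counts match. A minimal sketch with a hypothetical helper name and toy augmenters (our experiments use the five techniques of Section 3):

```python
import itertools
import random
from collections import Counter

def balance_with_augmentation(texts, labels, augmenters, seed=0):
    """Oversample minority classes by appending augmented copies of their
    examples until every class matches the majority class size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())              # majority class size
    by_class = {c: [t for t, l in zip(texts, labels) if l == c] for c in counts}
    out_t, out_l = list(texts), list(labels)
    for c, pool in by_class.items():
        cycle = itertools.cycle(augmenters)    # rotate over the techniques
        while counts[c] < target:
            out_t.append(next(cycle)(rng.choice(pool)))
            out_l.append(c)
            counts[c] += 1
    return out_t, out_l

# toy stand-ins for real augmentation techniques
augmenters = [lambda t: t + " indeed", lambda t: "indeed " + t]
texts, labels = balance_with_augmentation(
    ["a", "b", "c", "d"], ["real", "real", "real", "fake"], augmenters)
```

Unlike duplication-based oversampling, each synthetic minority example differs from its source, which reduces the risk of overfitting to repeated instances.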
The above two observations hold for the tasks of stance and fake news detection. As stated by the authors of the FNC challenge (Pomerleau & Rao, 2017), 'The related/unrelated classification task is expected to be much easier and is less relevant for detecting fake news…. The Stance Detection task (classify as agrees, disagrees or discuss) is both more difficult and more relevant to fake news detection, …'.
When dealing with class imbalance, evaluation metrics that give equal importance to each observation (and not to each class), such as the Accuracy, can be misleading. This can be observed in Tables 8 and 9, where all algorithms perform poorly over the minority classes Fake News (Table 8) and Agree/Disagree (Table 9). Some algorithms, such as SVM, Logistic Regression and Naïve Bayes, even obtain zero F1-scores over the minority classes Agree/Disagree.
To balance the FNN dataset, 6765 additional samples were generated for the minority class Fake News, using the five augmentation techniques of Section 3. Similarly, we generated additional samples for the minority classes Agree and Disagree of the FNC dataset. As shown in Tables 8 and 9, text augmentation allows a substantial improvement in the F1-scores obtained over the minority classes: an average improvement of 94.47% for the Fake News class (Table 8), 189.14% for the Agree class and 586.39% for the Disagree class (Table 9).

Conclusion
Combating fake news on social media is a pressing need and a daunting task. Most existing approaches to fake news detection focus on engineering various features to identify those allowing the best predictive performance. Such approaches tend to undermine the generalization ability of the obtained models.
In this work, we investigated the use of text augmentation in the context of stance and fake news detection. In the first part of our work, we studied the effect of text augmentation on the performance of various classification algorithms. Our experimental study quantified the actual contribution of data augmentation and identified the best performing (classifier, augmentation technique) pairs. Besides, our study revealed that the motto 'the more, the better' is the wrong approach regarding text augmentation and that there is no one-size-fits-all augmentation technique. In the second part of our work, we proposed a novel augmentation-based ensemble learning approach. The proposed approach is a mixture of bagging and stacking and leverages text augmentation to enhance the diversity and the performance of base classifiers. We evaluated our approach using two real-world datasets. Experimental results show that the proposed approach is more accurate than state-of-the-art methods. In the third part of our work, we investigated the use of text augmentation to cope with class imbalance, a very common problem in stance and fake news detection. As shown by our experimental study, even in the presence of severe imbalance, text augmentation can substantially alleviate its effects and improve the predictive performance over the minority classes.
As part of our future work, we intend to explore the use of multimodal data augmentation involving linguistic and extra-linguistic features. We also intend to explore the detection of fake news from streams under concept drift, and to connect the dots between stance detection and fake news detection through the use of media profiling and multi-source credibility scores.

Disclosure statement
No potential conflict of interest was reported by the author(s).