DOI: 10.1145/3535511.3535550
research-article

A Comparative Study of Machine Learning Algorithms for the Detection of Fake News on the Internet

Published: 30 June 2022

ABSTRACT

Context: The growing proliferation of fake news on the Internet has significantly affected the quality and veracity of the information society receives. Problem: The malicious use of information can compromise democracy by manipulating people's opinions. In addition, few mechanisms exist that classify news and help citizens know whether a given news item is true. This problem has driven new research into classifying and identifying such news. Methodology: This work compares algorithms that could serve as an intelligent solution for detecting fake news in Portuguese. The dataset used for this analysis contained about 12,000 news items. Pre-processing techniques were used to analyze patterns in these news items, to reduce noise, and to eliminate null information. The algorithms compared were Logistic Regression, Stochastic Gradient Descent, Support Vector Machine (SVM) and Multilayer Perceptron. Result: The models generated by all four algorithms achieved an accuracy greater than 90%. To choose the best algorithm, precision, recall and f-measure were also computed for each model. The SVM algorithm performed best, with 96.39% accuracy. Contribution: In addition to the analytical results, this work contributes a publicly available database of news in Portuguese and a grammatical and structural analysis of the news text aimed at detecting the patterns that distinguish true news from false news.
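The evaluation metrics named in the abstract (accuracy, precision, recall and f-measure) can be illustrated with a minimal sketch. This is not the authors' code: the `Confusion` class, the function names, the choice of "fake" as the positive class and the toy labels are all assumptions made for illustration, and f-measure is taken here to be the balanced F1 score.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts, with 'fake' assumed to be the positive class."""
    tp: int  # fake items predicted as fake
    fp: int  # true items predicted as fake
    fn: int  # fake items predicted as true
    tn: int  # true items predicted as true


def confusion_from_labels(y_true: Sequence[str], y_pred: Sequence[str],
                          positive: str = "fake") -> Confusion:
    """Count the four outcomes by comparing gold labels with predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for gold, pred in zip(y_true, y_pred):
        if pred == positive:
            if gold == positive:
                tp += 1
            else:
                fp += 1
        else:
            if gold == positive:
                fn += 1
            else:
                tn += 1
    return Confusion(tp=tp, fp=fp, fn=fn, tn=tn)


def accuracy(c: Confusion) -> float:
    """Share of all items that were classified correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: Confusion) -> float:
    """Share of items flagged as fake that really are fake."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Confusion) -> float:
    """Share of fake items that the model managed to flag."""
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def f_measure(c: Confusion) -> float:
    """Harmonic mean of precision and recall (F1)."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the paper.
    gold = ["fake", "fake", "true", "true", "fake"]
    pred = ["fake", "true", "true", "fake", "fake"]
    c = confusion_from_labels(gold, pred)
    print(c, accuracy(c), precision(c), recall(c), f_measure(c))
```

Reporting precision and recall alongside accuracy shows whether a model's errors are mostly true news flagged as fake or fake news that slips through, which a single accuracy figure cannot reveal.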


Published in

SBSI '22: Proceedings of the XVIII Brazilian Symposium on Information Systems
May 2022
394 pages

Copyright © 2022 ACM


Publisher

Association for Computing Machinery
New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 181 of 557 submissions, 32%
