DOI: 10.1145/3578503.3583628
short-paper

Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions

Published: 30 April 2023

ABSTRACT

This publication describes the motivation for and the creation of Qbias, which comprises a large dataset of Google and Bing search queries, a scraping tool and dataset for biased news articles, and language models for investigating bias in online search. Web search engines play a major role in information search and are a trusted source, especially in the political domain. However, biased information can influence opinion formation and lead to biased opinions. To interact with search engines, users formulate search queries and interact with the query suggestions that the search engines provide. A lack of datasets on search queries inhibits research on this subject. We use Qbias to evaluate different approaches to fine-tuning transformer-based language models, with the goal of producing models capable of biasing text toward a left or right political stance. In addition, we provide datasets and language models for biasing texts that enable further research on bias in online information search.

            Published in WebSci '23: Proceedings of the 15th ACM Web Science Conference 2023, April 2023, 373 pages
            ISBN: 9798400700897
            DOI: 10.1145/3578503

            Copyright © 2023 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher: Association for Computing Machinery, New York, NY, United States



            Qualifiers

            • short-paper
            • Research
            • Refereed limited

            Acceptance Rates

            Overall acceptance rate: 218 of 875 submissions, 25%
          Article Metrics

            • Downloads (last 12 months): 70
            • Downloads (last 6 weeks): 5
