ABSTRACT
This publication describes the motivation for and creation of Qbias, a large dataset of Google and Bing search queries, together with a scraping tool and dataset for biased news articles and language models for investigating bias in online search. Web search engines are a major and trusted source in information search, especially in the political domain. However, biased information can influence opinion formation and lead to biased opinions. Users interact with search engines by formulating search queries and by using the query suggestions the engines provide. A lack of datasets on search queries inhibits research on the subject. We use Qbias to evaluate different approaches to fine-tuning transformer-based language models, with the goal of producing models capable of biasing text toward a left or right political stance. In addition, we provide datasets and language models for biasing texts that enable further research on bias in online information search.
Index Terms
- Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions