short-paper

Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions

Authors:
Fabian Haak, Technische Hochschule Köln, Germany (ORCID: 0000-0002-3392-7860)
Philipp Schaer, Technische Hochschule Köln, Germany (ORCID: 0000-0002-8817-4632)
WebSci '23: Proceedings of the 15th ACM Web Science Conference 2023, April 2023, Pages 239–244, https://doi.org/10.1145/3578503.3583628

Published: 30 April 2023

ABSTRACT

This publication describes the motivation for and generation of Qbias, a large dataset of Google and Bing search queries, a scraping tool and dataset for biased news articles, as well as language models for the investigation of bias in online search. Web search engines are a major factor in, and a trusted source for, information search, especially in the political domain. However, biased information can influence opinion formation and lead to biased opinions. To interact with search engines, users formulate search queries and interact with the query suggestions that the search engines provide. A lack of datasets on search queries inhibits research on the subject. We use Qbias to evaluate different approaches to fine-tuning transformer-based language models, with the goal of producing models capable of biasing text toward a left or right political stance. In addition to this work, we provide datasets and language models for biasing texts that allow further research on bias in online information search.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Natural language generation
2. Information systems
  1. Information retrieval
    1. Information retrieval query processing
      1. Query reformulation
      2. Query suggestion
  2. World Wide Web
    1. Web searching and information discovery
      1. Web search engines

Published in
WebSci '23: Proceedings of the 15th ACM Web Science Conference 2023
April 2023
373 pages
ISBN:9798400700897
DOI:10.1145/3578503
General Chairs:
Ágnes Horvát, Northwestern University, IL, USA
Wendy Hall, University of Southampton, UK
Noshir Contractor, Northwestern University, IL, USA
Organizing Chair:
Leon Fröhling, GESIS, Germany
Program Chairs:
Katherine Ognyanova, Rutgers University, NJ, USA
Harsh Taneja, University of Illinois Urbana-Champaign, IL, USA
Ingmar Weber, Saarland University, Germany
Publications Chairs:
Kristina Gligorić, Stanford University, CA, USA
Yelena Mejova, ISI Foundation, Italy
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
bias
dataset
language models
query suggestion
search queries
transformers
web search
Qualifiers
- short-paper
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 218 of 875 submissions, 25%