Abstract
The concept of fairness is gaining attention in both academia and industry. Social media is especially vulnerable to media bias, toxic language, and harmful comments. We propose a fair ML pipeline that takes a text as input and determines whether it contains biased or toxic content. It then uses pre-trained word embeddings to suggest a set of alternative words to substitute for the biased ones, the idea being to lessen the effect of those biases by replacing them with neutral alternatives. We compare our approach against existing fairness models to assess its effectiveness. The results show that the proposed pipeline can detect, identify, and mitigate biases in social media data.
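The substitution step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embedding vectors and the biased-word lexicon here are toy, hypothetical values, and a real pipeline would load pre-trained vectors such as word2vec or GloVe and a curated bias lexicon.

```python
import math

# Toy "pre-trained" embeddings: hypothetical 3-d vectors for illustration only.
# A real pipeline would load word2vec/GloVe vectors instead.
EMBEDDINGS = {
    "aggressive": [0.90, 0.10, 0.00],
    "assertive":  [0.85, 0.20, 0.05],
    "confident":  [0.80, 0.30, 0.10],
    "banana":     [0.00, 0.10, 0.90],
}
BIASED_WORDS = {"aggressive"}  # hypothetical lexicon of flagged terms


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def suggest_alternatives(word, k=2):
    """Rank non-biased vocabulary words by embedding similarity to `word`."""
    candidates = [w for w in EMBEDDINGS if w != word and w not in BIASED_WORDS]
    candidates.sort(key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]),
                    reverse=True)
    return candidates[:k]


def debias_text(text):
    """Replace each flagged token with its nearest non-biased neighbour."""
    return " ".join(
        suggest_alternatives(tok, k=1)[0] if tok in BIASED_WORDS else tok
        for tok in text.split()
    )


print(debias_text("an aggressive candidate"))  # -> "an assertive candidate"
```

With the toy vectors, "assertive" is the closest non-flagged neighbour of "aggressive", so it is chosen as the replacement; in practice the candidate list would come from the embedding model's full vocabulary.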
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Raza, S., Bashir, S.R., Sneha, Qamar, U. (2023). Addressing Biases in the Texts Using an End-to-End Pipeline Approach. In: Boratto, L., Faralli, S., Marras, M., Stilo, G. (eds) Advances in Bias and Fairness in Information Retrieval. BIAS 2023. Communications in Computer and Information Science, vol 1840. Springer, Cham. https://doi.org/10.1007/978-3-031-37249-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37248-3
Online ISBN: 978-3-031-37249-0
eBook Packages: Computer Science, Computer Science (R0)