Abstract
Social media platforms have become popular worldwide. Online discussion forums attract users through their easy access, freedom of speech, and ease of communication. Yet such communication also has potential downsides, including hostile and hateful language. While fast and effective methods for detecting inappropriate language online are constantly being developed, little research has examined bias in the compressed language models that are now in common use. In this work, we evaluate bias in compressed models trained on Gab and Twitter speech data and estimate the extent to which these pruned models capture the relevant context when classifying input text as hateful, offensive, or neutral. Our experiments show that transformer-based encoders with 70% or fewer of their weights preserved are prone to gender-, racial-, and religious-identity-based bias, even when the performance loss is insignificant. We propose a supervised attention mechanism that counters bias amplification using ground-truth per-token hate speech annotations. The proposed method allows pruning BERT, RoBERTa, and their distilled versions by up to 50% while preserving 90% of their initial performance according to bias and plausibility scores.
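As a rough illustration of the kind of pruning the abstract refers to, the sketch below performs unstructured magnitude pruning: it zeroes the smallest-magnitude entries of a weight matrix until a target sparsity is reached. This is a common baseline technique, not necessarily the exact pruning strategy of the paper (the authors' implementation is in the linked repository); the function name and the numpy-only setting are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the `sparsity` fraction of
    smallest-magnitude entries set to zero (unstructured pruning)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)          # number of entries to remove
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value acts as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold     # keep only larger-magnitude weights
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))                # stand-in for one encoder weight matrix
pruned = magnitude_prune(w, 0.5)           # 50% of weights zeroed
```

In practice the same masking is applied per layer to each linear weight matrix of the encoder (e.g. via `torch.nn.utils.prune` in PyTorch), after which the model is evaluated or fine-tuned with the mask held fixed.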
Notes
1. The implementation of the experiments can be found at https://github.com/upunaprosk/fair-pruning.
2. In our work, we use token-wise and word-level supervision interchangeably.
3. That token is used for classification in Transformer LMs.
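The supervised attention mechanism from the abstract can be sketched as a loss that pulls the classification token's attention distribution toward the normalised ground-truth per-token rationale. This is a minimal illustration under assumed names; the exact loss and weighting used in the paper are given in the linked implementation.

```python
import numpy as np

def attention_supervision_loss(attn: np.ndarray, rationale: np.ndarray,
                               eps: float = 1e-12) -> float:
    """Cross-entropy between the classification token's attention over the
    input tokens and the normalised ground-truth rationale mask."""
    target = rationale / max(rationale.sum(), eps)  # normalise binary annotations
    attn = attn / max(attn.sum(), eps)              # ensure a proper distribution
    return float(-(target * np.log(attn + eps)).sum())

attn = np.array([0.1, 0.6, 0.2, 0.1])       # attention weights over 4 tokens
rationale = np.array([0.0, 1.0, 1.0, 0.0])  # per-token hate speech annotation
loss = attention_supervision_loss(attn, rationale)
```

During training, such a term would be added to the standard classification loss so that the pruned model keeps attending to the tokens humans marked as hateful, which is what the bias and plausibility scores measure.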
Acknowledgements
This work was funded by the ANR project Dikè (grant number ANR-21-CE23-0026-02).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Proskurina, I., Metzler, G., Velcin, J. (2023). The Other Side of Compression: Measuring Bias in Pruned Transformers. In: Crémilleux, B., Hess, S., Nijssen, S. (eds) Advances in Intelligent Data Analysis XXI. IDA 2023. Lecture Notes in Computer Science, vol 13876. Springer, Cham. https://doi.org/10.1007/978-3-031-30047-9_29
Print ISBN: 978-3-031-30046-2
Online ISBN: 978-3-031-30047-9