ABSTRACT
Machine learning and access to big data are revolutionizing the way many industries operate, providing analytics and automation for many practical real-world tasks that were previously thought to be necessarily manual. With the pervasiveness of artificial intelligence and machine learning over the past decade, and their rapid spread across a wide variety of applications, algorithmic fairness has become a prominent open research problem. For instance, machine learning is used in courts to assess the probability that a defendant will recommit a crime; in the medical domain to assist with diagnosis or to predict predisposition to certain diseases; in social welfare systems; and in autonomous vehicles. The decision-making processes in these real-world applications have a direct effect on people's lives, and can cause harm to society if the deployed machine learning algorithms are not designed with fairness in mind.
The ability to collect and analyze large datasets for problems in many domains brings forward the danger of implicit data bias, which can be harmful. Data, especially big data, is often heterogeneous, generated by different subgroups with their own characteristics and behaviors. Furthermore, data collection strategies vary vastly across domains, and the labelling of examples is performed by human annotators, so the labelling process can amplify inherent biases the annotators might harbor. A model learned on biased data may not only lead to unfair and inaccurate predictions, but may also significantly disadvantage certain subgroups and lead to unfairness in downstream learning tasks. There are multiple ways in which discriminatory bias can seep into data. In medical domains, for example, there are many instances in which the data used are skewed toward certain populations, which can have dangerous consequences for underrepresented communities [1]. Other examples are the large-scale datasets widely used in machine learning tasks, such as ImageNet and Open Images: [2] shows that these datasets suffer from representation bias and advocates for the need to incorporate geo-diversity and inclusion. Yet other examples are the popular face recognition and generation datasets CelebA and Flickr-Faces-HQ, whose ethnic and racial breakdown of example faces shows significant representation bias, evident in downstream tasks like face reconstruction from an obfuscated image [8].
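To make representation bias measurable, a dataset's demographic breakdown can be audited directly from its metadata. The following minimal Python sketch is only an illustration under assumed inputs (the metadata schema and the `region` field are hypothetical, not drawn from the cited datasets); it reports each subgroup's share of the examples, the kind of breakdown underlying the audits in [2] and [8]:

```python
from collections import Counter

def representation_report(examples, group_key):
    """Return each subgroup's share of the dataset for one metadata field."""
    counts = Counter(ex[group_key] for ex in examples)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

# Hypothetical image-dataset metadata records (schema assumed for illustration).
examples = [
    {"image_id": 1, "region": "north_america"},
    {"image_id": 2, "region": "north_america"},
    {"image_id": 3, "region": "europe"},
    {"image_id": 4, "region": "sub_saharan_africa"},
]

for region, share in sorted(representation_report(examples, "region").items()):
    print(f"{region}: {share:.0%}")  # north_america accounts for half of this toy sample
```

Comparing such shares against reference population statistics is one simple way to flag the skew described above before a model is ever trained on the data.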
In order to fight discriminatory use of machine learning algorithms that leverage such biases, one first needs to define the notion of algorithmic fairness. Broadly, fairness is the absence of any prejudice or favoritism towards an individual or a group based on their intrinsic or acquired traits in the context of decision making [3]. Fairness definitions fall under three broad types: individual fairness, whereby similar individuals receive similar predictions [4, 5]; group fairness, whereby different groups are treated equally [4, 5]; and subgroup fairness, whereby a group fairness constraint is selected and the task is to determine whether it holds over a large collection of subgroups [6, 7]. In this talk, I will discuss formal definitions of these fairness constraints, examine the ways in which machine learning algorithms can amplify representation bias, and discuss how bias in both the example set and the label set of popular datasets has been misused in a discriminatory manner. I will also touch upon issues of ethics and accountability, and present open research directions for tackling algorithmic fairness at the representation level.
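For reference, the standard formalizations of these three notions from the cited literature can be sketched as follows (a condensed sketch: notation varies across papers, and [6] additionally weights the subgroup condition by subgroup size):

```latex
% Individual fairness (Lipschitz condition of [4]): for a randomized
% classifier M, a task-specific similarity metric d on individuals, and a
% distance D on output distributions,
D\bigl(M(x), M(x')\bigr) \le d(x, x') \quad \text{for all individuals } x, x'.

% Group fairness, instantiated as demographic parity for a binary
% predictor \hat{Y} and protected attribute A:
\Pr[\hat{Y} = 1 \mid A = a] = \Pr[\hat{Y} = 1 \mid A = b] \quad \text{for all groups } a, b.

% Subgroup fairness in the spirit of [6]: the group constraint must hold,
% up to slack \alpha, for every subgroup g in a rich collection \mathcal{G}:
\bigl|\Pr[\hat{Y} = 1 \mid g(X) = 1] - \Pr[\hat{Y} = 1]\bigr| \le \alpha
\quad \text{for all } g \in \mathcal{G}.
```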
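As a concrete complement, here is a minimal Python sketch (field names and data are hypothetical, not code from the talk) that audits demographic parity first per attribute and then over attribute intersections; the toy data shows the "fairness gerrymandering" effect studied in [6, 7], where parity holds for each protected attribute separately yet fails badly on their intersections:

```python
from itertools import combinations

def positive_rate(records, predicate):
    """Fraction of positive predictions among records matching the predicate."""
    selected = [r for r in records if predicate(r)]
    return sum(r["prediction"] for r in selected) / len(selected) if selected else None

def demographic_parity_gap(records, attr):
    """Largest difference in positive-prediction rate across values of one attribute."""
    rates = [positive_rate(records, lambda r, v=v: r[attr] == v)
             for v in {r[attr] for r in records}]
    rates = [x for x in rates if x is not None]
    return max(rates) - min(rates)

def subgroup_gaps(records, attrs):
    """Audit every pair of attributes jointly: a parity constraint can hold per
    attribute yet fail on intersections (fairness gerrymandering [6])."""
    gaps = {}
    for a, b in combinations(attrs, 2):
        rates = [positive_rate(records, lambda r, k=k: (r[a], r[b]) == k)
                 for k in {(r[a], r[b]) for r in records}]
        rates = [x for x in rates if x is not None]
        gaps[(a, b)] = max(rates) - min(rates)
    return gaps

# Hypothetical audit records: a binary model prediction plus two protected attributes.
records = [
    {"prediction": 1, "gender": "f", "age": "young"},
    {"prediction": 0, "gender": "f", "age": "old"},
    {"prediction": 0, "gender": "m", "age": "young"},
    {"prediction": 1, "gender": "m", "age": "old"},
]
print(demographic_parity_gap(records, "gender"))  # 0.0: parity holds for gender...
print(demographic_parity_gap(records, "age"))     # 0.0: ...and for age separately...
print(subgroup_gaps(records, ["gender", "age"]))  # {('gender', 'age'): 1.0}: ...but not jointly
```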
REFERENCES
- [1] Arjun K. Manrai, Birgit H. Funke, Heidi L. Rehm, Morten S. Olesen, Bradley A. Maron, Peter Szolovits, David M. Margulies, Joseph Loscalzo, and Isaac S. Kohane. 2016. Genetic Misdiagnoses and the Potential for Health Disparities. New England Journal of Medicine 375, 7 (2016), 655--665. PMID: 27532831.
- [2] Shreya Shankar, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, and D. Sculley. 2017. No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World. stat 1050 (2017), 22.
- [3] Nripsuta Ani Saxena, Karen Huang, Evan DeFilippis, Goran Radanovic, David C. Parkes, and Yang Liu. 2019. How Do Fairness Definitions Fare? Examining Public Attitudes Towards Algorithmic Definitions of Fairness. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. ACM, 99--106.
- [4] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness Through Awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (ITCS '12). ACM, New York, NY, USA, 214--226.
- [5] Matt J. Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual Fairness. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 4066--4076.
- [6] Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. 2018. Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness. In International Conference on Machine Learning. 2569--2577.
- [7] Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. 2019. An Empirical Study of Rich Subgroup Fairness for Machine Learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 100--109.
- [8] Sachit Menon, Alexandru Damian, Shijia Hu, Nikhil Ravi, and Cynthia Rudin. 2020. PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).