ABSTRACT
Academics, activists, and regulators are increasingly urging companies to develop and deploy sociotechnical systems that are fair and unbiased. Achieving this goal, however, is complex: the developer must (1) deeply engage with social and legal facets of "fairness" in a given context, (2) develop software that concretizes these values, and (3) undergo an independent algorithm audit to ensure technical correctness and social accountability of their algorithms. To date, there are few examples of companies that have transparently undertaken all three steps.
In this paper, we outline a framework for algorithmic auditing by way of a case study of pymetrics, a startup that uses machine learning to recommend job candidates to its clients. We discuss how pymetrics approaches the question of fairness given the constraints of ethical, regulatory, and client demands, and how pymetrics' software implements adverse impact testing. We also present the results of an independent audit of pymetrics' candidate screening tool.
We conclude with recommendations on how to structure audits to be practical, independent, and constructive, so that companies have a stronger incentive to participate in third-party audits and watchdog groups are better prepared to investigate companies.
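To make the term "adverse impact testing" concrete, the following is a minimal sketch of one widely used check, the four-fifths rule of thumb from the EEOC Uniform Guidelines on Employee Selection Procedures. It is an illustration under stated assumptions, not a description of pymetrics' implementation: the function names, the group labels, and the applicant counts are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical input format: group name -> (number selected, number of applicants).
Outcomes = Dict[str, Tuple[int, int]]


def selection_rates(outcomes: Outcomes) -> Dict[str, float]:
    """Return the selection rate (selected / applicants) for each group."""
    rates = {}
    for group, (selected, applicants) in outcomes.items():
        if applicants <= 0:
            raise ValueError(f"group {group!r} has no applicants")
        rates[group] = selected / applicants
    return rates


def adverse_impact_ratios(outcomes: Outcomes) -> Dict[str, float]:
    """Divide each group's selection rate by the highest group's rate."""
    rates = selection_rates(outcomes)
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has any selected candidates")
    return {group: rate / highest for group, rate in rates.items()}


def flags_adverse_impact(outcomes: Outcomes, threshold: float = 0.8) -> bool:
    """True if any group's ratio falls below the threshold (four-fifths by default)."""
    return any(r < threshold for r in adverse_impact_ratios(outcomes).values())


# Illustrative counts only; these are not data from the paper.
example = {"group_a": (50, 100), "group_b": (30, 100)}
ratios = adverse_impact_ratios(example)
flagged = flags_adverse_impact(example)
```

In this made-up example, group_b is selected at 60% of group_a's rate, which is below the four-fifths threshold, so the check flags it. A ratio test like this is only a screening heuristic; it is commonly paired with statistical significance tests when sample sizes are small.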