Integrating induction and deduction for finding evidence of discrimination

Artificial Intelligence and Law

Abstract

We present a reference model for finding (prima facie) evidence of discrimination in datasets of historical decision records in socially sensitive tasks, including access to credit, mortgage, insurance, labor market and other benefits. We formalize the process of direct and indirect discrimination discovery in a rule-based framework, by modelling protected-by-law groups, such as minorities or disadvantaged segments, and contexts where discrimination occurs. Classification rules extracted from the historical records can unveil contexts of unlawful discrimination, where the degree of burden placed on protected-by-law groups is evaluated by formalizing existing norms and regulations as quantitative measures. The measures are defined as functions of the contingency table of a classification rule, and their statistical significance is assessed by relying on a large body of statistical inference methods for proportions. Key legal concepts and lines of reasoning are then used to drive the analysis of the set of classification rules, with the aim of discovering patterns of discrimination, either direct or indirect. Analyses of affirmative action, favoritism and argumentation against discrimination allegations are also modelled in the proposed framework. Finally, we present an implementation, called LP2DD, of the overall reference model that integrates induction, through data mining classification rule extraction, and deduction, through a computational logic implementation of the analytical tools. The LP2DD system is put to work on the analysis of a dataset of credit decision records.
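
To make the rule-based measures concrete, here is a minimal Python sketch. It assumes records are encoded as sets of items; the item names and the toy dataset are hypothetical. It computes support and confidence, the ratio conf(A, B → C) / conf(B → C) manipulated in Note 1 (called `elift` here for illustration only), and a threshold test in the spirit of a-protection, under the assumption that a rule counts as a-protective when that ratio stays below the threshold a. Exact fractions are used so that the identity in Note 1 can be checked without rounding error.

```python
from fractions import Fraction


def make(count, *items):
    """Return `count` identical records, each a frozenset of items."""
    return [frozenset(items) for _ in range(count)]


# Hypothetical toy dataset of credit decisions; item names are illustrative.
RECORDS = (
    make(3, "sex=female", "purpose=car", "credit=deny")
    + make(1, "sex=female", "purpose=car", "credit=grant")
    + make(1, "sex=male", "purpose=car", "credit=deny")
    + make(3, "sex=male", "purpose=car", "credit=grant")
    + make(2, "sex=male", "purpose=house", "credit=grant")
)


def supp(itemset, records):
    """Fraction of records that contain every item of `itemset`."""
    items = frozenset(itemset)
    return Fraction(sum(1 for r in records if items <= r), len(records))


def conf(premise, conclusion, records):
    """conf(premise -> conclusion) = supp(premise and conclusion) / supp(premise).

    Raises ZeroDivisionError if the premise never occurs in the records.
    """
    both = frozenset(premise) | frozenset(conclusion)
    return supp(both, records) / supp(premise, records)


def elift(a, b, c, records):
    """Ratio conf(A, B -> C) / conf(B -> C), as in Note 1 (name is illustrative)."""
    return conf(frozenset(a) | frozenset(b), c, records) / conf(b, c, records)


def is_a_protective(a, b, c, records, threshold):
    """Assumed convention: A, B -> C is a-protective when elift < threshold."""
    return elift(a, b, c, records) < threshold


if __name__ == "__main__":
    A, B, C = {"sex=female"}, {"purpose=car"}, {"credit=deny"}
    lhs = elift(A, B, C, RECORDS)
    # Right-hand side of Note 1: conf(B, C -> A) / conf(B -> A).
    rhs = conf(B | C, A, RECORDS) / conf(B, A, RECORDS)
    print(lhs, rhs)  # 3/2 3/2
    print(is_a_protective(A, B, C, RECORDS, Fraction(6, 5)))  # False
```

With these toy counts, women in the car-loan context are denied at rate 3/4 against 1/2 for everyone in that context, so the ratio is 3/2; whether that is flagged depends on the threshold a, which the sketch leaves as a parameter.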

Notes

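  1. \(\frac{\mathit{conf}(\mathbf{A}, \mathbf{B} \rightarrow \mathbf{C})}{\mathit{conf}(\mathbf{B} \rightarrow \mathbf{C})} = \frac{\mathit{supp}(\mathbf{A}, \mathbf{B}, \mathbf{C})\, \mathit{supp}(\mathbf{B})}{\mathit{supp}(\mathbf{A}, \mathbf{B})\, \mathit{supp}(\mathbf{B}, \mathbf{C})} = \frac{\mathit{conf}(\mathbf{B}, \mathbf{C} \rightarrow \mathbf{A})}{\mathit{conf}(\mathbf{B} \rightarrow \mathbf{A})}.\)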

  2. We use the name “a-protection” instead of “α-protection” to avoid confusion later on, when confidence intervals at the 100(1 − α)% confidence level are introduced.

  3. For a rule X → A, there are \(2^{|X|}\) rules A, B → D obtained by splitting X into D and B.

  4. With reference to Fig. 2, consider a rule c with \(a_1 = x\), \(n_1 = x + 1\), \(a_2 = 1\), \(n_2 = y\), for natural numbers x and y. Fixing \(x = ms \cdot |\mathcal{D}|\) to satisfy the minimum support requirement, we have \(\mathit{slift}(c) = xy/(x + 1) \geq y/2\), which is unbounded as y grows. The reasoning is analogous for the odds lift, which is \(\mathit{olift}(c) = x(y - 1)\).
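
The quantities in Notes 2 to 4 can be checked numerically with a short Python sketch. It assumes the contingency-table layout implied by Note 4's formulas: a1 of the n1 protected-group records in the context receive the outcome, against a2 of the n2 remaining records, so that slift = p1/p2 and olift is the odds ratio; these reproduce Note 4's xy/(x + 1) and x(y − 1). The helper names `note4_table`, `splits` and `difference_interval` are hypothetical. The Agresti–Caffo interval for p1 − p2 is one standard 100(1 − α)% interval for a difference of proportions, used here only to illustrate the confidence intervals mentioned in Note 2; it is not necessarily the interval adopted in the paper.

```python
import math
from itertools import combinations
from statistics import NormalDist


def slift(a1, n1, a2, n2):
    """Ratio of outcome rates p1 / p2 = (a1/n1) / (a2/n2); undefined if a2 == 0."""
    return (a1 * n2) / (n1 * a2)


def olift(a1, n1, a2, n2):
    """Odds ratio (p1/(1-p1)) / (p2/(1-p2)); undefined if a2 == 0 or a1 == n1."""
    return (a1 * (n2 - a2)) / (a2 * (n1 - a1))


def difference_interval(a1, n1, a2, n2, alpha=0.05):
    """100(1 - alpha)% interval for p1 - p2 with the Agresti-Caffo adjustment:
    add one success and one failure to each group, then apply the Wald formula."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    q1 = (a1 + 1) / (n1 + 2)
    q2 = (a2 + 1) / (n2 + 2)
    se = math.sqrt(q1 * (1 - q1) / (n1 + 2) + q2 * (1 - q2) / (n2 + 2))
    centre = q1 - q2
    return centre - z * se, centre + z * se


def note4_table(x, y):
    """Hypothetical helper: the counts of Note 4, a1 = x, n1 = x + 1, a2 = 1, n2 = y."""
    return x, x + 1, 1, y


def splits(itemset):
    """Yield every pair (D, B) of disjoint sets whose union is X: 2**len(X) pairs."""
    items = sorted(itemset)
    for k in range(len(items) + 1):
        for d in combinations(items, k):
            yield frozenset(d), frozenset(items) - frozenset(d)


if __name__ == "__main__":
    # x = 3 stands in for ms * |D|; slift and olift keep growing with y.
    for y in (8, 80, 800):
        t = note4_table(3, y)
        print(y, slift(*t), olift(*t))
    # 8 6.0 21.0
    # 80 60.0 237.0
    # 800 600.0 2397.0
    print(difference_interval(*note4_table(3, 8)))
```

Because a1 = x stays fixed while n2 = y grows, a minimum support threshold alone does not bound either measure, which is the point of Note 4; the `splits` generator simply enumerates the \(2^{|X|}\) ways of dividing X into D and B counted in Note 3.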

Author information

Correspondence to Salvatore Ruggieri.

About this article

Cite this article

Ruggieri, S., Pedreschi, D. & Turini, F. Integrating induction and deduction for finding evidence of discrimination. Artif Intell Law 18, 1–43 (2010). https://doi.org/10.1007/s10506-010-9089-5

  • DOI: https://doi.org/10.1007/s10506-010-9089-5