Using sensitive data to prevent discrimination by artificial intelligence: Does the GDPR need a new exception?

Organisations can use artificial intelligence to make decisions about people for a variety of reasons, for instance, to select the best candidates from many job applications. However, AI systems can have discriminatory effects when used for decision-making. For instance, an AI system could reject applications from people with a certain ethnicity, even though the organisation never intended such ethnicity discrimination. But in Europe, an organisation runs into a problem when it wants to assess whether its AI system accidentally discriminates based on ethnicity: the organisation may not know the applicants' ethnicity. In principle, the GDPR bans the use of certain 'special categories of data' (sometimes called 'sensitive data'), which include data on ethnicity, religion, and sexual preference. The European Commission's proposal for an AI Act includes a provision that would enable organisations to use special categories of data for auditing their AI systems. This paper asks whether the GDPR's rules on special categories of personal data hinder the prevention of AI-driven discrimination. We argue that the GDPR does prohibit such use of special category data in many circumstances. We also map out the arguments for and against creating an exception to the GDPR's ban on using special categories of personal data, to enable preventing discrimination by AI systems. The paper discusses European law, but it can be relevant outside Europe too, as policymakers around the world grapple with the tension between privacy and non-discrimination policy.

The prevention of AI-driven discrimination is on the agenda of policymakers worldwide. 5 In many countries, privacy law and non-discrimination policy can be in conflict. We aim to describe European law in such a way that non-specialists can follow the discussion too.
The paper is structured as follows. In Sections 2 and 3 we introduce some key terms, summarise how AI can discriminate based on ethnicity and similar characteristics, and introduce non-discrimination law. In Sections 4 and 5 we analyse the GDPR's framework for special categories of data and discuss whether the framework hinders organisations in collecting special categories of data. In Section 6 we map out the arguments in favour of and against introducing a new exception to the collection and use of special category data for auditing AI systems. Section 7 discusses possible safeguards that could accompany a new exception, and Section 8 concludes.

AI systems and discrimination
AI systems could be described, in the words of the Oxford Dictionary, as 'computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages'. 5

5 See e.g. European Commission, COM(2020) 65 White Paper on Artificial Intelligence - A European Approach to Excellence and Trust (EU 2020) 1, 9-12, 18-22 <https://ec.europa.eu/info/sites/default/files/commission-white-paper-artificial-intelligence-feb2020_en.pdf>. See also the EU Digital Services Act, with requirements for certain types of online platforms to assess discrimination risks: Articles 34(1)(b) and 40(4) of the 'Regulation of the European Parliament and the Council on a Single Market For Digital Services and Amending Directive 2000/31/EC PE-CONS 30/22 (Digital Services Act)' <https://data.consilium.europa.eu/doc/document/PE-30-2022-INIT/en/pdf>. In the US, the White House is planning an 'AI Bill of Rights'; White House employees announced a national effort to develop a 'Bill of Rights for an AI-Powered World' in WIRED. See White House Office of Science and Technology Policy (OSTP), 'ICYMI: WIRED (Opinion): Americans Need a Bill of Rights for an AI-Powered World' (22 October 2021) <https://www.whitehouse.gov/ostp/news-updates/2021/10/22/icymi-wired-opinionamericans-need-a-bill-of-rights-for-an-ai-powered-world/> accessed 26 April 2022. A 'blueprint' for the AI Bill of Rights was later released by the White House. See also <https://www.coe.int/en/web/artificial-intelligence/-/news-of-the-european-commissionagainst-racism-and-intolerance-ecri->. A possible classification of biases was created by TILT: Tilburg Institute for Law, Technology, and Society, Handbook on Non-Discriminating Algorithms: Summary Research Report (2021) 5 <https://www.tilburguniversity.edu/about/schools/law/departments/tilt/research/handbook>.

9 A recruitment system used by Amazon until October 2018 seemed to display this type of bias. See Reuters, 'Amazon Scraps Secret AI Recruiting Tool That Showed Bias against Women' (11 October 2018) <https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G> accessed 7 April 2022.
AI systems can make discriminatory decisions about job applicants, harming certain ethnicities for instance, even if the system does not have direct access to data about people's ethnicity. Imagine an AI system that considers the postal codes where job applicants live. The postal codes could correlate with someone's ethnicity. Hence, the system might reject all people with a certain ethnicity, even if the organisation has ensured that the system does not consider people's ethnicity. In practice, an AI system might also consider hundreds of variables, in complicated combinations, that turn out to correlate with ethnicity. Variables that correlate with protected attributes such as ethnicity can be called 'proxy attributes'. Such correlations can lead to discrimination by proxy. 10 Because of proxy attributes, AI systems can have discriminatory effects by accident: AI developers or organisations using AI systems may not realise that the AI system discriminates. More causes of discrimination by AI systems exist that we will not delve into further in this paper, ranging from design decisions to the context in which the system is used. 11
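The postal-code scenario can be made concrete with a small simulation. All numbers below are invented for illustration; the sketch only shows how a selection rule that never sees ethnicity can still produce unequal selection rates through a correlated proxy attribute:

```python
import random

random.seed(0)

# Invented numbers, for illustration only: ethnicity ('group') correlates
# with postal zone, and the selection rule uses only the postal zone.
applicants = []
for _ in range(10_000):
    group = random.choice(["A", "B"])  # protected attribute, never shown to the model
    weights = [8, 2] if group == "A" else [2, 8]  # group B is concentrated in zone 2
    zone = random.choices([1, 2], weights=weights)[0]
    applicants.append({"group": group, "zone": zone})

# An 'apparently neutral' rule: select applicants from zone 1.
for a in applicants:
    a["selected"] = a["zone"] == 1

def selection_rate(group):
    """Share of applicants from the given group that the rule selects."""
    members = [a for a in applicants if a["group"] == group]
    return sum(a["selected"] for a in members) / len(members)

print(f"group A selection rate: {selection_rate('A'):.2f}")  # roughly 0.8
print(f"group B selection rate: {selection_rate('B'):.2f}")  # roughly 0.2
```

Note that computing `selection_rate` requires the `group` labels: comparing selection rates per group is the most basic audit, and it already presupposes access to the protected attribute, which is precisely the tension this paper examines.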

Using special categories of data is useful to debias AI systems
Suppose that an organisation wants to test whether its AI system unfairly discriminates against job applicants with a certain ethnicity. To test this, the organisation must know the ethnicity both of the people who applied for the job and of the people the organisation actually hired. Say that half of the people who sent in a job application letter have an immigrant background. The AI system selects the fifty best letters out of thousands of letters, deciding based on attributes such as the school of choice or the courses followed. Of the fifty letters selected by the AI system, none is by somebody with an immigrant background. Such numbers suggest that the AI system should be investigated for unfair or illegal bias. 10 Because proxy attributes may cause such bias, merely removing ethnicity data from the system does not rule out discrimination; to audit the system's outputs, the organisation needs data on the applicants' ethnicity. 12 13

10 Barocas and Selbst (n 8) 675. 12 See more in-depth Žliobaitė and Custers (n 10) 190-193. 13 There are practical hurdles to testing an AI system for discrimination. See the end of Section 6.3 of this paper.

Non-discrimination law

Here, we focus on European Union law. EU law forbids two forms of discrimination: direct and indirect discrimination. The Racial Equality Directive (about ethnicity) states that 'the principle of equal treatment shall mean that there shall be no direct or indirect discrimination based on racial or ethnic origin.' 18 The Racial Equality Directive describes direct discrimination as follows: 'Direct discrimination shall be taken to occur where one person is treated less favourably than another is, has been or would be treated in a comparable situation on grounds of racial or ethnic origin.' 19 In non-discrimination scholarship, grounds such as ethnicity ('racial or ethnic origin') are called 'protected characteristics' or 'protected attributes'. An AI system that treats individuals differently based on their protected attributes would discriminate directly.
A hypothetical example of direct discrimination by a computer system is if the programmer explicitly makes the system reject all women.
Our paper, however, focuses on indirect discrimination. The Racial Equality Directive defines indirect discrimination as follows.
[I]ndirect discrimination shall be taken to occur where (i) an apparently neutral provision, criterion or practice would (ii) put persons of a racial or ethnic origin at a particular disadvantage compared with other persons, (iii) unless that provision, criterion or practice is objectively justified by a legitimate aim and the means of achieving that aim are appropriate and necessary. 23

In short, indirect discrimination by an AI system can occur if the system (a 'practice') is neutral at first glance but ends up discriminating against people with a protected characteristic. For example, even if the protected attributes were filtered out of an AI system's input, the system can still discriminate based on variables that are a proxy for a protected attribute. Say that a recruitment system bases its decisions on the courses followed, age, and the university listed on the job applicant's CV. Such variables could correlate with ethnicity or another protected attribute.
For both direct and indirect discrimination, it is irrelevant whether the organisation discriminates by accident or on purpose. Hence, an organisation is always liable, even if it did not realise that its AI system was indirectly discriminating. 24 Unlike for direct discrimination, for indirect discrimination there is an open-ended exception: indirect discrimination is allowed if there is an objective justification. The possibility of justification is part of the definition of indirect discrimination: 'unless that provision, criterion or practice is objectively justified'. 25 In short, if the organisation (the alleged discriminator) has a legitimate aim for its neutral practice, and that practice is an appropriate and necessary means of achieving that aim, there is no illegal indirect discrimination. 26 If an AI system has discriminatory effects, these general norms from EU non-discrimination law apply.

24 For completeness' sake, we add that the difference between direct and indirect discrimination can be somewhat fuzzy.

Data protection law

Since the 1970s, a new field of law has developed: data protection law. In the EU, the right to the protection of personal data has the status of a fundamental right. 30 The right to protection of personal data is explicitly protected in the Charter of Fundamental Rights of the European Union. 31 Data protection law grants rights to people whose data are being processed (data subjects), and imposes obligations on parties that process personal data (data controllers). Data protection law aims to protect personal data, and in doing so protects other values and rights. Unlike some seem to assume, data protection law does not only aim to protect privacy; it also aims to protect the right to non-discrimination and other rights.
5 Does the GDPR hinder the prevention of discrimination?

The GDPR's ban on processing special categories of data
With specific rules, the EU General Data Protection Regulation (GDPR) further works out the right to data protection from the EU Charter. The GDPR, like its predecessor, the Data Protection Directive from 1995, contains an in-principle ban on processing special categories of data. 32 The GDPR also refers to the risk of discrimination in its preamble. Recital 71 concerns AI and calls upon organisations 35 to 'prevent, inter alia, discriminatory effects on natural persons on the basis of racial or ethnic origin, political opinion, (…) or that result in measures having such an effect.' 36 Article 9(1) GDPR is phrased as follows:

Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation shall be prohibited.

32 Article 8 of the Data Protection Directive. 33 Committee of Ministers, Resolution (73)22 on the protection of the privacy of individuals vis-à-vis electronic data banks in the private sector, 26 September 1973, Article 1 <https://rm.coe.int/1680502830>. 34 Principle 5 <https://www.refworld.org/docid/3ddcafaac.html>. 35 The GDPR puts most responsibilities on the 'data controller', in short, the body which determines the purposes and means of the personal data processing (Article 4(7) GDPR). For ease of reading, we speak of the 'organisation' in this paper. 36 A hypothetical example might be as follows. An organisation uses AI to select the best candidates from a number of job application letters. Recital 71 reminds the organisation that it should prevent that its system discriminates unfairly, for instance on the basis of ethnicity.
Most protected grounds in EU non-discrimination directives are also special categories of data as defined in Article 9(1) GDPR. There are two exceptions.
First, 'age' and 'gender' are protected characteristics in non-discrimination law, but are not special categories of data in the sense of the GDPR. 37 Second, 'political opinions', 'trade union membership', 'genetic' and 'biometric' data are special categories of data, but are not protected grounds in the European non-discrimination Directives. 38 We summarise the distinction between the 'special categories of data' and the 'protected non-discrimination grounds' in Figure 1.

37 We add a caveat: in some circumstances, age and gender could be special categories of data, if the special categories 'health data', 'biometric data' or 'genetic data' were interpreted broadly. See M Van

The GDPR's prohibition on processing special categories of data can hinder the prevention of discrimination by AI systems. 39 Think about the following scenario. An organisation uses an AI-driven recruitment system to select the best applicants from many job applications. The organisation wants to check whether its AI system accidentally discriminates against certain ethnic groups. For such an audit, the organisation requires data concerning the ethnicity of the job applicants; without such data, it is very difficult to perform the audit. However, Article 9(1) of the GDPR prohibits using such ethnicity data. Article 9(1) covers not only explicit information about a data subject's ethnicity, but also information 'revealing' ethnicity. Hence, the organisation is not allowed to infer the ethnicity of its applicants either. 40 The GDPR contains exceptions to the ban; we discuss those in the next section.

Some non-discrimination scholars appear to suggest that data protection law does not hinder collecting or using special categories of data to fight discrimination. 42 One scholar suggests that there is a 'need to "myth bust" the notion that data protection legislation should preclude the processing of equality data.' 43 However, we did not find detailed argumentation in the literature for the view that the GDPR allows using special categories of data for non-discrimination purposes. Below we show that, in most circumstances, the GDPR does not allow the use of special category data to fight discrimination.
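The word 'revealing' has technical bite: group membership can often be estimated from apparently neutral attributes. The following sketch, with invented frequencies, mimics a surname-and-geography inference in the style of 'Bayesian Improved Surname Geocoding' (BISG); the point is that such an inferred probability can itself amount to information revealing ethnic origin in the sense of Article 9(1):

```python
# Hypothetical illustration (all frequencies invented): estimate the
# probability that an applicant belongs to group B by combining surname
# and postal-zone statistics with Bayes' rule.
prior = {"A": 0.5, "B": 0.5}                   # base rates in the applicant pool
p_surname = {"A": {"smith": 0.9, "ali": 0.1},  # P(surname | group)
             "B": {"smith": 0.2, "ali": 0.8}}
p_zone = {"A": {1: 0.8, 2: 0.2},               # P(postal zone | group)
          "B": {1: 0.2, 2: 0.8}}

def posterior(surname, zone):
    """P(group | surname, zone), assuming surname and zone are independent given group."""
    score = {g: prior[g] * p_surname[g][surname] * p_zone[g][zone] for g in prior}
    total = sum(score.values())
    return {g: score[g] / total for g in prior}

print(posterior("ali", 2))  # group B is by far the most probable
```

The sketch shows why the GDPR's drafters extended the ban to 'revealing' data: with a handful of neutral-looking fields, an organisation could reconstruct special category data it was never given.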

Explicit consent
We discuss each exception to the ban on processing special category data in turn, starting with the data subject's consent. The ban does not apply if '(a) the data subject has given explicit consent to the processing of those personal data for one or more specified purposes (…).' 44 In short, the data subject's explicit consent can lift the ban. However, the requirements for valid consent are very strict. 45 Valid consent requires that consent is 'specific' and 'informed', and requires an 'unambiguous indication of the data subject's wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her.' 46 Hence, opt-out systems cannot be used to obtain valid consent.
Moreover, Article 4(11) GDPR prescribes that consent must be 'freely given' to be valid. Valid consent thus requires that the consent is voluntary. The GDPR's preamble gives some guidance on interpreting the 'freely given' requirement: 'Consent should not be regarded as freely given if the data subject has no genuine or free choice or is unable to refuse or withdraw consent without detriment.' 47 Consent is less likely to be voluntary if there is an imbalance between the data subject and the controller. The preamble says that 'consent should not provide a valid legal ground for the processing of personal data in a specific case where there is a clear imbalance between the data subject and the controller'. 48 If there is a 'clear' imbalance, consent is not freely given and thus invalid.

44 Article 9(2)(a) GDPR. 45 See Articles 4(11) and 7 GDPR. 46 Article 4(11) and 7 GDPR. 47 Recital 43 GDPR. 48 See on an imbalance between a data controller and a data subject also

The European Data Protection Board says that consent from an employee to an employer is usually not valid: 49

An imbalance of power also occurs in the employment context. Given the dependency that results from the employer/employee relationship, it is unlikely that the data subject is able to deny his/her employer consent to data processing without experiencing the fear or real risk of detrimental effects as a result of a refusal. 50

The Board adds:

[T]he EDPB deems it problematic for employers to process personal data of current or future employees on the basis of consent as it is unlikely to be freely given. For the majority of such data processing at work, the lawful basis cannot and should not be the consent of the employees (Article 6(1)(a)) due to the nature of the relationship between employer and employee. 51

What does this mean for using special category data for preventing AI-driven discrimination? In many situations this 'freely given' requirement poses a problem. We return to our example: an organisation uses AI to select the best job applicants, and wants to audit its AI system for accidental discrimination.
The organisation might consider asking all job applicants for consent to collect data about their ethnicity, to use that information for auditing its AI system. However, as discussed, applicants could fear that they would be rejected because of refusing to share their special category data. The applicant's consent is therefore generally invalid.
Perhaps a system could be designed in which a job applicant can give genuinely 'freely given' and thus valid consent. For instance, an organisation could ask all rejected job applicants for their consent after the position has been filled. In that case, job applicants might not fear anymore that withholding consent diminishes their chances to get the job. However, the organisation might find it awkward to ask people about their ethnicity, religion, or sexual preferences. Moreover, people can refuse their consent. If too many people refuse, the sample will not be representative -and cannot be used to audit AI systems.
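The representativeness worry can be made concrete with a small simulation. All numbers are invented; the point is only that when willingness to consent correlates with the outcome (say, rejected applicants consent less often than hired ones), an audit restricted to consenting applicants measures a distorted selection rate:

```python
import random

random.seed(1)

# Hypothetical sketch (invented numbers): rejected applicants are assumed
# to be less willing to consent than hired ones, so an audit that only sees
# consenting applicants overestimates the group's selection rate.
N = 20_000
true_rate = 0.02                   # true selection rate for the group
consent = {True: 0.8, False: 0.2}  # hired people consent more often

sample = []
for _ in range(N):
    selected = random.random() < true_rate
    if random.random() < consent[selected]:  # auditor only sees consenting applicants
        sample.append(selected)

observed_rate = sum(sample) / len(sample)
print(f"true rate: {true_rate:.3f}, observed rate in consenting sample: {observed_rate:.3f}")
```

Under these assumptions the observed rate is several times the true rate, so the audit could wrongly conclude that the group is treated fairly. Non-random refusal, not just a small sample, is what breaks the audit.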
We gave one specific example in which relying on consent would not be compliant. In many other contexts, particularly those where no power imbalance exists between data subject and data controller, the data subject's consent may provide a valid way to collect data regarding, for instance, ethnicity. 52 All in all, in certain circumstances organisations could rely on consent to lift the ban. But in many circumstances, the data subject's consent is not a realistic possibility.

Other exceptions
We continue with the other exceptions to the ban on processing special category data, starting with exception (b), on employment and social security and social protection law. The ban can be lifted if:

(b) processing is necessary for the purposes of carrying out the obligations and exercising specific rights of the controller or of the data subject in the field of employment and social security and social protection law in so far as it is authorised by Union or Member State law or a collective agreement pursuant to Member State law providing for appropriate safeguards for the fundamental rights and the interests of the data subject. 53

An example of a national provision based on this exception is Article 30 of the Dutch GDPR Implementation Act. 54 Roughly summarised, that provision allows employers to collect health data of employees if that is necessary for, for instance, their re-integration after an illness. In a case about this provision, the Dutch Data Protection Authority stated that an employer must regularly re-consider whether gathering the health data is still truly necessary in the light of the employee's re-integration duty. This requirement followed from a restrictive interpretation of Article 9(2)(b) GDPR. 55

Could this GDPR provision help organisations that want to audit their AI systems for discrimination? No. The main problem is that exception (b) only applies if the EU lawmaker or a national lawmaker has adopted a specific law that enables the use of special category data. To the best of our knowledge, neither the EU nor any national lawmaker in the EU has adopted a specific law that enables the use of special category data for auditing AI systems. 56

52 Makkonen writes: 'it is subparagraph a on the consent of the data subject which is likely to become the most frequently used basis for processing sensitive data.' Makkonen (n 43) 28. 53 Article 9(2)(b) GDPR. 54 Art. 30 Uitvoeringswet Algemene Verordening Gegevensbescherming (UAVG): 'In view of Article 9, Section 2, under b, of [the GDPR], the prohibition on processing data concerning health does not apply if the processing is carried out by administrative bodies, pension funds, employers or institutions working on their behalf, and in so far as the processing is necessary for: (a) the proper execution of statutory provisions, pension schemes or collective agreements providing for entitlements which depend on the state of health of the person concerned; or (b) the reintegration or assistance of employees or beneficiaries in connection with illness or disability.' Translation by the authors of this paper.

We turn to the next possibly relevant exception in the GDPR: the ban does not apply if

(f) processing is necessary for the establishment, exercise or defence of legal claims or whenever courts are acting in their judicial capacity.

Exception (f) applies to 'legal claims', hence to claims in legal proceedings such as court cases; the exception also covers the use of personal data by courts themselves. An organisation might argue that it needs to audit its AI systems to prevent future lawsuits because of illegal discrimination. However, Kuner and Bygrave note that exception (f) is only valid for concrete cases. The exception 'does not apply when sensitive data are processed merely in anticipation of a potential dispute without a formal claim having been asserted or filed, or without any indication that a formal claim is imminent.' 57 It therefore seems implausible that an organisation can rely on exception (f) to collect special categories of data about many people to audit its AI systems preventatively.

Could exceptions (g), on substantial public interest, and (j), on archiving, research and statistics, help organisations that want to debias their AI systems? Both provisions require a legal basis in a law of the EU or of a national lawmaker. 58 Hence, the provisions do not, by themselves, allow the processing of special categories of data. Instead, exceptions (g) and (j) give the EU and the member states the flexibility to decide whether processing special category data for fighting discrimination is allowed. 59 Neither current EU law nor national law provides such a legal basis.
To fight discrimination under exceptions (g) and (j), EU or national law must thus provide a legal ground for the processing. To the best of our knowledge, the United Kingdom is the only state to have ever adopted such an exception while it was a member state. (The UK has since left the EU.) The UK exception is based on Article 9(2)(g), the 'substantial public interest' exception. 60 Neither the EU nor the current member states have adopted such a law. (The proposed AI Act contains such a provision; we discuss that in Section 7.1.)
During the drafting of the GDPR, the Fundamental Rights Agency (FRA), a European Union agency, realised that the proposed GDPR did not allow such data collection. The Agency suggested that the GDPR 'could clarify the place of special categories of data in anti-discrimination data collection, and make explicit that the collection of special category data is allowed for the purpose of combatting discrimination'. 61 But the EU did not follow that suggestion.
For completeness' sake, we highlight that an organisation is not out of the woods yet if it has found a way to lift the ban on using special category data. For the processing of personal data, sensitive or not, the GDPR requires a 'legal ground'. Hence, even if the ban on using special category data could be lifted, the organisation must still find a valid legal processing ground as defined in Article 6(1) GDPR. For non-state actors, the legitimate interest ground (Article 6(1)(f) GDPR) seems the most plausible. The organisation must also comply with all the other requirements of the GDPR. But a discussion of all those GDPR requirements falls outside the scope of this paper.
In conclusion, the GDPR indeed hinders organisations that wish to use special category data to prevent discrimination by their AI systems. In some exceptional situations, an organisation might be able to obtain valid consent from data subjects for such use. In other situations, an EU or national law would be needed to enable the use of special categories of data for AI debiasing; at the moment, such laws are not in force in the EU. In the next section, we explore whether such an exception is a good idea.

6 A new exception to the ban on using special categories of data?

Introduction
Policymakers have realised that non-discrimination policy can conflict with data protection law, and some have adopted exceptions. As noted, the UK has adopted an exception to the ban on using special categories of data for the purpose of fighting discrimination. 62 The Dutch government is considering creating a new national exception for collecting special categories of data. 63 In the following section, we map out arguments in favour of and against creating an exception for gathering special categories of data for the purpose of auditing an AI system for discrimination.

Arguments in favour of an exception
We present two main arguments in favour of creating a new exception that enables the use of special category data to prevent AI-driven discrimination: (i) Several types of organisations could use the data to test whether an AI system discriminates. (ii) The collection of the data has a symbolic function.
A first, rather strong, argument is that, for many types of stakeholders, collecting special category data would make the fight against discrimination easier. Organisations could check themselves whether their AI system accidentally discriminates. Organisations may want to ensure that their hiring, firing and other policies and practices comply with non-discrimination laws. Organisations may care about fairness and non-discrimination, or may want to protect their reputation. 65 Regulators, such as equality bodies (non-discrimination authorities), could also benefit from an exception that enables the use of special categories of data for AI debiasing. Regulators could more easily check an organisation's AI practices if those organisations registered the ethnicity of all their employees, job applicants, etc. 66 Another group that can benefit from the collection of special category data is researchers. 67 Researchers could use such data to check whether an AI system discriminates. This argument only holds, however, if organisations share their data with researchers.
A different type of argument in favour of allowing the use of special category data is related to the symbolic function of such use. 68 Auditing an AI system can increase trust in an organisation's AI practices, if it is publicly known that the organisation checks whether its AI systems discriminate. Potential discrimination victims can see that the organisation takes discrimination by AI seriously. 69 We mention this argument for completeness' sake; however, we do not see it as a particularly strong argument. In sum, there are various arguments in favour of adopting an exception that enables the use of special category data to prevent discrimination by AI. 65 See also Makkonen (n 43) 21. 66 Makkonen: 'National specialised bodies, such as ombudsmen and equality bodies, and international monitoring bodies, such as the UN treaty bodies and the Council of Europe's European Commission against Racism and Intolerance (ECRI), as well as some other institutions, such as the EU Fundamental Rights Agency, need quantitative and qualitative information in order to perform their functions properly.' ibid 20. 67 ibid 21. 68 Similar to an argument made by Makkonen: '[...] the compilation of equality statistics can be seen to have more symbolic functions. The mere existence of a data collection system sends a message to actual and potential perpetrators, actual and potential victims and to society in general, signalling that society disapproves of discrimination, takes it seriously and is willing to take the steps necessary to fight it. This can have a preventive effect.' ibid. 69 See for a similar argument for collecting non-discrimination data in general Alidadi (n 39) 18.

Arguments against an exception
There are also strong arguments against adopting an exception that enables the use of special category data for AI debiasing. Balayn and Gürses warn of the danger of surveillance of protected groups: 'Policy that promotes debiasing (…) may incentivise increased data collection of exactly those populations who may be vulnerable to surveillance.' 70 We distinguish three categories of arguments against introducing a new exception to enable the use of special categories of data to mitigate discrimination risks. There are arguments (i) that concern the mere storage of special categories of data, (ii) that concern new uses of those data, and (iii) that show practical hurdles as a reason why an exception is not justified at this time.

First, the mere storage of special category data can interfere with fundamental rights, regardless of how those data are used. The European Court of Human Rights has held that 'even the mere storing of data relating to the private life of an individual amounts to an interference within the meaning of Article 8' of the European Convention on Human Rights, which protects the right to privacy. 74 In sum, the two most important courts in Europe accept that merely storing personal data can interfere with fundamental rights. Indeed, we think that many people may dislike it when special category data about them (such as their ethnicity) are stored.

Third, organisations could misuse the exception to collect large amounts of special categories of data, claiming that they need such data to fight discrimination. An exception that is too wide could open the door for mass data gathering. As Balayn and Gürses note, 'the possibility that debiasing methods may lead to over-surveillance of marginalised populations should be a very serious concern.' 79

Fourth, a symbolic argument can be made against an exception that allows the collection and storage of special category data. People want their sensitive data to be handled with care. If people know that an organisation does not collect their special category data, they could trust that organisation more.
(As with the symbolic argument in favour of using special category data, we do not think this argument is very strong.) In sum, there are several arguments against adopting an exception that enables using special categories of personal data to prevent discrimination by AI.
Finally, there is a different category of arguments against adopting a new exception to enable collecting and using special categories of data for AI nondiscrimination auditing. The world seems not yet ready for such an exception, as auditing AI systems is still very difficult. A 2021 interview study found several practical reasons explaining why industry practitioners themselves find it difficult to test if an AI system discriminates. For instance, the collected special category data may not be accurate, because the data may have originally been collected by another party, or for a different purpose. And because data subjects may report their own (self-perceived) ethnicities, the data could be inaccurate or unusable for the test. 82 An interview study from 2019 found that many industry practitioners did not have an infrastructure in place for collecting accurate special categories of data. Furthermore, practitioners mentioned that auditing methods themselves are not holistic enough, and practitioners do not always know which subpopulations they need to consider when auditing an AI system for discrimination. 83 Since 'best practices' for auditing an AI system for discrimination seem to be in their infancy, it is questionable whether creating a new exception is currently justified. However, the techniques for auditing and debiasing AI are improving. The more knowledge exists about how to test an AI system for discrimination, the more justified a new exception could become in the future.
All in all, there are various arguments in favour of and against adopting an exception that enables the use of special categories of data to prevent AI-driven discrimination. The balance between the arguments for and against is difficult to strike. If such an exception were adopted, it should also include safeguards to minimise risks. We discuss possible safeguards in the next section.
7 Possible safeguards if an exception were adopted

Safeguards in the proposed AI Act
A proposal by the EU illustrates some possibilities for safeguards. In early 2021, the European Commission presented a proposal for an AI Act, which includes an exception to the ban on using special category data. The proposed exception is phrased as follows: 'To the extent that it is strictly necessary for the purposes of ensuring bias monitoring, detection and correction in relation to the high-risk AI systems, the providers of such systems may process special categories of personal data referred to in [Article 9(1) of the GDPR]'. 84 There are various safeguards in the proposed provision that aim to prevent abuse of the special category data.

Strictly necessary for preventing discrimination
In the AI Act, the exception to the ban on using special category data only applies '[t]o the extent that it is strictly necessary for the purposes of ensuring bias monitoring, detection and correction in relation to the high-risk AI systems'.
The phrase 'strictly necessary' implies a higher bar than merely 'necessary'. The word 'necessary' is already a strict requirement. The CJEU said in the Huber case that 'the concept of necessity (…) has its own independent meaning in Community law.' 85 And necessity 'must be interpreted in a manner which fully reflects the objective of [the Data Protection] directive'. 86 CJEU case law shows that the word 'necessary' must be interpreted narrowly, in favour of the data subject: 'As regards the condition relating to the necessity of processing personal data, it should be borne in mind that derogations and limitations in relation to the protection of personal data must apply only in so far as is strictly necessary (…)'. In sum, organisations should only rely on the exception in the AI Act if using special category data is genuinely necessary.

The AI Act exception only applies to providers of high-risk AI systems
The AI Act's exception only applies to high-risk AI systems. High-risk AI systems can be divided into two types: First, products already covered by certain EU health and safety harmonisation legislation (such as toys, machinery, lifts, or medical devices). Second, AI systems specified in an annex of the AI Act, in eight areas or sectors. 88 If lawmakers consider creating a new exception to enable the use of special category data, they could limit the scope of the exception in a similar way.
Perhaps organisations should not be allowed to rely on the exception if the AI system does not bring serious discrimination risks.

Appropriate safeguards
The exception in the proposed AI Act says that special category data can be used to prevent AI-driven discrimination, 'subject to appropriate safeguards for the fundamental rights and freedoms of natural persons'. 89 The provision gives examples of such safeguards: 'including technical limitations on the reuse and use of state-of-the-art security and privacy-preserving measures, such as pseudonymisation, or encryption where anonymisation may significantly affect the purpose pursued.' 90 If an exception were adopted to enable the use of special category data for fighting AI-driven discrimination, a similar requirement should be included. Some elements in the proposed AI Act exception are controversial. For instance, the exception leaves unclear who decides what the appropriate safeguards are. As the text stands, that judgment seems to rest with the provider of the AI system itself; the exception thus leaves the choice of exact safeguards up to the provider. Moreover, as Balayn & Gürses note, the 'European Commission proposal to regulate AI enables the use of sensitive attributes for debiasing, without further consideration of the risks it imposes on exactly the populations that the regulation says it intends to protect.' 91 Indeed, the AI Act does not include measures to limit the risks associated with collecting special category data.

Other possible safeguards
Are other safeguards, not mentioned in the AI Act, viable? A specific technical safeguard is to create a synthetic, anonymous dataset from the 'real' dataset. The synthetic dataset represents the same (or a similar) distribution of individuals but can no longer be linked to them, and can therefore be stored safely. The original special categories of data still need to be collected to create a synthetic dataset, but the original data can be stored for a shorter time. It is controversial whether and when using such a dataset is effective for testing AI systems: at the time of writing, the privacy gain seems to vary greatly, and it is unpredictable how much utility is lost by making the dataset synthetic. 92

Other possible safeguards are more organisational. For example, a trusted third party could collect the special categories of data, store them, and use them for auditing the AI system. The organisation using the AI system then no longer needs to store the data itself. However, it is debatable whether such a construction is practical and financially feasible. 93 Such organisational safeguards raise many questions. For example, which third party can be trusted with special categories of data? One possibility might be the national statistics bureaus of member states. Such bureaus could store and manage the special category data safely; in Europe, statistics bureaus have long been responsible for collecting special categories of data for statistical purposes, on a large scale. Or perhaps an independent supervisory authority could appoint trustworthy researchers and give them access to the special categories of data, to prevent discrimination by AI systems. Specific privacy-related solutions also exist that might allow a third party to use the data more safely. 94
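To illustrate the synthetic-dataset idea in its simplest form: the sketch below (in Python, with invented data) resamples records from the empirical joint distribution of group and outcome, so the synthetic rows mirror the real frequencies without any row corresponding to a specific individual. This is a toy illustration of the concept only; real synthetic-data generators add noise or formal privacy guarantees, and, as noted above, the actual privacy gain and utility loss vary.

```python
# A toy sketch of replacing a 'real' audit dataset with a synthetic one,
# assuming only the joint distribution of (group, outcome) matters for the
# audit. All data are invented; real tools add formal privacy guarantees.
import random
from collections import Counter

def synthesize(records, n, seed=0):
    """Draw n synthetic records from the empirical distribution of `records`.
    The output mirrors the (group, outcome) frequencies, but no synthetic
    row corresponds to a specific individual in the original data."""
    rng = random.Random(seed)
    counts = Counter(records)
    values = list(counts.keys())
    weights = [counts[v] for v in values]
    return rng.choices(values, weights=weights, k=n)

# Invented 'real' records: (self-reported group, hired?)
real = [("A", True)] * 40 + [("A", False)] * 60 + \
       [("B", True)] * 20 + [("B", False)] * 80
fake = synthesize(real, n=10_000)

# The synthetic frequencies approximate the real ones:
print(Counter(fake))
```

A caveat the sketch makes visible: plain resampling preserves rare attribute combinations, which can still identify someone, so the privacy protection is weaker than it looks; this is one reason the privacy gain of synthetic data is said to vary greatly.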

8 Conclusion
In this paper, we examined whether the GDPR needs a new exception to the ban on using special categories of data, such that an organisation can mitigate discrimination by artificial intelligence. We mapped out the arguments in favour of and against such a new exception.
We presented the following main arguments in favour of such an exception. (i) Organisations could use the special category data to test AI systems for discrimination. (ii) AI discrimination testing could increase the trust consumers have in an AI system.

We also presented the following main arguments against such an exception. (i) Merely storing personal data about, for instance, ethnicity can be seen as a privacy interference. (ii) Such data can be abused, or data breaches can occur. (iii) The exception could be abused to collect special categories of data for other uses than AI discrimination testing. (iv) In addition, merely allowing organisations to collect special category data does not guarantee that organisations can debias their AI systems. Auditing and debiasing AI systems remains difficult.

93 See also Niki Kilbertus and others, 'Blind Justice: Fairness with Encrypted Sensitive Attributes' [2018] arXiv:1806.03281 [cs, stat] 1-2 <http://arxiv.org/abs/1806.03281>.
94 To illustrate, say a statistics bureau audits an organisation's patented AI system. If both organisations use secure multi-party computation (MPC), the statistics bureau cannot see the model of the organisation it audits, and the organisation cannot see the special category data used for the audit, but they could audit the AI system together. A caveat is that MPC creates some overhead in calculations and is complex to set up.
In the end, it is a political decision how the balance between the different interests must be struck. Ideally, such a decision is made after a thorough debate. We hope to inform that debate.