1 Introduction

Forms of machine learning and artificial intelligence have already started to be employed in smart cities as well as in policing, crime prevention and security (Mattern 2021, Powell 2021). AI is defined by Bryson and Winfield (2017:116) as ‘artefacts that demonstrate’ the capacity to perceive context for action, to act, to associate contexts with actions and/or to be cognitive. There is a wide range of computational methods and applications within the field of policing that use machine learning; perhaps the two best-known examples to date are predictive policing and facial recognition. As Babuta and Oswald (2019:4) note, such technologies are often loosely referred to as AI. However, there is a relatively new branch of AI that aims to recognise human emotions and, in some cases, to infer intentions and to predict future behaviour. Such a technology may, therefore, attract the interest of police and law enforcement authorities. This area is known as affective computing, ‘computing that relates to, arises from, or influences emotions’ (Picard 1995:1).

I limit the scope of this paper to the potential applications of emotional AI in policing publicly accessible urban space. I therefore do not explore other potential applications of affect recognition that concern crime, its prevention or its detection, such as the identification of victims of domestic violence or bullying based on the analysis of social media content. Concepts such as privacy, security, control and public space may be defined and understood differently in different cultures and societies. Similarly, different jurisdictions have different policing traditions and practices as well as cultural attitudes towards policing. For this reason, I write this paper from the perspective of Western liberal democracies, though the generic issues discussed are in theory applicable globally.

Emotional AI is an umbrella term for any technology that uses affective computing and artificial intelligence to make an assessment or a prediction about a person’s emotional state or feelings based on data such as ‘words, pictures, intonation, gestures, physiology and facial expressions’ (McStay 2019a:1). To achieve this, techniques and environments such as sentiment analysis, facial coding of expressions, voice analytics, eye-tracking, measuring skin temperature and other physiological states via wearable tools, analysing gestures and behaviour, virtual reality and augmented reality are usedFootnote 1. These technologies have been applied in a number of different fields, such as education (Williamson 2017; McStay 2019a), political campaigns (Bakir 2020), social media (Ortigosa et al. 2014; Wang et al. 2014; Yue et al. 2019; Stark 2018) and the distribution of information in digital media (Bakir and McStay 2020), and have also been examined from the perspective of privacy (McStay 2020).

From a criminological perspective the topic of smart cities and urban AI raises questions and suggests linkages with research on policing and surveillance. Cities are often associated with problems connected to crime and have similarly been the focus of policing and surveillance activities. In the context of policing, cities are the paradigmatic setting both for street crime and perceived anti-social behaviour and for public assembly and political protest. In liberal countries the history of cities has also long been a history of struggle between individual rights and state/city control over public spaces (Harvey 2012). Smart technologies deployed in cities have been criticised on the grounds of ‘data ethics, privacy, mass surveillance, commodification and social control’ (Foth et al. 2021:319), as well as for becoming a form of capital and contributing to increasing control over people’s lives through data gathering and analysis (Sadowski 2020). Electronic surveillance technologies for public space, such as CCTV, have been implemented in this environment, and in recent years the task of everyday surveillance is ‘increasingly being performed by non-human algorithms’ (Smith 2020:1). The topic of this paper is an emerging new form of surveillance in public urban spaces, namely what has been termed ‘emotiveillance’, in which people’s emotional states may be subject to surveillance and their intentions may be inferred ‘usually for the purposes of influencing and managing people’ (McStay 2016:151). The idea of monitoring the emotions of people in public urban spaces has been suggested before (Guthier et al. 2014; Zeile et al. 2015, Cabalcanti Roza and Postolache 2016, Adikari and Alakahoon 2021), but the particular risks and implications of potential policing applications have not previously been examined in detail.

In this article I engage with this new technology and with the debates and literature that surround it. Working at the intersection of criminology, policing, surveillance and the study of emotional AI, this paper explores and offers a framework for understanding the various issues that these technologies present, particularly to liberal democracies. I argue that these technologies should not be deployed within public spaces: the evidence base for their effectiveness in a policing and security context is very weak, and, more importantly, they represent a major intrusion into people’s private lives and a worrying extension of policing power, because intentions and attitudes may be inferred from them. Moreover, I argue that even if these technologies were proven accurate in recognising emotions and intentions, they would still raise the question of whether they could ever be desirable in liberal democracies that place an emphasis on the privacy and freedom of personal thoughts, feelings and emotions. Further, the danger in the use of such invasive surveillance for the purpose of policing and crime prevention in urban spaces is that it potentially leads to a highly regulated and control-oriented society. Smith (2020) notes that forms of urban surveillance, such as automated facial recognition, have severe impacts on the right to the city. I argue that emotion recognition takes this a step further by not only undertaking surveillance of existing situations but also making inferences and probabilistic predictions about future events as well as emotions and intentions.

I aim to contribute towards an understanding of the implications, challenges and limitations of using emotional AI technologies in the field of policing. To achieve this, I first provide an overview of the current state of existing applications of emotion recognition technologies, the longstanding tradition of using biometrics in policing and security, how AI is used in policing today, and how the use of cameras has become more prominent in the policing of urban spaces. I then identify three areas of emotion recognition in which policing and security applications have been deployed, trialled or prototyped. Finally, I discuss key issues and implications of the use of such technologies relating to accuracy, algorithmic bias, accountability, privacy and other human rights. In conclusion, I argue that, given the capabilities and limitations of such technologies and the societal costs they potentially introduce, they should not, as they currently stand, be deployed in policing urban spaces.

2 Biometrics, AI and cameras in policing

We have already witnessed attempts to pioneer the use of certain technologies that belong to the field of emotional AI. Albeit not on a large scale, such technologies are already in use in recruitment to assess candidates (Bogen et al. 2018:32,36), in advertising to understand consumer preferences, in education to examine student behaviour, in healthcare to track mental states, and in social media to profile individuals and to examine group behaviour (McStay 2018). In 2018 Gartner, a leading research and advisory company in the field of technology, predicted ‘that by 2022 10% of personal devices will have emotional AI capabilities’, compared to ‘less than 1% in 2018’Footnote 2. We have yet to see whether this prognosis becomes reality, especially in light of the emergence of critical voices regarding emotional AI technologies and their legitimacy, accuracy, and effects on privacy and freedom of thought as well as on individual autonomy and human dignity (see for example Chen 2019, Feldman Barrett et al. 2019, Dupré 2020, Article19 2021, Stark and Hoey 2021, Valcke et al. 2021).

In order to understand the environment in which emotion recognition tools may be applied, I briefly review three topics. First, I situate this new technology within the longer history of using biometrics for investigation, identification and inferring criminal behaviour, and contrast it with previous practices. Secondly, I show how the use of cameras has become a widely applied method in policing and may thus provide a platform into which emotion recognition technologies could be integrated. Thirdly, I provide a brief summary of some use cases of AI that have been deployed or trialled in a policing context.

2.1 Use of biometric data in policing and security

As was noted above, emotional AI uses data derived from people’s voices, body movements, intonation, gestures and facial expressions. Biometrics have been used in policing and have appeared in criminological theories since the late nineteenth century. Two main (and quite different) directions for their potential use have crystallised. The first is where they are used purely for identification, such as the use of fingerprints or DNA, or matching one face against another in order to determine whether it is the same face. These uses have become everyday practice in policing and in collecting and presenting evidence in criminal cases, even though they have been subject to criticism regarding their accuracy and ability to provide proof beyond reasonable doubtFootnote 3. Such uses of biometric data (see for example Datta 2020) have brought privacy concerns with them as well, for instance regarding the loss of anonymity, the risk of losing control over sensitive information (Kindt 2013), the possibility that the data can be used to generate additional information beyond the purpose of collection, and the possibility that they can be used to track people (Campisi 2013). The second use of biometrics has emerged within criminological theory, where scholars have tried to use body measurements to identify ‘criminal traits’ in people, with an eye to predicting criminality in those who possess said ‘criminal traits’ (see for example Lombroso 2006[1876], Lombroso and Ferrero 2004[1893], Kretschmer 1936[1921], Hooton 1939a, 1939b and Sheldon 1940). These attempts, however, have been riddled with logical and methodological fallacies, have never presented reliable evidence for the claimed deterministic connection between physiology and criminal behaviour, and thus have never become mainstream theories (see Goring 1913, Merton et al. 1940, Horn 2003, Siegel 2012).

Consequently, we need to draw a clear distinction between using biometrics for identification, on the one hand, and for crime explanation and/or crime prediction, on the other. Given this distinction, it appears that emotional AI would be used for the latter, as these technologies are used to make probabilistic predictions about emotional states and personality traits. The technology emotional AI promises is, however, slightly different from previous uses of biometric data: its predictions regarding emotional states would derive not from the assessment of a static biological or psychological trait but from a set of dynamic behaviours, which would be used to conclude that the person in question was currently in a particular emotional state or had a certain intent. As briefly described above, biometric data have featured in criminological theory and forensics for well over a century, so it would not be without precedent to make such attempts again with the deployment of emotional AI technologies.

2.2 Use of cameras in policing

CCTV cameras and the use of big data in policing are already a reality, and it is easy to see how emotional AI technologies could be incorporated into cameras or into the analysis of open-source databases and social media platforms, for example. Facial recognition technologies are also on the rise and are commonly deployed at borders and during police checksFootnote 4, and in some countries they are reported to have been embedded into CCTV cameras as wellFootnote 5. According to a definition provided by tech giant Amazon, facial recognition technologies refer to a set of computational methods which identify whether and where ‘faces exist in an image or video’ and ‘what attributes those faces have’Footnote 6. This is being used by police forces to identify individuals, for instance by comparing an image taken from a CCTV camera against a database of known offenders or missing peopleFootnote 7. However, recent research conducted in the UK has called its accuracy into question (Fussey et al. 2021). Police body-worn camerasFootnote 8 (Ariel et al. 2017, Bowling et al. 2019, Lee et al. 2019, Jones 2021) and the proposed use of UAVs (aerial ‘drones’) for police tasks such as intelligence gathering or searching for missing peopleFootnote 9 would, when combined with facial recognition, create even more powerful identification systems. This, however, introduces concerns regarding large-scale surveillance and human rights. Automated facial recognition not only enhances the ability of a CCTV system to ‘gather private information on’ people but can also affect ‘the right to freedom of assembly, freedom of thought belief and religion, the freedom of expression and the freedom of association’ (Surveillance Camera Commissioner 2020:12,20). Embedding emotional AI technologies would arguably amplify these effects and introduce ‘emotiveillance’ (McStay 2016:151) in public spaces on a mass scale.
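To make the mechanics of the face-matching step more concrete, the sketch below shows, under stated assumptions, how such comparisons are commonly structured: faces are converted into numerical embedding vectors and matched against a watchlist using a similarity threshold. The embedding model, watchlist and threshold value here are hypothetical placeholders, not any vendor’s actual pipeline.

```python
# Illustrative sketch only: the structure of a face-matching step of the kind
# described above (detect -> embed -> compare against a watchlist).
# The embeddings below are random placeholders standing in for the output of a
# (hypothetical) trained face-embedding model.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between two face embeddings (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_against_watchlist(probe: np.ndarray,
                            watchlist: dict[str, np.ndarray],
                            threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return watchlist identities whose similarity to the probe exceeds the threshold."""
    scores = [(name, cosine_similarity(probe, emb)) for name, emb in watchlist.items()]
    return [(name, score) for name, score in scores if score >= threshold]

rng = np.random.default_rng(0)
watchlist = {"person_A": rng.normal(size=128), "person_B": rng.normal(size=128)}
probe_embedding = watchlist["person_A"] + rng.normal(scale=0.1, size=128)  # noisy re-capture

print(match_against_watchlist(probe_embedding, watchlist))
```

Even in this toy form, the choice of threshold directly trades false matches against missed matches, which is one reason why independent accuracy evaluations such as Fussey et al. (2021) matter so much in this context.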

2.3 Current use of AI in policing

The two most prominent areas of the use of AI in policing at the moment are predictive policingFootnote 10 and the above-mentioned facial recognition. Predictive policing has been defined by Meijer and Wessels (2019:1033) as ‘the collection and analysis of data about previous crimes’. These methods can be applied in order to ‘identify places and times with the highest risk of crime’, individuals who are ‘at risk of being offenders or victims’ or ‘people who most likely committed a past crime’ (RAND 2013:1). The use of machine learning in predictive policing has led to the development of more advanced systems, but it has also raised several legal and ethical questions, for instance about system bias (Babuta et al. 2019). Live facial recognition has been subject to criticism for racial bias and discrimination as well as for its potential negative impact on privacy, freedom of expression and freedom of assembly (Fussey et al. 2019). Similarly, predictive policing tools have been shown to be biased (Ferguson 2017) and to (re)produce ‘racialized subjects and spaces’ (Jefferson 2018:12). It has also been demonstrated that machine learning in predictive policing can lead to the ‘disproportionate policing of historically over-policed communities’, which may result in discrimination as well as in issues regarding accountability for decisions (Lum et al. 2016). One key reason for these effects was shown to be inaccurate or skewed input data, which may have been recorded during the implementation of racially biased or unlawful policing practices (Richardson et al. 2019).
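As a toy illustration of the place-based logic described above (and not of any deployed vendor system), the sketch below ranks hypothetical grid cells by a recency-weighted count of recorded incidents; the cell identifiers, decay weight and data are invented for the example.

```python
# Toy illustration of place-based "hotspot" scoring: rank grid cells by a
# recency-weighted count of past recorded incidents. This mirrors the general
# logic described above, not any vendor's actual algorithm.
from collections import defaultdict

def hotspot_scores(incidents, decay=0.9):
    """incidents: list of (cell_id, weeks_ago) tuples for recorded crimes."""
    scores = defaultdict(float)
    for cell_id, weeks_ago in incidents:
        scores[cell_id] += decay ** weeks_ago   # older incidents count less
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

recorded = [("cell_12", 0), ("cell_12", 1), ("cell_12", 5),
            ("cell_07", 2), ("cell_07", 9), ("cell_33", 20)]
for cell, score in hotspot_scores(recorded):
    print(f"{cell}: risk score {score:.2f}")
```

Because the input consists solely of recorded incidents, any skew in how and where crimes were recorded is carried straight through into the risk ranking, which is precisely the feedback problem identified by Lum et al. (2016) and Richardson et al. (2019).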

3 Emotional AI and crime

For the purposes of this paper, I will separate the current main experimental directions of research and deployment of emotional AI technologies related to crime into the following two distinct types based on the general aim of the tool: (1) Crime prediction or identification; (2) Deception detection. Within the category of crime prediction and identification it is possible to distinguish between two sets of technologies based on what type of data they gather and assess: (a) Behavioural cues; (b) Verbal cues.

3.1 Predicting crimes

Predicting crimes, whether done by an algorithmic system or a human, and whether using a person’s facial expressions or their past criminal activity as predictors, is always a complex subject. The reason is that crime predictions are made with a particular aim in mind, and that aim is crime prevention (Meijer and Wessels 2019). The suggestion behind these efforts is that if we can successfully predict when, where, by whom and against whom or what object a crime is going to be committed, at least some parts of crime prevention become straightforward, as this provides the police with targets for intervention (Perry et al. 2013:1, Kaufmann et al. 2019). Sometimes, as in predictive policing, this means preventing crime from happening under particular circumstances (Perry et al. 2013:17). At other times the prediction is about a particular person who previously committed a crime, and a risk assessment determines whether there is a risk of re-offending, which then contributes to the decision on what punitive measures to employ in that person’s case (Northpointe 2015). It is important to recognise that these determinations are based entirely on risk and probability; the crime predictions made using these methods can, at most, tell us about risks and are not definitive certainties concerning future events (Strikwerda 2020:423). It is worth noting that many tools referred to under the umbrella term ‘predictive’ perform classification rather than prediction in the traditional sense, and the risk assessment step is performed by a human actor (Babuta et al. 2020:11–12). Nevertheless, the process as a whole is predictive in the sense that action is taken a priori on the basis of probabilistic assessments of future events which may never occur. Such a phenomenon is characterised by Zedner (2007) as pre-crime, which runs up against fundamental principles embedded in criminal law and criminal justice processes, in which the presumption of innocence is paramount. The policing of pre-crime is consistent with the concept of the risk society, which is built on the minimisation of risks (Beck 1992:19) and in which factors deemed potentially dangerous are eliminated before crimes can manifest. State reactions become determined by statistical probabilities (Swaaningen 1997:174). As Zedner (2015) further suggests, the essential problem is that someone merely deemed to carry a risk of future offending may then be subjected to restrictions or punishment (‘pre-punishment’), which is contrary to established principles of justice.

3.1.1 Crime prediction/identification based on behavioural cues

When we talk about the use of AI to identify behavioural cues in a crime context, we have to distinguish between technologies that help identify existing criminal behaviour (for instance, acts that have already resulted in violence against people or property) and tools that aim to identify emotions based on facial or behavioural cues that may indicate the possibility of future offending. We have to make this distinction not only for theoretical reasons but also because these applications potentially call for vastly different methodologies, have different legal consequences, and affect procedural guarantees of law enforcement and the criminal justice system in different ways. For example, if the system output identifies a case of a violent crime such as assault, this result is easier to verify by human actors in law enforcement and/or the criminal justice system than an output that predicts a person’s supposed violent intent. We also need to note that because emotional AI is an umbrella term for technologies that use different types of data as input (McStay 2019a:1), accuracies and concerns may vary widely depending on the data type.

While it seems logical to assume that there are strong correlations between emotional states and intention (Tistarelli et al. 2012), it is not known at this point whether it is possible to make definitive statements about people’s emotional states based on their behaviour. The idea of being able to predict crime in real time based on behavioural cues and/or patterns rests on the assumption that there are certain signs in the behaviour of offenders that precede the commission of a particular type of criminal activity, and that these signs can be identified, either by human observation or by machine learning, and then used to prevent said types of crime. However, a report on the practices of the Transportation Security Administration (TSA) in the USA indicated that stranger-to-stranger behaviour detection by human actors does not seem to live up to such expectations (ACLU 2017). Similarly, research on ‘stop and search’ by human police officers shows relatively low percentages of these actions being followed by arrests or findings of illegal substances or possessions such as firearms (Home Office 2020:2, Ferrandino 2013). These findings suggest either a lack of ability to assess behavioural cues and/or the presence of discrimination and prejudice. What we do not know at this point is whether machine learning and artificial intelligence will ever be able to find a correlation between observable behavioural cues and true intention, and whether such correlations would be strong enough to serve as a basis for police action and other security measures. Existing studies that examine the possibility of connecting the identification of emotions to crime prediction (see for example Li et al. 2021) are scarce. It appears that the area needs a more solid and diverse empirical basis for the classification of emotions (Du et al. 2015), on whether classifying emotions universally is even possible considering the heterogeneity of cultures and personal habits (Jack et al. 2012), and on whether facial expressions can really tell us about a person’s emotionsFootnote 11. Moreover, there is a further need for empirical evidence on whether the correlations between the portrayal of a particular emotion and criminal intent are definitive enough to justify intrusive measures, such as stop and search, in law enforcement and similar areas.

Identifying crimes in progress, whether for timely intervention, investigation or the prevention of further crimes, differs from pure crime prediction because, as stated above, here we are dealing with acts that have already happened; the role of emotional AI could be to differentiate between, for example, a violent robbery and a boxing match based on the behavioural cues portrayed by the participants. When it comes to the use of emotional AI technologies, two main directions of effort currently seem to be forming: one is the observation of individual behaviour and of interactions between individuals in groups; the other is the observation of crowd behaviour.

The observation of individuals to determine whether they are committing a crime or intend to commit a crime has long been part of everyday patrol operations by the police and by other forces tasked with ensuring the safety of the public, individuals and property (Dahl 1952). During these patrols the police may successfully ‘stumble’ upon criminals in action, but, as pointed out above, the detection of suspicious behaviour seems to lack real accuracy. When it comes to the use of machine learning, attempts have been made to identify suspicious behaviour in specific contexts, such as the use of cashpoint/ATM machines (Lee et al. 2018), or to detect specific crimes, such as shoplifting (Martínez-Mascorro et al. 2020). One such example is a proposed real-time drone surveillance system which, in order to detect violent individuals in public spaces, would use aerial images from a drone to estimate human poses and identify which of these appear to portray violent behaviour (Singh 2018). This technology is aimed not so much at predicting a crime as at detecting it. But it may not be far-fetched to expect different types of technologies to emerge in its footsteps, ones that instead of merely noticing violent behaviour attempt to predict it, for example by detecting aggression.
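A minimal sketch of the classification step in a pose-based system of the kind just described is given below; the pose keypoints, labels and classifier choice are stand-ins assumed for illustration, not the pipeline of Singh (2018).

```python
# Minimal sketch of the classification step in a pose-based detection system:
# flatten estimated body keypoints into a feature vector and train a classifier
# to label poses as "violent" or "non-violent". The keypoints and labels here
# are synthetic placeholders, not real pose-estimator output or annotated video.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_keypoints = 14                                 # e.g. a 14-joint skeleton
X = rng.normal(size=(200, n_keypoints * 2))      # (x, y) per joint, 200 example poses
y = rng.integers(0, 2, size=200)                 # placeholder labels: 1 = "violent"

clf = LogisticRegression(max_iter=1000).fit(X, y)
new_pose = rng.normal(size=(1, n_keypoints * 2))
print("estimated probability of 'violent' pose:", clf.predict_proba(new_pose)[0, 1])
```

Even in this toy form, the output is a probability attached to a detected pose, which underlines that such systems produce inferences about behaviour rather than observations of intent.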

Another example in this area is the emotion recognition technology offered by VibraImage. In a nutshell, the technology promises to measure the micromovements of a particular person and, on that basis, to ‘detect all human emotions’Footnote 12. The technology is used in certain airports in RussiaFootnote 13, but it has also been deployed in different contexts in countries such as China, Japan and South Korea (Wright 2021). However, as Wright (2021) discovered, details of the technology, its training data and the relationship between input and output have never been properly described or disclosed in any meaningful way. The opacity of this tool is compounded by the fact that every study or paper describing its operation seems simply to assume that the underlying technology works (Wright 2021).

Crowd control is an important function of the police (Hoggett and Stott 2010; Stott and Kumar 2020). Understanding and assessing the nature of crowd behaviour is a crucial element of designing a police response (Bosch 2013), which could range from tolerant and consensual policing to controlling and repressive policing. The form of police strategy adopted may in turn have implications for police–public relations, trust in the police and perceptions of police legitimacy. Even though protests are nowadays the type of event in which crowd control receives the most attention, it can be a feature of any larger gathering, such as sports events, concerts or public celebrations. The public and media attention is not accidental, however, as ‘protesting crowds have their own dynamics’ in which various groups, from organisers through businesses to security services, interact with each other (Neufeld Redekop 2010). In the past decade there have been several studies on crowd behaviour analysis (Sánchez and Dencik 2020), on creating systems that can recognise anomalies or ‘abnormalities’ (not necessarily of a criminal nature) in crowds, groups or pedestrian flows (Mahadevan et al. 2010, Chen et al. 2011, Cong et al. 2012, Marsden et al. 2016, Singh 2018, Ullah et al. 2018) and on the detection of unusual behaviour (Tung et al. 2011). These technologies can be described as ‘intelligent surveillance’, where the goal is for the system to alert the human actor only when an event (for example, unusual behaviour) is detected, in order to minimise the need for human participation in the surveillance process (Tung et al. 2011:230).
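The ‘alert only on anomaly’ logic of intelligent surveillance can be sketched as follows; the crowd-flow features, data and choice of detector are assumptions made purely for illustration.

```python
# Sketch of the "alert only on anomaly" logic of intelligent surveillance:
# fit an unsupervised anomaly detector on features summarising ordinary crowd
# flow, then flag incoming frames that deviate. Features and data are synthetic.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
# Each row: [crowd density, mean movement speed, movement direction variance]
normal_frames = rng.normal(loc=[0.5, 1.2, 0.3], scale=0.1, size=(500, 3))
detector = IsolationForest(contamination=0.01, random_state=0).fit(normal_frames)

incoming = np.array([[0.52, 1.15, 0.28],    # ordinary frame
                     [0.95, 3.40, 1.90]])   # sudden surge and scatter
for frame, label in zip(incoming, detector.predict(incoming)):
    if label == -1:                          # -1 marks an outlier in scikit-learn
        print("ALERT for human review:", frame)
```

Whatever the detector, ‘anomalous’ here is a statistical judgement about deviation from the data the model was fitted on, not a legal or contextual judgement about criminality, which is exactly the gap discussed in the next paragraph.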

What is particularly interesting about the above examples from a criminological and legal perspective is that they seem to attribute a certain intent or emotional attitude to certain behavioural patterns and to automate this process, seemingly without giving much weight to the context in which the behaviour occurs, the rights and civil liberties of citizens, or the potential effect of surveillance on people exercising their rights. Not only does such surveillance impact privacy in relation to personal information, the person themselves, their personal behaviour and communications, but, as Wright and Raab (2014) note, it may violate the privacy of location, of thoughts and feelings and of association. In relation to the importance of the context in which behaviour is conducted, it seems that these recognition technologies have a long way to go in interpreting the actual meaning of behaviour as opposed to merely detecting observable external cues. To name one example, it is unclear whether such a tool at this stage would be able to differentiate between a fight that qualifies as a violent crime under legal regulations and a martial arts training session or people acting out a fight scene.

3.1.2 Crime identification/prediction based on verbal cues

Another potential source of data for real-time crime identification or prediction is the observation of verbal cues in spoken or written form, with the intention of attributing a certain emotion, intention or opinion to those cues. Verbal cues here include not only the words used but also tone of voice and other signals that might be measured as non-verbal cues in other contexts. One such technique, sentiment analysis, is already being used in marketing, for instance (Rambocas et al. 2018). Sentiment analysis aims to determine whether people feel positively or negatively about a product or a service, for example. It derives ‘subjective information from texts in natural language, such as opinions and sentiments’ in order to generate data that can be used for the purposes of decision-making (Pozzi et al. 2017:1). So far, we have seen models and theoretical ideas for the use of sentiment analysis in a policing and crime prevention context, such as the combined use of sentiment data from Twitter and weather data (Chen et al. 2015), or the combination of sentiment analysis with other forms of data scanning and data mining techniques that pull data from social media for crime prevention in crisis and post-crisis situations (Domdouzis et al. 2016). A British example of the use of sentiment analysis in law enforcement is from the early 2010s, when the technology was deployed to observe trends and associations through social media contentFootnote 14. At this point, however, we have no empirical data on the accuracy and real effectiveness of such systems for crime prediction or crime identification.
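To show the basic mechanics of sentiment analysis in the simplest possible terms, the toy scorer below counts words from a tiny hand-made polarity lexicon; the lexicon and example posts are invented, and deployed systems rely on trained machine-learning or language models rather than anything this crude.

```python
# Toy lexicon-based sentiment scorer, illustrating only the basic mechanics of
# sentiment analysis described above. The word lists and example posts are
# invented stand-ins; real systems use trained statistical or neural models.
POSITIVE = {"calm", "peaceful", "happy", "safe"}
NEGATIVE = {"angry", "riot", "violence", "afraid", "unsafe"}

def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]; negative values suggest negative sentiment."""
    tokens = [tok.strip(".,!?") for tok in text.lower().split()]
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

posts = ["The march was calm and peaceful today",
         "People are angry and afraid, this feels unsafe"]
for post in posts:
    print(f"{sentiment_score(post):+.2f}  {post}")
```

Even this toy version makes the limitation visible: the score reflects word choice, not the writer’s actual feelings, intentions or the context in which the words were produced.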

A fast-emerging area is the analysis of online content, which can be anything from news articles to social media posts. For instance, EMBERS is a forecasting system that sets out to predict societal events based on open-source data such as tweets, Facebook pages, blog posts, meteorological data, economic indicators and satellite imagery. Some of the events it focuses on are disease outbreaks, civil unrest and domestic political crisesFootnote 15. In the case of protests in Brazil in 2013 and in Venezuela in 2014, the system was able to identify indicators of these in social media content and to create predictions matching ‘the timing of the events and their trajectory in terms of size and intensity’ (Doyle et al. 2014). This illustrates why these technologies may seem useful to police forces. However, a separate question remains as to how such intelligence would be used in policing practices in line with human rights and civil liberties. This further draws attention to the difference between the potential accuracy and use of predictive analytics and the problems of operationalising them as a basis for police action that respects the principles of policing and the citizens’ rights that are embedded in liberal democracies.

3.2 Deception detection

Deception detection is a subset of techniques and methods in the field of behaviour analysis. Behaviour analysis interviews, which focus on both verbal and non-verbal cues, are very common in police practice. This technique is ‘believed to be one of the two most commonly taught questioning methods in the US’ (Frank Horvath 2006 in Vrij et al. 2007:501). As for the success rates of humans in these areas, after analysing the results of 20 expert and non-expert comparisons, Bond Jr. et al. (2006:229) concluded that while experts seem more sceptical than non-experts when it comes to assuming a person is telling the truth, on average they only ‘achieve less than 55% lie-truth discrimination accuracy’. In their literature review, Strömwall et al. (2004:246) concluded that practitioners or ‘lie experts’ seem to hold a similar set of inaccurate beliefs about the non-verbal cues of deception as non-experts. Police interrogation manuals and police culture serve as the main origins of such beliefs, which are then preserved by processes such as ‘cognitive heuristics and biases’ (Strömwall et al. 2004:247). Vrij et al. (2007:510) raise the possibility that these accuracy rates in experimental studies are partly due to the fact that participants, both non-expert and expert, simply lack ‘knowledge about cues of deception’. This may be true, but we also have to take into account that people with different cultural backgrounds and personal characteristics may not share the aforementioned cues of deception or perceptions of deceptive behaviour.

Polygraphs are used in three main areas: screening job applicants, screening existing employees and ‘event-specific’ investigationsFootnote 16. Some theorise that the use of deception-detection tools in the field of criminal justice is part of the so-called ‘CSI effect’, whereby juries in the US have been observed to be keen on seeing more ‘hard evidence’ from both the prosecution and the defence during criminal trials (Chin and Workewych 2016). Chin and Workewych note that this leads to much ‘unnecessary’ deployment of technologies like DNA analysis in situations where, from a legal professional’s point of view, there is enough evidence to establish guilt or its absence even without these tests (Chin and Workewych 2016). Theoretically, the results of a lie detection test may be used to determine whether a person involved in an investigation or in a criminal justice procedure is telling the truth. However, a review in 2003 suggested that the existing research conducted on polygraph technology could not provide reliable evidence to show that ‘polygraph tests could have extremely high accuracy’Footnote 17. Currently, standard polygraphs measure certain physiological traits such as cardiovascular activity, respiratory activity and electrodermal activity (Synnott et al. 2015). By contrast, emotional AI promises the possibility of assessing truth-telling based on facial expressions (Shen et al. 2021) and vocal tones (Marcolla et al. 2020) instead of, or alongside, the traditionally observed functions. We do not yet have enough information to determine whether that task is possible, as there is only limited evidence for a correlation between certain micro-facial expressions and deception (Matsumoto et al. 2018). Therefore, it seems clear that we need more research on whether there are reliable signs of deception in facial expressions or vocal tones, whether these are universal or dependent on culture (Rubin 2014; Taylor et al. 2017), and whether they can be observed with enough certainty to give a definitive assessment of truth-telling or deception.

The EU-funded project iBorderCTRL aimed to create a system that could be applied at the borders of the European Union to ensure faster and more thorough checks for third-country nationalsFootnote 18. One of the tools being developed was the ‘Automatic Deception Detection System’, which set out to quantify ‘the probability of deceit in interviews by analysing interviewees’ non-verbal micro-gestures’Footnote 19. Even though the project was trialled at real borders (e.g. in Hungary), these tools were never used to make assessments about real people crossing the border, since this was only a research project, and based on the project material it does not seem that development is being taken any further at this point. The overall classification accuracy of the underlying technology, called ‘The Silent Talker’, in experimental runs performed on 18 participants (75.6% for truthful statements and 73.7% for deceptive statements) appears, on the very small sample tested, to be higher than the average results achieved by the expert or non-expert human actors described above (54–55%) (O’Shea 2018). However, looking at each participant’s individual case (O’Shea 2018), we can see that even though accuracy was 100% in several cases, there were quite a few in which it was significantly below that. A critical examination of the proposal argues that, in the context of deploying emotional AI systems such as deception detection, we need to examine how the use of these technologies creates a type of governance that can severely impact opportunities and rights (Sánchez-Monedero and Dencik 2020).

4 Discussion

Just as ‘regular’ predictive policing uses past (and sometimes even current, live) events as input data in order to make risk assessments regarding future crime events (Leese 2021), emotional AI would use certain similar types of data in order to produce output regarding the risk of criminal behaviour. At this point we can theorise that it would use some data for training and some data about which the prediction needs to be made (which can then also become part of the training data). To imagine a very simplified example, let us say that the aim of the tool is to determine whether a public gathering should be expected to become violent. In this case the AI would learn the typical behaviours that precede the ‘violent turn’ and would then look for those at a given public event, as sketched below. This might sound far-fetched and even dystopian, but based on the technological developments reviewed above and tendencies in crime control in some jurisdictions around the world, it seems a perfectly logical next step in efforts to stop crime. For this reason, the potential issues around the introduction of emotional AI in law enforcement and criminal justice need to be raised.
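The sketch below is purely illustrative of the probabilistic logic in this simplified example; the features (crowd size, an ‘anger’ score, density, noise level), the data and the labels are all invented, and no deployed system is being described.

```python
# Purely illustrative sketch of the probabilistic logic in the simplified example
# above: learn from (hypothetical) features of past gatherings labelled as having
# turned violent or not, then output a risk score for a new event. All features,
# data and labels are invented; the output is a probability about a future event,
# not an observation of anyone's actual intent.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
# Each row: [crowd size (thousands), detected "anger" score, density, noise level]
X_past = rng.normal(loc=[5, 0.3, 0.5, 0.4], scale=0.2, size=(300, 4))
y_past = (X_past[:, 1] + rng.normal(scale=0.1, size=300) > 0.45).astype(int)  # synthetic labels

model = LogisticRegression(max_iter=1000).fit(X_past, y_past)
new_event = np.array([[6.0, 0.5, 0.6, 0.5]])
print("predicted probability of escalation:", model.predict_proba(new_event)[0, 1])
```

The salient point is that the output is a probability derived from population-level patterns; acting on it means acting on an inference about events that may never occur, which is exactly the pre-crime dynamic discussed above.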

The use of emotional AI technologies in law enforcement and public security presents us with two separate sets of questions. The first concerns capabilities: what can and cannot emotional AI do to prevent crime? The second concerns the issues and challenges that a debate on the deployment of such technologies would bring: what are these issues when it comes to police use, and how can we evaluate whether a given technology is mature enough to use and whether it is necessary, proportionate and worth the costs? In the final section of the present paper I discuss some key themes that emerge from the previous analysis in relation to these questions, namely the following: (1) Accuracy and performance; (2) Bias; (3) Accountability; (4) Privacy and other rights and freedoms. Each of these dimensions is worth exploring further, not only in the context of policing and criminal justice applications of emotional AI but also regarding emotional AI’s uses in other fields.

4.1 Accuracy and performance

Regarding (potential) accuracy and performance, there is a need to distinguish between different uses of emotional AI. In settings such as classification on imbalanced datasets, accuracy may not be the best performance metric (Veale et al. 2018), but for now the relevant research literature appears to focus primarily on accuracy in this field. If we take the use of emotional AI in a private home (Mano et al. 2016; Mano 2018) or in a smart health environment (Fernandez-Caballero et al. 2016) and compare it to its use in law enforcement, for instance, it becomes apparent that the amount of data available regarding an individual person may be vastly different. In a private home some users have actively consented to the use of emotional AI, wish to obtain the benefits that can come from its use and willingly work with the technology to fine-tune its accuracy in recognising their personal emotions, for instance by providing feedback. However, even in private homes there are data subjects who may not be aware of the emotion recognition technology in use or who have not given their consent to their profiling, such as visitors, children or individuals in domestic situations where they are being spied on by a partner. In a law enforcement context there will most likely be only a small amount of usable data on the individual in question, so the algorithmic decision will be based on whatever training data the system was fed and whatever data it picked up in the course of self-learning. However, that data will come from a pool of individuals who are potentially very different from the person about whom the algorithmic decision is made. For instance, the universalisationFootnote 20 of the classification of the correlation between facial expressions and emotional states is subject to criticism, as is the hypothesis that there is a universal understanding of emotions such as anger or fear, as these may be highly dependent on cultural context (Russell 1994). The expression and communication of even the basic emotions may vary between different cultures, situations and ‘people in a single situation’ (Feldman Barrett et al. 2019). Studies have shown that, despite existing similarities, different cultures can express certain emotions differently (Elfenbein et al. 2007). Even if basic emotions may be recognised in a cross-cultural setting, there are emotions that may not be recognised across cultural boundaries at all (Sauter et al. 2010). It is possible that technology and machine learning will be able to provide an answer to this, but a recent study comparing eight commercially available automatic classifier tools for facial affect recognition found that their accuracy varies between 48 and 62% (Dupré 2020). Moreover, when such tools classify emotions, they mainly focus on facial expressions, which may not provide sufficient data for identifying emotions, as a smile does not always mean happiness (Chen 2019). In fact, the same facial expressions and sets of facial movements may be used to express multiple emotion categories and may even be used to communicate something other than an emotional state (Feldman Barrett et al. 2019). Humans themselves use other information, such as body language, the surrounding environment and personal beliefs and expectations, when analysing another person’s emotional state, which is mostly perceived in a wider context (Calbi et al. 2017). Furthermore, there is the issue of deception, as individuals within the scope of the criminal justice system or law enforcement may not want to actively disclose their emotions.
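The point that accuracy is a poor metric on imbalanced data can be made concrete with an invented example: in a screening scenario where the behaviour of interest is rare, a system can report high accuracy while most of the people it flags are false positives. The confusion-matrix counts below are made up purely for illustration.

```python
# Illustration of why accuracy alone can mislead in imbalanced settings of the
# kind discussed above, where the behaviour of interest is rare.
# The confusion-matrix counts are invented for the example.
def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# 10,000 people screened, 10 genuinely "of interest"; a system that flags 500
# people and catches 8 of the 10 still reports a high accuracy figure.
tp, fn = 8, 2
fp = 492
tn = 10_000 - tp - fn - fp
acc, prec, rec = metrics(tp, fp, fn, tn)
print(f"accuracy {acc:.3f}, precision {prec:.3f}, recall {rec:.3f}")
```

In this invented case the headline accuracy is about 95%, yet only 1.6% of those flagged are true positives; precision, recall and the absolute number of people wrongly flagged say far more about the real-world consequences than accuracy does.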

An important question in relation to accuracy when it comes to algorithmic decisions is whether they need to be subjected to greater scrutiny than human decisions. In other words, if we know that humans cannot successfully determine deception in stranger-to-stranger interactions (as stated above, the success rate of expert actors is around 55% (Bond Jr. et al. 2006:229)), should we be satisfied with similar accuracy rates for computational methods of deception detection? One argument against this is the scale of potential damage: algorithmic tools and technologies will most likely cover cases on a significantly larger scale than a single human decision-maker, and will possibly be re-deployed in other fields once the technology is deemed reliable enough. With algorithmic decision-making systems in general, there is a clear need to distinguish between different tools, technologies and uses, as they all vary in accuracy. In this sense, it could be more logical to set a strict but context-dependent minimum standard for accuracy when it comes to the application of emotional AI technologies in the field of policing and criminal justice. The other conclusion we can draw from the diverse range of accuracy rates in AI technologies is that the scope of examination of potential benefits, costs, damages and impacts should go far beyond the study of accuracy alone. We could phrase it as follows: if it is not accurate, it definitely should not be used, but it should not be used merely because it is accurate.

4.2 Bias

A matter closely related to accuracy is the existence of biases in algorithmic decision-making. It has long been known that human decision-making in law enforcement (Spencer et al. 2016) and in the criminal justice system (Neitz 2013; Mahoney 2015) is ridden with biases. By now there is a considerable amount of research indicating that algorithmic decision-making systems in general can be biased too (see for example Hannák et al. 2017; Bolukbasi et al. 2016; Eslami et al. 2017; Lorenz et al. 2017; Diaz et al. 2018; Dixon et al. 2018; Ekstrand et al. 2018; Shen et al. 2018, Matsangidou et al. 2019, just to name a few). In the field of criminal justice, for example, the COMPAS system has been shown to be biased (Angwin et al. 2016). The aforementioned studies are not specific to law enforcement uses of emotional AI technologies, as few of these have been deployed in real life and those that have been have not been researched for biases and discrimination. However, there is research showing bias in emotional AI technologies in other contexts, such as racial bias (Rhue 2018) or differences in the ability to recognise emotions in male and female data subjects (Domnich et al. 2021). The sources of such algorithmic bias can differ and include (training or input) data bias, human bias (brought on by inappropriate system use) and algorithmic processing bias (Shulner Tal 2019). While there are extensive efforts to de-bias systems and mitigate bias (see for example Bolukbasi et al. 2016, Veale et al. 2017, Bellamy et al. 2018, Dixon et al. 2018, Zhang et al. 2018, Zhao et al. 2018, Amini et al. 2019, Shulner Tal et al. 2019, Savani et al. 2020), in law enforcement and criminal justice strong guarantees and safeguards seem necessary to show that a system is not biased against a particular group or against persons with a certain characteristic. Biases in algorithmic decision-making can also reinforce existing societal biases and contribute to the reproduction of existing imbalances in the area of application (Barocas et al. 2016, Noble 2018; Packer et al. 2018), for example by recreating past decisions (Rovatsos et al. 2019:62).
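One basic form the safeguards mentioned above could take is a routine group-wise error audit of the kind used in the bias literature cited here: comparing error rates, such as false positive rates, across demographic groups. The predictions, labels and group assignments in the sketch below are invented for illustration.

```python
# Sketch of a basic group-wise bias check of the kind used in the auditing
# literature cited above: compare false positive rates across demographic groups.
# Predictions, labels and group assignments are invented for illustration.
import numpy as np

def false_positive_rate(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    negatives = y_true == 0
    return float(np.mean(y_pred[negatives] == 1)) if negatives.any() else 0.0

rng = np.random.default_rng(4)
groups = rng.choice(["group_a", "group_b"], size=1000)
y_true = rng.integers(0, 2, size=1000)
# A (hypothetical) model that systematically over-flags group_b:
y_pred = np.where((groups == "group_b") & (rng.random(1000) < 0.3), 1, y_true)

for g in ["group_a", "group_b"]:
    mask = groups == g
    print(g, "false positive rate:", round(false_positive_rate(y_true[mask], y_pred[mask]), 3))
```

A persistent gap in such error rates across groups would be one concrete, measurable indicator of the discriminatory effects described above, although passing a single metric of this kind is far from sufficient evidence that a system is unbiased.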

4.3 Accountability

The issue of accountability is always a complex one when discussing algorithmic decision-making systems. Accountability can be defined as the ‘capacity to assign responsibility to the correct agency’Footnote 21 and can also be seen as a principle concerning the ‘obligation to justify one’s actions and the risk of sanctions if justifications are inadequate’ (Castelluccia and Le Métayer 2019:III). If there is a human in the loop, such as the acting police officer, then it seems straightforward to suggest that the said human should be responsible for the decision. However, most people who apply algorithmic tools when performing their work duties, and also in their everyday lives, do not know how the outputs were generated and sometimes do not know all, or even any, of the inputs that were used (Smith 2018). If we consider this fact and assume it is true of the acting police officer in our example, it immediately becomes apparent that we are dealing with a difficult question. Moreover, high-stakes algorithmic decisions and decision support, such as those typically involved in law enforcement, face various challenges, such as data changes, human actors augmenting outputs and explaining performance (including performance metrics that differ from accuracy), and are often characterised by crossed lines of accountability (Veale et al. 2018). It has been noted that in a policing context the use of predictive systems can influence the ability of both individual acting officers and agencies to give an account of the decisions that were taken (Bennett Moses et al. 2018). In general, the literature offers several different methods and techniques for increasing accountability (and the related properties of fairness and transparency), such as auditing, explainability management, discrimination discovery and fairness management (Giunchiglia et al. 2019). In the concrete case of predictive policing, strict software evaluations have been suggested, as full transparency and comprehensibility may not be achievable in this context (Bennett Moses et al. 2018). In any case, it seems that any requirement regarding transparency needs to go further than simply demanding open-source code, and it also needs to consider the apparent trade-off between accuracy and interpretability (Blacklaws 2018). ALGO-CARE offers a decision-making framework for algorithmic assessment tools in law enforcement which suggests a set of questions to be answered before deployment (Babuta et al. 2018). As regards accountability, it recommends clarifying whether there are any restrictions that may limit accountability or proper evaluation, whether the algorithm is transparent and accountable, and whether it will be placed under review (Babuta et al. 2018). It seems necessary, however, to clarify requirements and due-diligence duties and to build clear accountability structures into every procedure that involves algorithmic decision-making and decision support.

4.4 Privacy and other rights and freedoms

The place of rights in law enforcement is a notoriously complex area, as this is a field in which exercising the full power of the state can result in severe interference with rights and freedoms in the form of restrictions imposed on people during law enforcement procedures (for instance, arrests or searches of property and body) (Ashworth 2012). Among the rights and freedoms affected can be the freedom of movement, the right to own property, the right to work and freedom of choice in employment, and, in the direst example, the right to life. On the other hand, in most countries law enforcement procedures are bound by strict rules and procedural rights of their own which are meant to ensure longstanding legal principles, such as the right to a lawyerFootnote 22 or legal aid,Footnote 23 the right to be presumed innocent and to be present at one’s own trial,Footnote 24 just to name a few. Law enforcement is also a special area of public service considering the nature of its duties, tasks and responsibilities in safety and security (Chalom et al. 2001). These characteristics make the fields of law enforcement, criminal justice and related areas such as border control and counter-terrorism very different from other potential areas of application. Privacy and human rights advocates have drawn attention to the impacts these new, invasive technologies can have on citizens (Wright et al. 2014, Fan 2021). Algorithmic decision-making in this context can affect the right to a fair trial and due process, the freedom of association, the right to an effective remedy, the prohibition of discrimination and the right to privacy, and can have an impact on effective data protection (Council of Europe 2017). Emotional AI technologies can have a significant effect on privacy (such as concerns about ‘self-determination, consent, choice and abuse of personal control’), and the use of such analytic systems raises ethical questions regarding the emotional and mental privacy of both individuals and groups (McStay 2019b:6). Moreover, values such as individual autonomy and human dignity can also be challenged by the use of automated emotion recognition technologies (Valcke et al. 2021). For instance, the way such systems and their data collection methods are designed, and what conceptualisations of emotion they use, may have a significant effect on whether such systems ‘should be developed and deployed at all’ (Stark and Hoey 2021:790). Aside from rights and freedoms governed by law, there are also several ethical principles that characterise the field and are thought to be crucial in connection with, for instance, affective computing, such as the avoidance of deception, respect for autonomy and ensuring that the competence of the system in question is understood (Cowie 2015). While assessing the privacy and wider social impact of any new technology is essential, for instance by undertaking a ‘Privacy Impact Assessment’ (PIA) or ‘Surveillance Impact Assessment’ (SIA) (Clarke 2009; Wright 2012, Wright and Raab 2012), such assessments need to be undertaken within a wider framework that considers them in the context of other factors, including the practical effectiveness of the technology.

5 Conclusion

In this article I have introduced and explained the emerging new technologies that have been described as ‘emotional AI’. This paper has considered how they work, some of the issues relating to their effectiveness in practice and the implications of their deployment in public spaces in a policing context. As the drive to make cities ever smarter and to embed machine learning technologies within the urban setting continues to gather momentum, it seems likely that policing and law enforcement agencies will soon begin to explore the potential of such systems to address the problem of urban crime. However, these technologies of ‘emotiveillance’, as McStay (2016:151) puts it, raise particularly troubling questions in relation to privacy, rights and freedoms, especially when used in public spaces.

The paper drew attention to how machine learning, cameras used for the surveillance of public spaces and the analysis of biometrics are already established parts of policing practice. This is significant because these phenomena create an environment into which emotional AI systems can easily be embedded, from both a technological and a policing-practice perspective. While in some respects emotional AI represents continuity with police use of technologies for crime prevention and forensics, in other respects it represents the emergence of a new logic of detecting crime, anticipating criminal behaviour and intervening in crime situations, as well as a new form of control over urban spaces and behaviour. Following this, I introduced some of the existing systems and tools that have been deployed, trialled or prototyped in policing contexts such as crime prevention and crime detection. The analysis showed the probabilistic nature of these tools and how this is in line with Zedner’s (2007) concept of pre-crime in an urban setting. The discussion covered key issues raised by these technologies regarding accuracy, bias, discrimination, accountability, privacy and human rights. In relation to the urban setting and cities, it was noted that the forms of privacy potentially impacted by emotional AI include the privacy of thoughts and feelings as well as the privacy of association.

In conclusion, I argue that even if emotional AI technologies were to become accurate in revealing thoughts, feelings and intentions, their use in a public urban setting for policing purposes should be resisted in democracies because of the technologies’ clash with human rights values and liberties in such societies.