CLASSIFYING HUMANS: THE INDIRECT REVERSE OPERATIVITY OF MACHINE VISION

Classifying is human. Classifying is also what machine vision technologies do. This article analyses the cybernetic loop between human and machine classification by examining artworks that depict instances of bias when machine vision is classifying humans and when humans classify visual datasets for machines. I propose the term ‘indirect reverse operativity’ – a concept built upon Ingrid Hoelzl’s and Remi Marie’s notion of ‘reverse operativity’ – to describe how classifying humans and machine classifiers operate in cybernetic information loops. Indirect reverse operativity is illustrated through two projects I have co-created: the Database of Machine Vision in Art, Games and Narrative and the artwork Suspicious Behavior. Through ‘artistic audits’ of selected artworks, a data analysis of how classification is represented in 500 creative works, and a reflection on my own artistic research in the Suspicious Behavior project, this article confronts and complicates assumptions of when and how bias is introduced into and propagates through machine vision classifiers. By examining cultural conceptions of machine vision bias which exemplify how humans operate machines and how machines operate humans through images, this article contributes fresh perspectives to the emerging field of critical dataset studies.


Introduction
'To classify is human.' 1 Classifying is also what machine vision is trained to do. When Adrian Mackenzie writes about 'machine learners,' he turns machine learning into a concept which refers both to humans and machines. After all, as Mackenzie asserts: 'Machine learners are often simply called "classifiers."' 2 His statement highlights how 'learning' in machine learning often corresponds to training the machine to classify. In machine learning for computer vision, classifiers learn from visual data. And like all data, visual data 'must be classified in some way to be put to use.' 3 In the context of machine vision, data becomes useful when images are collected and classified into datasets. There are many ways to assemble visual datasets. Typically, vast amounts of images or videos are downloaded from the internet. Then classifying humans – data curators and crowdsourced on-demand annotators – translate those images into machine-readable data. Classification is thus a core practice of developing artificially intelligent machine vision. 4 Hence, in this article 'classifying humans,' like Mackenzie's 'machine learners,' refers to both human and machine classifiers.
Classifying humans are at the centre of attention as this article investigates cultural conceptions of machine vision bias. Bias is an overloaded term which is defined differently in different disciplines. It comes with multiple meanings even when the context is narrowed to machine learning. In this article bias is understood through an alignment of operations on images. For example, machine vision bias arises when a machine classifier misgenders images of faces. This is often due to a biased or non-representative training dataset underrepresenting the misgendered population. Dataset bias in turn aligns with humans performing bias when classifying and collecting images into datasets. For example, Trevor Paglen and Kate Crawford have demonstrated that ImageNet, which many models are trained on, includes highly problematic classification of images of people. 5 In particular, image recognition technologies deployed to classify humans tend to lead to harmful bias showing prejudice against individuals or groups of people. 6 Acknowledging that different types of biases arise throughout a machine learning model's lifecycle, this article is limited to addressing bias in the dataset. Biased datasets can be caused by lack of representation, as when facial recognition algorithms are trained primarily on white men. The biased representation can be a historical artefact. For instance, ImageNet uses WordNet's classification system from the 1990s, which includes stereotypes that would not be acceptable today. 7 We could also imagine how even an unlabelled training dataset from the 1950s would visually portray gender roles as quite different from today. Such dataset bias can cause representational harm such as reinforcing stereotypes. On the other hand, near the end of this article, in my discussion of Suspicious Behavior, I address instances in which classifying humans perform bias when annotating visual datasets.
This entails both how humans interpret and are instructed to interpret images. It involves annotation interfaces which facilitate perception at scale and speed. And it is about annotators learning to 'perceive on the bias,' meaning that they embody 'a temporal and rhythmical apparatus' to interpret images. 8 Artworks and artistic research have been influential in communicating to a wider audience that visual datasets are fundamental for machine perception. 9 Artists have audited, exposed, excavated, and exhaustively watched publicly available visual datasets and brought to attention the lack of diversity in datasets, revealing prejudiced and racist taxonomies and privacy concerns in dataset assembly. 10 Thus, artworks provide a rich context to examine the following questions: Why does dataset bias lead to harmful machine vision? When is bias introduced into machine vision? And why does machine vision bias propagate? I engage with these questions of machine vision bias both as a researcher and an artist through two projects I have been involved in. The first project I will discuss is the 'Database of Machine Vision in Art, Games and Narratives' (henceforth, Machine Vision Database). 11 The second is Suspicious Behavior (2020), an artwork I co-created with Andreas Zingerle (as the artist duo KairUs). 12 I start by analysing four artworks that I propose viewing as 'artistic audits': Max Dovey's performance How to be More or Less Human (2015), Joy Buolamwini's video work AI, Ain't I a Woman? (2018), Nouf Aljowaysir's artwork Salaf (2020), and Nakeema Stefflbauer and Nushin Isabelle Yazdani's artistic video essay Future Tense: AI from the margins (2020). These four artworks illustrate why and how representation and historical bias lead to harmful discrimination when machine vision is deployed.
For a broader understanding of cultural conceptions of classifying humans, I turn to the 500 creative works in the Machine Vision Database, and use data visualisations to explore who is classifying and who gets classified in what we call 'Machine Vision Situations.' This data analysis allows me to explore patterns that emerge out of interactions between technologies and other agents implying how bias is conceptualized in a larger corpus of creative works.
Finally, I turn to my second project, Suspicious Behavior, a speculative annotation tutorial, which contributes the fresh perspective of artistic research to the emerging field of critical dataset studies. Artworks and artistic research have mainly focused on scrutinizing the content (images, labels, and taxonomies) of visual datasets. Less has been done to understand how and when labels get attached to images. My reflections upon the design of an annotation interface for Suspicious Behavior complicate general assumptions of how and when bias is introduced into datasets. In the sections analysing the four artistic audits and the data from the Machine Vision Database, my focus is on bias that arises when machine vision operates on humans through images. Then, in my artistic research, I change perspective and bring to attention humans who perform bias by operating machine classifiers through images.
Both when machines classify images of humans and when humans classify images for datasets, images are not primarily interpretive or affective artifacts, but operational: they are means to an end. Based on Harun Farocki's frequently cited definition of 'operative images' as 'images that do not represent an object, but rather are part of an operation,' 13 images have been understood to hold different degrees of operationality. 14 Images discussed in this article are operationalized as visual data in the context of supervised or semi-supervised machine learning. They thus fit the narrowest definitions of operativity since they function as elements in technical processes. Expanding on Farocki's 'operative image,' Ingrid Hoelzl and Rémi Marie introduce the concept of 'reverse operativity.' Using Google Street View (GSV) as an example, they bring attention to cybernetic information loops in which humans operate upon images which in reverse operate back on humans: In the case of GSV, operation which is not restricted to user navigation, but which is part of a larger circular operation of data exchange, with the users' trajectories feeding back into the database. This 'reverse operativity' reveals the more problematic side of the algorithmic turn: For if we are operating GSV images, they are at the same time operating us. 15 Hoelzl and Marie use the term 'reverse operativity' to describe how images are 'aimed at us.' 16 For instance, in a commercial context images are aimed at us when they are used to trace mobility patterns of customers. Each time we operate GSV images, data is collected. Machine classifiers make predictions based on the data, and aim it back at us, e.g. as personalized advertising.
In other words, reverse operativity describes instances in which 'humans operate machines and machines operate humans through images.' 17 Hoelzl's and Marie's example of reverse operativity implies that Google's images instantaneously operate directly back on the person operating an image. However, when humans operate on images and make them functional for machine learning, then reverse operativity acts differently. The images operate back neither instantaneously nor directly on the person operating the image. Instead, images operate back on humans in indirect ways. Thus, building upon Hoelzl's and Marie's example of reverse operativity, I suggest the term 'indirect reverse operativity.' The term arises from how classifying humans (data curators and annotators) operate on images and those images indirectly operate back on humans when machine vision technologies are deployed to classify us.
The main differences between 'reverse operativity' and 'indirect reverse operativity' are in the distribution, temporality, and direction of operations. Classifying images for machine learning is a distributed operation involving both human and machine interpretation. I return to this point in the section on Suspicious Behavior. The reversal of an operation – the image aiming back at us – is not instant. There can be long gaps between classifying images for a dataset and the deployment of a machine vision product trained on that dataset. When machine vision is finally deployed, it is not aimed at those who classified the image. Instead, reverse operativity is directed towards humans in general. Due to this indirectness of reverse operativity, harmful machine vision bias is often discovered only after machine vision is deployed. Hence, I start by analysing artistic audits which test and evaluate machine vision bias and discuss why machine vision bias leads to harmful discrimination particularly when humans are classified. After pinpointing harms that arise when machines operate through images on humans, I change point of view to reflect upon how classifying humans operate on images.
What emerges when analysing these artworks as instances of indirect reverse operativity is that bias propagates in cybernetic loops. Although this article is focused on a subcategory of AI – namely, machine vision – classification as production of knowledge and a mechanism of knowledge construction is of concern for the whole field of AI. Hence, the concept of indirect reverse operativity could be further expanded to think more generally about the ways we operate on data and how that data operates back on us. By reverse engineering cybernetic loops of machine vision bias, this article demonstrates that classifying humans are still central in defining how machines perceive and classify the world. Critical dataset studies aim to 'bring people back in' to our understandings of datasets; as Nanna Bonde Thylstrup has argued, it 'requires us to think through how to visibilise humans in machine cultures, but also raises ethical questions about how to encounter these humans with empathy and care.' 18 The concept of indirect reverse operativity enables this, supporting nuanced analysis of datasets that makes human actions visible. The concept is particularly useful as data curation and annotation are increasingly automated, making human actions less obvious.

Artistic audits – when machine vision bias leads to harmful discrimination
Artistic audits, I propose, are artworks which bring attention to the constructed ideological borders that human classifiers have encoded into classification products. Artistic audits demonstrate how certain types of automated bias lead to discriminating machine vision behaviour. Here I outline a few examples of artistic audits to articulate why we need to pay attention to indirect reverse operativity. The chosen artistic audits show how human decisions to classify images play out when they operate back on us. For example, in his performance How to be More or Less Human, Max Dovey plays with how gender stereotypes are encoded into machine vision. 'Max Dovey' 19 is dressed in a suit and is posing in front of a web camera trying to be identified as a man. If 'Dovey' is dressed, the image recognition software installed in the camera classifies him 'correctly' as a man. However, when he starts undressing, an ambiguous threshold is crossed, and his body is classified as a woman. What I want to point out with this example is how a worldview ingrained in a dataset, in other words dataset bias, plays out when a machine learning model is trained on images that indirectly reverse operate when machine vision is deployed to classify humans. 20 In this case the machine vision application is biased to recognize nudity as a female attribute.
Because datasets are situated artifacts, there can be many sources for such bias. A visual dataset is a product of a particular time and place. Thus, it matters when and from where images are collected. Historical bias becomes a source of harmful discrimination when a model reinforces gender or racial stereotypes, for example in image search results, often leading to representational harm. 21 Representation bias, also called sample bias, occurs when some types of images appear in a class more often than others. For example, ImageNet contains a higher female-to-male ratio in classes containing nudity (e.g. bra, bikini, and maillot and pornographic images). 22 The bias towards classifying nudity as a female attribute in How to be More or Less Human could also be caused by similar overrepresentation of females in images containing nudity. Bias can also arise from how data is classified. In the case of automated gender classification, applications typically recognize gender as binary male/female and all other gender expressions are bluntly ignored. 23 Classification requires choices; thus, some features are given more importance and consequently others less when a classifier predicts between chosen classes.
In her artwork AI, Ain't I a Woman? Joy Buolamwini depicts how iconic Black women are repeatedly misgendered by popular gender classification products. In Buolamwini's video we see screenshots of how Google, Amazon, IBM, Microsoft, and Face++ label iconic Black women like Serena Williams, Michelle Obama and Sojourner Truth as 'male,' 'men,' or 'gentlemen' when predicting externalized gendered appearance. Buolamwini is also a computer scientist, and in an intersectional audit of gender classification products she further demonstrates racial bias in predicting gender. Tested systems performed well on light-skinned men, yet poorly on darker-skinned women. 24 Such bias is caused by historical overrepresentation of white males in facial recognition datasets. Lack of diversity in datasets turns into harmful bias when all other demographics are rendered as deviations from the white male norm when images operate back on us.
Artistic audits do not only depict gender and racial bias in machine vision classifiers. Nouf Aljowaysir's artwork Salaf demonstrates how machine vision classifiers exhibit Americentric and Eurocentric bias. 25 When Aljowaysir was working with photographs from her ancestral archive she realized how image recognition software repeatedly misclassified subjects wearing traditional Saudi and Iraqi clothes. The algorithm attached labels like 'military,' 'army,' or 'soldiers' to the photographs, demonstrating how the Western colonial gaze produces stereotypes about the Arab world by encoding certain bodies as threats. As an artistic audit, Salaf and the other works demonstrate that if we ignore how images indirectly reverse operate on humans, then machine vision, like other technological systems, will continue to reproduce 'the values of white supremacist heteropatriarchy, capitalism, ableism, and settler colonialism.' 26 Nakeema Stefflbauer and Nushin Isabelle Yazdani's artwork Future Tense: AI from the margins articulates how machine vision bias leads to harmful discrimination. In this video work 'Nakeema Stefflbauer' explains why people like her, 'not privileged,' 'not white,' experience AI technologies as intrusive and unfair. 'Stefflbauer's' monologue describes how her racialized, gendered body is 'hypervisible' as there are no 'unoccupied,' 'unmonitored' spaces to escape the gendered and racially biased technologies monitoring every aspect of her life. Yet, at the same time, she is ignored and invisible by design. Numerous examples of harmful bias are taken up in the video: machine vision regularly failing to correctly detect the faces of dark-skinned females, search engines classifying images of Black women as apes or gorillas, smart cameras perceiving '"Asian" eyes as closed,' and emotion detection interpreting Black basketball players as angrier than white players.
27 What Stefflbauer describes are instances in which machine vision technologies are employed with the assumption of being more objective or neutral than prior technologies, yet they reproduce existing inequalities. This technological encoding of discrimination is what Ruha Benjamin in Race After Technology calls the 'New Jim Code.' 28 In other words, what Future Tense: AI from the margins lists is one example of 'New Jim Code' after another. The examples of bias revealed in this artistic audit are not isolated cases of poorly designed machine vision classifiers. Machine vision systematically discriminates against those in the margins. Stefflbauer and Yazdani's almost eighteen-minute-long video ends with a feminist manifesto written into an editing tool. Referencing the 'matrix of domination,' a concept developed by Patricia Hill Collins in Black Feminist Thought, 29 the manifesto breaks down how the given examples reflect dominant power relations in 'our capitalist, patriarchal, white supremacist society.' 30 In a way, the artwork collects evidence of how machine vision reproduces existing inequalities. It demonstrates that intersections of race, gender, and other identities shape how people experience these technologies. 31 Through 'Stefflbauer's' persona the artwork articulates what is also highlighted in Virginia Eubanks's book Automating Inequality: those oppressed by AI already belong to systematically marginalized populations. 32 These four artistic audits demonstrate how images indirectly operate back on humans with differing accuracy, and how this can lead to experiences of discrimination. Negative attitudes towards biased machine classifiers are not only apparent in these artworks, but also present in a larger sample of creative works, including artworks, games and narratives, in the Machine Vision Database, which I turn to in the next section.

Data analysis – who is classifying and who gets classified in machine vision situations?
The Machine Vision Database is a collection of 500 'Creative Works' (190 artworks, 77 games and 233 narratives), and thus presents an opportunity to explore cultural conceptions of machine vision classification in a larger sample of works. I take this opportunity here to examine more broadly who is classifying or being classified in what we call machine vision situations, and what the interactions between classifying agents reveal about indirect reverse operativity.
However, before getting to the data analysis, I need to make a short detour and explain a few things about the database. As a core member of the 'Machine Vision in Everyday Life' project I was involved in building this database, contributing mainly by collecting and analysing digital artworks and tagging them with interpretative metadata. I want to start by stressing that the corpus of works in the Machine Vision Database cannot be claimed as fully representative. In addition, the following data analysis and visualisations are by necessity reductive and cannot express the full range of expression in each work collected into the database. My main unit of analysis is what we call 'Machine Vision Situations' (hereafter referred to in short form as 'Situations'). 33 For a detailed description of our backgrounds, data collection practices, the database architecture, exported datasets, definitions, ethical concerns, and dataset biases see our data paper 'Representations of Machine Vision Technologies in Artworks, Games and Narratives: A Dataset.' 34 The unit 'Situations' was developed to 'capture granular details of what humans, technologies, and other agents are doing in specific interactions with machine vision technologies.' 35 The research team identified and registered instances of interaction with machine vision and entered structured information about agents based on their interpretation of the situation. There can be one or more 'Situations' in each 'Creative Work' in the database. Figure 1 presents a diagram of the database structure and shows how 'Verbs' are used to describe the actions of machine vision technologies and other agents in a 'Situation.'
[Figure caption fragment: ... Behavior, Entities like 'Humans in General' or 'Images,' or machine vision 'Technologies.' For further exploration of the database the archived pages can be accessed using QR codes in the diagram.]
In the following data visualisations, I focus on the verbs 'Classifying' and 'Classified,' which are among the actions most commonly used to describe interactions in 'Situations' ('Classifying' is used 155 times and 'Classified' 95 times). Figure 2 is a network visualisation depicting all agents and actions ('Verbs') in the dataset as nodes in a network, but with the verb 'Classifying' and the agents linking to this action highlighted. Agents and actions that are not related to classifying are shown as grey. In the network visualisation nodes cluster together when they share many links, so if two technologies are close on the network, that means they act in similar ways. As we can see in Figure 2, the cluster of technologies most involved in 'Classifying' includes facial recognition, object recognition, and emotion recognition. The visualisation also shows that the artworks, games and narratives in the dataset depict technologies as 'Classifying' more often than human agents (underlined with pink in Figure 2). Whereas the action 'Classifying' is mainly performed by various machine vision technologies, a network visualisation highlighting the verb 'Classified' (Figure 3) shows a clear difference: 'Classified' is an action usually associated with human agents (underlined with pink). Machine vision technology is usually portrayed as the active party in these artworks, games and narratives, and humans are what are most likely to be classified. However, there are a handful of machine vision situations in the dataset in which humans classify images. The nodes marked with an * in Figure 2 highlight characters depicting what I would call classifying humans, that is, humans who classify.
This network analysis shows that although most of the artworks, games and narratives in the dataset primarily show technology as the active party and humans as being classified by the technologies, there are examples of both classifying humans who operate machines and machine classifiers who operate humans through images. Figure 4 takes a closer look at human agents that are classified in machine vision situations. Humans who are classified are generally portrayed as passive in these situations. They are most commonly being 'Analysed' (20), 'Scanned' (8), 'Identified' (8), they are 'Posing' (7), and they are 'Detected' (5). They are being acted upon rather than taking an active role, with the possible exception of 'Posing,' which is done by users/viewers of the artwork when interacting with the work. Other verbs in these situations describe the consequences of being classified and they are overwhelmingly negative (pink dots in Figure 6). Humans who are classified in [...] (2), Excluded (2), Discriminated (2), or Marginalized (1). Many of these verbs also imply that the classified human finds themselves in an unjust situation. Verbs like Misinterpreted (4), Misgendered (2) and Misidentified (1) indicate that the machine vision is flawed and fails in its task to classify humans. Only Surprised (2), Enjoying (1), Assisted (1) and Encouraged (1) express a positive experience when being classified (green dots in Figure 6). It is notable that in comparison only a few actions express positive experiences, suggesting that cultural conceptions of machine vision present a critical stance towards machine vision deployed to classify humans.
To conclude, data from the Machine Vision Database shows that machine vision deployed to classify humans is a recurring interaction depicted in art, video games, movies and novels. Notably, being classified by machine vision is overwhelmingly portrayed as a negative experience. This broader analysis confirms the impression from the four artistic audits I discussed earlier. The discussion of the four artistic audits and the data analysis demonstrate how machines operate humans through images in clearly harmful ways. In the next section I change perspective to human bias and delve deeper into how humans operate machine classifiers through images.

Artistic research – Suspicious Behavior classifying humans in cybernetic loops of indirect reverse operativity
As a fictional annotation tutorial, Suspicious Behavior takes a unique point of view by examining visual datasets from the perspective of data annotation. Even though automated perception is shaped by image datasets, they have been given little value in discourses about model building. 36 Likewise, image annotation is given little attention in computer vision, and when annotation practices of image datasets are researched, the research mainly focuses on how bias gets embedded into datasets through the individual subjective worker. 37 In contrast, Suspicious Behavior (Figure 5) demonstrates ways in which interpretation is imposed on annotators and thus on datasets. As an example of indirect reverse operativity, Suspicious Behavior demonstrates that the way outsourced annotators operate on images is actually 'informed by the interests, values, and priorities of other actors.' 38 Artistic research for the project involved investigating the layers of labour that go into assembling visual datasets. Artists have investigated aspects of outsourced labour, often by utilizing platforms like Amazon's Mechanical Turk (AMT) to produce artworks with the aspiration of rendering visible this type of hidden human labour. 39 However, Suspicious Behavior differs from these projects because on-demand labour was not used to create the artwork. Instead, the work of an on-demand annotator is simulated, thus offering the reader an opportunity to experience the conditions under which images are classified and prepared for machine learning. In what follows, I will use examples from Suspicious Behavior to discuss how data curators enforce certain interpretations of images through instructions and interface design.
Data acquisition and annotation can be done in many ways. ImageNet, an influential dataset that has become an integral part of AI infrastructure, established a model practice of scraping images from the internet and outsourcing annotation. This model has been criticized as 'data extraction without consent and labelling by underpaid crowdworkers.' 40 Even though automated annotation as a cost-efficient option is increasingly popular, such automation still involves an annotation apparatus which at some point of its development depended on what I call classifying humans. Typically, humans involved in the annotation of visual datasets are curators and annotators. Dataset curators are the clients. They usually work at companies or universities and are in need of a visual dataset to develop machine learning models and deploy machine vision. To sustain the illusion of machine automation, human labour is often intentionally hidden. 41 This is also the case with annotators who typically belong to a low-valued and invisible crowd of outsourced on-demand workers. 42 Annotators are typically hired through platforms like AMT or through companies specialising in annotation services for artificial intelligence. In addition to curators and annotators, project managers, consultants, and quality assurance personnel might be involved in classification.
In Suspicious Behavior, the reader is invited to step into the role of an on-demand annotator-trainee. The annotator-trainee is guided through a fictional tutorial that teaches how to navigate 'Human Intelligence Tasks' (HITs). HITs are micro-tasks, for example tagging a bundle of images or videos, often facilitated by custom-made annotation interfaces. In Suspicious Behavior, the annotator's task is to decide if a video contains suspicious behaviour or not. On-demand annotation is casually called 'clickwork,' a term that implies work that is easy and requires little thought. However, assembling and annotating millions of images or videos is tedious labour usually involving two tasks: labelling and segmenting. Segmenting involves separating the objects that are visible in an image and classifying them as different from each other. Labelling is the act of naming an image or a segment. In Suspicious Behavior, the annotator-trainee gradually learns that the video dataset that is being labelled is assembled in order to train machine vision to spot suspicious human behaviour. This setup exemplifies indirect reverse operativity in which the annotator classifies images, and the images are then used to train machine vision to classify humans.
Milagros Miceli, Martin Schuessler, and Tianling Yang have studied hierarchies and power relations in image annotation, which they define as a 'sense-making practice' in which meaning is assigned to data by adding labels to images. 43 This 'sense-making practice' is a step in translating images into machine-readable formats. When images become machine-readable, they shift from being visual representations to becoming non-representational processes of calculation. Thus, operative images are perceived as disappearing into black boxes and becoming invisible. 44 In this process of translating the visual into data, labelled images become mediators of meaning. Aud Sissel Hoel suggests considering 'operative images' as interfaces, 'in the epistemological and ontological sense as intermediaries.' If this idea is situated in the machine learning context, then representational images in the datasets can be understood to operate as interfaces.
The images that are annotated function as containers of what Bowker and Leigh Star have called 'boundary objects,' a means of translating human-image-machine interpretation. According to Bowker and Leigh Star, boundary objects have multiple meanings that differ between various social worlds, yet they are recognizable in more than one of those worlds. 45 When an interpretation of an object is not questioned by a community, it becomes a naturalized part of that social world. In a similar way, when images and the objects (or actions) in them are labelled for machine perception, the interpretations of those objects are naturalized, and consequently, alternative interpretations are denied, as in the case of binary gender classification. As those interpretations are built into machine vision, they become naturalized standards in more than one community. An example of this can be observed in Suspicious Behavior's advanced module HIT01:Explorer, where the reader can traverse videos and learn more about the UCF-Crime Dataset (Figure 6). In an article accompanying the UCF-Crime Dataset, the data curators acknowledge the difficulty of defining anomalous behaviour, 'since it is quite subjective and can vary largely from person to person.' 46 Nevertheless, they selected 13 anomaly categories that they decided have 'significant impact on public safety.' 47 The dataset also contains one category for normal behaviour. Without much contemplation these categories become naturalized, 'disappearing into the realms of what is considered common sense,' 48 even though they impose specific interpretations of anomalous behaviour. Thus, those who classify image datasets hold enormous power, because they make decisions which embed certain ideologies affecting how images indirectly reverse operate upon humans. How is that power distributed among classifying humans?
The annotation interface is yet another layer in the sense-making of images as it facilitates the labelling of images. According to Christian Ulrik Andersen and Søren Bro Pold, layers of '[i]nterfaces organize how data are stored, translated, exchanged, distributed, used, and experienced.' 49 In the process of translating human perception into machine perception, annotation can be understood as operating on and through images as interfaces. When investigating the annotation apparatus behind ImageNet, Nicolas Malevé calls to our attention that '[t]he interfaces of annotation are designed to control workers' productivity, to find the optimal trade-off between speed and precision.' 50 In Suspicious Behavior, when the reader chooses to 'become a clickworker' they receive their first HIT. The annotation interface is simple (see Figure 7): videos are presented in a sequence, overlaid with the question: 'Spot anything suspicious?' The question is followed by a yes and a no button. The annotator is given 10 seconds to decide, and a countdown of the remaining seconds is placed next to the buttons.
The videos, taken from the VIRAT Video Dataset, are designed to assess action detection algorithms in video surveillance. In their documentation, the VIRAT dataset curators describe how they prepared hours of video material for annotation by breaking it up into 10-second segments. 51 To replicate the experience of annotating short segments of videos, the interface in Suspicious Behavior only allows the annotator-trainee a 10-second glance at each video. The binary yes or no question is an adaptation of another annotation interface presented in a paper describing the assembly of the MIT Moments in Time action dataset. 52 Binary labelling is considered the most efficient annotation method to determine if an object or action appears in an image. 53 The reader browsing through the tutorial might ask, what is suspicious behaviour? Instructions for what the annotator-trainee should look for are presented in the tutorial (Figure 8). In addition, an instructive YouTube video montage collects examples of what various security authorities define as suspicious: 'anyone quickly leaving an area after dropping a package,' 'abandoning a vehicle,' 'lurking in corners,' or 'seeming nervous or acting in a disturbing manner' are just a few things the reader is instructed to look for. Written definitions of how to name objects and actions, as well as images of 'good' and 'bad' examples, are embedded in annotation interfaces to assist the annotator in interpreting images correctly. 54 By mimicking the design choices of real-world annotation interfaces, Suspicious Behavior recreates an annotation environment in which a specific reading of the image is imposed on the annotator.
On-demand annotators are seldom provided with additional information about given tasks. They often lack labour rights and struggle to make a living wage. Thus, they need to optimize their work pace and meet quality thresholds set by the client. 55 Dataset curators have developed methods to evaluate worker quality to check that the interpretations of individual annotators are in line with given definitions. 56 Only 'consensus' 57 among several annotators validates that the labelled video contains what the dataset curator is looking for, in this case, suspicious behaviour. This means that several annotators are expected to have the same interpretation of suspicious behaviour for a video to enter the dataset. These are mechanisms dataset curators implement in order to test and filter out those whose performance does not reach a set threshold. 58 In Suspicious Behavior, 'Advanced HIT03: Speed Master' represents such a test. It exemplifies Nicolas Malevé's observation that '[f]or the annotators, structurally, the "glance" is the norm' 59 as the annotator-trainee is given 60 seconds to annotate as many videos as possible. At the end of this test a 'report' (Figure 9) shows how many videos were 'correctly' labelled. If their accuracy rate is higher than 80%, the annotator-trainee qualifies for more work. The annotator-trainee learns a rhythm to maintain a balanced ratio between speed and accuracy while interpreting actions, and this does not allow questioning of how categories are made or classes defined. 60 The previous examples from Suspicious Behavior expose the hierarchical power dynamics at play among classifying humans. The interface is a technical layer that obscures the fact that dataset curators impose certain interpretations of images on annotators and ultimately manipulate the ways annotators operate on images.
The role of annotators is reduced to matching labels with images; the power to define classes, and the control over which kinds of images those classes contain, remains with dataset curators. Therefore, it is not solely the subjectivities of annotators that introduce bias into image datasets. Often the decisions made by curators have a greater effect on how images indirectly reverse operate.
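The consensus and quality-threshold mechanisms described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any actual dataset or of Suspicious Behavior itself: the function names, the default unanimity rule, and the idea of a curator-supplied reference labelling are hypothetical. Only the 80% figure is taken from the 'Speed Master' report.

```python
from collections import Counter


def consensus_label(votes, min_agreement=1.0):
    """Return the most common label if enough annotators agree, else None.

    `min_agreement` is a hypothetical parameter: the share of annotators
    that must give the same answer (1.0 means unanimity).
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


def qualifies(answers, reference, threshold=0.8):
    """Check whether an annotator's accuracy is strictly above the threshold.

    `answers` and `reference` map video ids to 'yes'/'no'. The reference
    labels are assumed to come from the dataset curator, so 'accuracy'
    here means agreement with the curator's interpretation.
    """
    if not answers:
        return False
    correct = sum(1 for vid, a in answers.items() if reference.get(vid) == a)
    return correct / len(answers) > threshold
```

In this sketch a video on which annotators disagree returns no label and would not enter the dataset, and an annotator whose reading diverges from the reference is simply counted as wrong. Neither function leaves room for questioning the categories themselves, which is the point the artwork makes.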
Although dataset curators keep control of image interpretation, there are operations out of their control before annotation even begins. When images are collected into a candidate pool for annotation, search queries on chosen platforms determine what images end up annotated in the first place. For example, the curators of the earlier-mentioned UCF-Crime Dataset describe using variations of search words, like 'car crash' for the category of 'road accident,' in order to find videos for the dataset. 61 The choices of platform and of the interaction mode between users, content creators, advertisers and the platform algorithms are absorbed into the annotation apparatus. Indirect reverse operativity comes into play when platforms or content producers use automated image recognition to manage the labelling of videos. For example, Google Cloud's Video AI claims to 'recognize over 20,000 objects, places, and actions in video.' 62 Such automated labelling in turn only operates because a dataset was once classified by another annotation apparatus, which now indirectly reverse operates in cybernetic loops. This means that indirect reverse operativity constantly feeds an evolving internet-sized 'mass image' with labelled images. In such loops of indirect reverse operativity, classifications become naturalized standards and earlier human decisions are taken for granted, embedded in machine vision infrastructure.
New media theorist Jussi Parikka notes how histories of photography and technical images are constructed of '[o]perations built on operations,' meaning that instruments, infrastructures, practices, and techniques build upon each other. 63 In a similar sense, the operations of an annotator build upon practices and tools such as search engines that are built upon other instruments, practices, and infrastructures. Classification systems embedded in machine vision in turn serve as infrastructure that, for example, city security management systems build upon. Among the layers of operations involved in assembling an image dataset, automated interpretation also influences image annotation. The classifying human and the machine classifier merge, and in cybernetic loops of indirect reverse operativity, norms and worldviews embedded in machine vision classification propagate.

Conclusion
In this article I examined cultural conceptions of machine vision bias. I particularly drew attention to machine vision bias in instances when humans operate machines and machines operate humans through images. Figure 10 shows how I understand the artworks I engage with to depict different instances of what I call indirect reverse operativity. This entails that data curators and annotators classify images which operate back on humans when a machine vision classifier trained on these images is deployed to classify humans. What is made explicit in this diagram is that classifying for machine learning is ultimately a human-centred process. Although cost-efficiency and the demand for internet-sized datasets drive the development of automating dataset curation to a constantly evolving 'mass image,' we must also recognise that the 'mass image,' as Sean Cubitt writes, is a 'post-human hybrid of work and labour performed by natural human and technical agencies.' 64 I have used what I call artistic audits as examples to describe how images indirectly reverse operate in ways which are unfair and discriminatory, particularly towards marginalized populations. In these artworks machine vision deployed to classify humans is represented as troublesome: it is discriminating, misjudging, and oppressing systematically marginalized populations. This portrayal aligns with scholarly audits of popular algorithmic systems. 65 What the artistic audits by Dovey, Buolamwini and Aljowaysir show is that binary gender classes as well as harmful stereotypes are naturalized as they become part of machine vision infrastructure. Nakeema Stefflbauer & Nushin Yazdan's work explains that machine vision bias is experienced differently at the intersections of race, gender, and class, and it is harmful because it perpetuates systems of discrimination against those already marginalised by society.
My data analysis of machine vision situations in a dataset of 500 works of art, games and narratives demonstrated that the cultural conceptions of harmful machine vision bias we saw in the artistic audits are echoed in this larger sample of creative works.
Artworks have been particularly influential in communicating the link between bias in datasets and bias in machine vision to a broader audience. This has pushed the computer vision community to revisit influential datasets and respond with efforts to mitigate harmful bias. Because attention has mainly been directed to representation and historical bias in datasets involving problematic image classes, responses to mitigate bias have involved 'technical fixes' such as balancing the number of images in a dataset to equally represent intersections of gender and race, adding data, or erasing problematic classes. 66 Some methods of diversifying datasets might mitigate bias, while others end up doing the opposite. 67 However, sanitation of datasets in hindsight does not magically fix harmful machine vision bias. Machine vision classifiers already trained on biased datasets still propagate those biases. Increasingly automated data curation and labelling practices create cybernetic loops of indirect reverse operativity in which the ways humans are classified are easily taken for granted, even though they are disproportionately harmful for those in the margins.
Suspicious Behavior presents a different path to confront dataset bias. By acknowledging that datasets are always biased, the artwork simulates the experience of performing bias. By reflecting upon artistic research for Suspicious Behavior, I demonstrate how designing annotation interfaces and the instructions they embed is one instance of performing bias. Interfaces are designed to keep dataset curators in control of image interpretation. Even if an annotator's individual subjectivity is often said to introduce bias into datasets, on-demand annotators are given only limited agency and are reduced to following instructions and matching labels to images. Thus, it is important to acknowledge that the technical layer of the interface obscures the fact that dataset curators hold the power to define classes, and that certain interpretations are imposed on annotators, ultimately affecting how images indirectly reverse operate. This demonstrates that the power to perform bias, in other words to interpret and classify images, is distributed among annotators and data curators and facilitated by annotation interfaces. This means that bias is introduced into machine vision through a multiplicity of instances.
The point at which bias is introduced into machine vision is further complicated because images do not reverse operate instantaneously nor directly back on those who classify them. When machine classifiers automatically label images which are then scraped, annotated, and fed into a new machine learning model, cybernetic loops of indirect reverse operativity occur. Such cybernetic loops further propagate and naturalize problematic classification. Therefore, for those classifying humans – developers, designers, data curators and annotators – who operate on images in the assembly pipeline of visual datasets, it becomes crucial to acknowledge that images indirectly reverse operate, and instances of performing bias should be accounted for.
For more on Americentric and Eurocentric bias in image retrieval, see for example Shankar et al.
Miceli, Schuessler, and Yang, "Between Subjectivity and Imposition," 1-2.