Hey, Google, is it what the Holocaust looked like?

By filtering and ranking information, search engines shape how individuals perceive both present and past events. However, these information curation mechanisms are prone to malperformance that can misinform their users. In this article, we examine how search malperformance can influence the representation of the traumatic past by investigating the image search outputs of six search engines in relation to the Holocaust in English and Russian. Our findings indicate that besides two common themes (commemoration and liberation of camps), there is substantial variation in the visual representation of the Holocaust between search engines and languages. We also observe several instances of search malperformance, including content propagating antisemitism and Holocaust denial, misattributed images, and disproportionate visibility of specific Holocaust aspects that might result in its distorted perception by the public.


Introduction
Web search engines, such as Google or Bing, can shape social reality through algorithmic content curation. By filtering and ranking information, search engines influence how their users perceive both present and past events.
The choice of case study is attributed to the Holocaust's unique status as a global memory event (Levy and Sznaider, 2006). Because it is an epitomic case of mass atrocities, the Holocaust often serves as a point of reference for other crimes against humanity (Stevick and Gross, 2014). The Holocaust also features prominently in the public debate on topics varying from the rise of populism to memory ethics (Subotić, 2020), which additionally stresses the importance of its representation by search engines.
To answer this question, we audit image search results for the "Holocaust" query in six search engines. Using research on the human curation of Holocaust images and Holocaust representation online, we identify possible forms of search malperformance and check for them in search outputs. To identify whether search malperformance varies depending on the language in which the search is conducted, we compare the results obtained for Russian and English language queries.

Human curation and visual representation of the Holocaust
The Holocaust is one of the most well-documented cases of mass atrocities, which, to a large extent, is attributed to visual content produced during and after the Second World War (Kushner, 2006). Images, such as photos from concentration camps, served as crucial evidence during the postwar tribunals investigating Nazi crimes (Bathrick, 2008). Today, these images help to communicate the traumatic experience of the Holocaust to new generations and preserve memories of its victims (Hirsch, 2001).
Despite its importance, the use of visual content for remembering the Holocaust is also controversial. A large number of Holocaust images prompt the need for their selection by human curators (Kushner, 2006). However, such selection is a complicated task in which both educational (e.g., what image is more suitable for informing the audience) and ethical (e.g., does the suffering of Auschwitz or Majdanek victims deserve more spotlight) perspectives need to be considered (Holtschneider, 2007).
These ethical complexities prompt many scholars and survivors to argue against the idea of visually representing the Holocaust (Crane, 2008). The core of this argument is the claim that unprecedented violence during the Holocaust cannot be expressed adequately; hence, silence is a "more accurate or truthful or morally responsive" (Lang, 2000) way of treating it.
Practically, it resulted in several calls to avoid representing the Holocaust by visual means, or at least limit access to images associated with it (Crane, 2008). Neither of these options, however, looks feasible today, considering the wide distribution of Holocaust-related content across online platforms (Gibson and Jones, 2012; Makhortykh, 2017, 2019; Menyhért, 2017).
The infeasibility of not visualizing the Holocaust indicates the need to design ways of doing it ethically, which is no trivial task. Even though regional discrepancies in Holocaust representation (Young, 1993) became less pronounced following its reimagination as a global event (Levy and Sznaider, 2006), multiple context-agnostic considerations for its human curation persist.
These considerations relate to different aspects of Holocaust representation. Many Holocaust visuals are hard to digest, as visitors are confronted with graphic images of dead bodies (Holtschneider, 2007). Instead of evoking compassion, it can shock individuals and make them object to Holocaust representation (Carden-Coyne, 2011). To counter this, museums use prewar images of victims to build an empathic connection between them and visitors (Holtschneider, 2007), though the focus on such images can also obscure more horrendous aspects of the Holocaust.
The comprehensiveness of representation is another consideration that must be accounted for. While victims are usually situated at the center of representation, bystanders and perpetrators also played a role in the Holocaust. The exclusion of the latter two groups can decontextualize the event's representation, but their inclusion can undermine the centrality of victims' suffering (Carden-Coyne, 2011).
Similarly difficult is the process of deciding whose suffering to prioritize. The Holocaust took place at more than 44,000 incarceration sites (U.S. Holocaust Memorial Museum [USHMM], n.d.), but a few of them, in particular Auschwitz, are more well known due to being featured in popular culture products (Mintz, 2001) and serving as tourist destinations (Biran, et al., 2011). However, it creates a dilemma for human curators: to focus on recognizable images, which might be more appealing for the public, or to highlight less-known killing sites.
These considerations are far from the only challenges of human curation of Holocaust images. The other challenges vary from mislabeled images (Zelizer, 1999), the use of which can undermine the authenticity of Holocaust representation, to the dubious origins of many visuals produced by perpetrators against the will of their victims (Hirsch, 2001), thus prompting ethical concerns about their use.
These ethical concerns are amplified by the digitization of Holocaust images and the use of online platforms for their dissemination (Makhortykh, 2019). The opinions on the effects of such change vary from offering new venues for dealing with Holocaust trauma (Gibson and Jones, 2012) to facilitating Holocaust denial and trivialization (Gray, 2014).
Importantly, the platformization of Holocaust representation also diminishes the role of heritage institutions as gatekeepers of historical content. Instead of human curators who determine how the Holocaust is represented by museum exhibitions, the selection and ranking of Holocaust images online are performed by algorithms. Algorithmic curation of Holocaust-related content also raises its own concerns, as we discuss below.

What can go wrong with the algorithmic curation of Holocaust information?
Due to the lack of research on the algorithmic curation of Holocaust images, we synthesized our own list of indicators of systematic and non-systematic search malperformance in relation to this. To do so, we combined insights from research on the challenges of human curation of Holocaust images and on the representation of the Holocaust online. The resulting list consists of five indicators of search malperformance in the context of the Holocaust: 1) misattribution; 2) overrepresentation; 3) trivialization; 4) revisionism; and 5) antisemitism.
Misattribution occurs when visual content is attributed to a historical phenomenon to which it is actually unrelated. In the case of the Holocaust, misattribution is a common concern of human curation due to the large amount of visual material that often lacks reliable attribution (Zelizer, 1999). The use of misattributed content can mislead the public and cause incorrect interpretations of the event to which the image is mistakenly connected.
Misattribution is even more common for online representation of the Holocaust because Web users often lack the professional training required to correctly identify the origins of a particular image. It also facilitates the use of misattributed images to instrumentalize Holocaust memory (e.g., to stigmatize political opponents; Makhortykh, 2019). Depending on the sources from which search engines retrieve images, misattribution can be more or less present, but we expect at least some misattributed images to appear in the search outputs.
Overrepresentation is related to the disproportionate visibility of certain aspects of a historical phenomenon, which can lead to its skewed perception. Similar to misattribution, it often occurs in the case of human curation of Holocaust-related content (Ebbrecht, 2010; Hansen-Glucklich, 2014). Some examples include the frequent focus on Holocaust perpetrators that leads to structuring the event's representation around their "chronology and ideology" [2], as well as images showing the liberation of camps (Ebbrecht, 2010).
The systematic prevalence of certain aspects of the Holocaust in human curation can lead to their overrepresentation in search outputs. While no study has investigated this memory spillover effect for image searches, Zavadski and Toepfl (2019) found that history-related text search results tend to reproduce dominant narratives. Hence, there is a possibility that search outputs can systematically prioritize some aspects of the Holocaust while downgrading others.
Trivialization concerns the use of the Holocaust for amusement and public distraction purposes (Doneson, 1996). It often involves simplification of the Holocaust's complex nature to make it more accessible or to downgrade its gruesome aspects and provide a more entertaining experience. Examples of such uses vary from Holocaust tourism (Cole, 1999) to Holocaust-themed exploitation movies (Kerner, 2011).
A common form of online trivialization is the use of Holocaust references for producing entertaining content, such as Internet memes (Makhortykh, 2015; Sanchez, 2020). While such content can be viewed as less offensive than revisionist and antisemitic claims, it diminishes the importance of the Holocaust by normalizing it and humanizing perpetrators (Rosenfeld, 2014). Considering the effort that search engines put into countering such "junk" (Bradshaw, 2019) content, we do not expect trivializing images to appear in response to history-related queries, but it must be verified if this is the case.
Revisionism (also known as Holocaust denialism) rejects evidence of the Holocaust and challenges established views on the event. Revisionists often make claims that "carry a degree of absurdity" [3] in their interpretation of the past and attack the dignity of the Holocaust victims. Such claims vary from downgrading its scale to complete rejection of the fact that the Holocaust happened (Lang, 2010).
The rise of digital media facilitates the distribution of revisionist views due to the ease of promoting revisionist content online and the focus of anti-revisionist legislation on analog media (Whine, 2008). Search engines, particularly Google, prioritized revisionist Web sites for some Holocaust queries, but after these cases were exposed, they were demoted (Sullivan, 2016). Hence, while we expect that revisionist content should not be prioritized for a general "Holocaust" query, we cannot fully exclude the possibility of it appearing there.
Antisemitism in the form of different expressions of hostility toward Jews is another indicator of search malperformance. Because the Holocaust is an important component of Jewish identity, it is also targeted by antisemitic content. Unlike revisionist content that focuses on denouncing the notion of the Holocaust, antisemitic content attacks Jews in the context of the Holocaust by justifying it and calling for its continuation.
Despite attempts to fight antisemitism online, digital media remains "a safe harbor" (Ozalp, et al., 2020) for antisemitic campaigns. While search engines try to avoid prioritizing offensive content, they sometimes promote antisemitic content (Noble, 2018). Such cases are usually explained by data voids, namely, limited, or non-existing data for specific search terms (Nguyen, 2020). Following this logic, antisemitic content should not appear for broad queries, such as "Holocaust," but whether that is actually the case has to be checked.

Related work: Algorithmic auditing of search engine malperformance
Algorithmic auditing is a research methodology that scrutinizes "the functionality and impact of decision-making algorithms" (Mittelstadt, 2016). Functionality auditing examines how algorithms arrive at certain decisions, whereas impact auditing investigates their outputs. Of the two approaches, impact auditing is commonly used to detect malperformance in Web search results (Trevisan, et al., 2018).
There are three approaches used for impact auditing in the context of Web search. The first relies on querying search engines either manually (Noble, 2018; Sullivan, 2016) or via respective APIs (Kay, et al., 2015; Otterbacher, et al., 2017). This approach is effective for detecting search malperformance and does not require complex technical implementation, but its results might be affected by search personalization and randomization. Noble (2018) used the querying approach to detect the systematic overrepresentation of derogatory content in relation to non-white and female groups. It was also used for studying curation of historical content in a text search, which identified high visibility of denialist content for Holocaust queries (Sullivan, 2016) and similarities between Google and Yandex in reliance on authoritative sources for history-related queries (Zavadski and Toepfl, 2019).
The second auditing approach uses search results collected from crowd-workers. Participants can be recruited using crowd-working platforms, such as MTurk, but scaling it to many workers can be costly. This approach is also not suitable for certain auditing problems (e.g., studying the effect of specific variables on search outputs; Hannak, et al., 2013) because results can be affected by search personalization, which is difficult to control for.
The crowd-worker approach is useful for auditing Web search bias and ideological segregation because it allows for the comparison of outputs from workers with different ideologies. It was used to disprove the polarizing effect of Google's Web search following the 2016 U.S. elections (Robertson, et al., 2018). It also revealed discrepancies in the representation of German parties in Google searches around the 2017 German federal elections (Puschmann, 2019).
The third approach employs virtual agents, that is, software emulating user behavior to collect search outputs. It can involve modeling agent personas (Feuz, et al., 2011) or Web search accounts (Hannak, et al., 2013) or focus on non-personalized outputs for text or video search (Urman, et al., 2021a). Agent-based approaches can be difficult to implement technically, but they allow controlling for both search personalization and randomization by auditing in a controlled environment.
Agent-based approaches are used for a broad range of auditing tasks. Feuz, et al. (2011) simulated the browsing behavior of different information-seeking personas to show that the effects of search personalization for text outputs increase over time. Another study found evidence of limited personalization of political search outputs on Google and the prevalence of mainstream information sources (e.g., news media and political parties) (Unkel and Haim, 2021).
Despite the growing use of auditing for detecting Web search malperformance, there are still several aspects of it which remain understudied. Only a few studies (Kay, et al., 2015; Otterbacher, et al., 2017) have conducted audits of image search outputs. The results of these studies indicate systematic search malperformance that leads to the reiteration of societal stereotypes (e.g., in relation to gender and race). It stresses the need to extend research on image search and its malperformance to other areas, such as historical content.
Another aspect of search malperformance that is currently underinvestigated is its comparative dimension. Most auditing studies focus on a single search engine, namely Google, which is the most popular search engine globally (Statcounter, 2020a). However, other search engines are still used by millions of users and play an important role in local markets, such as Baidu for China and Yandex for Russia (Statcounter, 2020b).
Some comparative studies (Jiang, 2014; Urman, et al., 2021b) found major discrepancies in how different search engines represented and misrepresented specific phenomena. This indicates the need for more comparative analyses that can determine whether some search engines are more prone to malperformance and detect how different curation models affect the representation of specific subjects, including historical information.

Data collection
To collect data, we used software simulating user browsing behavior (e.g., scrolling pages and entering queries) and recording its outputs. This virtual agent-based auditing approach allows controlling for personalization (Hannak, et al., 2013) and randomization (Urman, et al., 2021b) factors that can influence search outputs.
Unlike human agents, virtual agents can easily be synchronized to isolate the effects of the time at which the search is conducted. They can also be deployed in a controlled environment, such as a network of virtual machines in the same IP range using the same operating system and browsing software, to limit the effects of personalization that might lead to skewed outputs.
In addition to controlling for personalization, agent-based auditing allows addressing the randomization of a Web search that is caused by search engines testing different ways of ranking results (Battelle, 2011). Search randomization can lead to different outputs for identical queries entered under the same conditions, thus making the observations non-robust. One way of addressing this is to deploy multiple virtual agents that simultaneously enter the same search query to determine randomization-caused variation in the sets of outputs that can then be merged into a single, more complete set.
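The merging step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names (`merge_outputs`, `jaccard`, `mean_pairwise_jaccard`) and the toy file names are hypothetical, and Jaccard similarity is only one plausible way to quantify randomization-caused variation between agents.

```python
from itertools import combinations


def merge_outputs(agent_results):
    """Union the result lists of all agents, keeping first-seen order."""
    merged, seen = [], set()
    for results in agent_results:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


def jaccard(a, b):
    """Share of items common to two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(agent_results):
    """Average similarity over every pair of agents; lower means more variation."""
    sets = [set(results) for results in agent_results]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical outputs of three agents entering the same query simultaneously.
agents = [
    ["a.jpg", "b.jpg", "c.jpg"],
    ["a.jpg", "c.jpg", "d.jpg"],
    ["a.jpg", "b.jpg", "c.jpg"],
]
merged = merge_outputs(agents)
variation_score = mean_pairwise_jaccard(agents)
```

In this toy case, the second agent sees one image the others do not, so the merged set is larger than any single agent's output.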
For the current study, we built a network of 100 CentOS virtual machines in the Frankfurt region of the Amazon Elastic Compute Cloud. On each machine, we deployed two virtual agents: one in the Chrome browser and one in Mozilla Firefox. Each agent consisted of two browser extensions: a tracker and a bot.
The tracker collected the HTML and metadata of all pages visited in the browser and sent them to a storage server. The bot emulated a sequence of browsing actions that consisted of (1) visiting an image search engine page; (2) entering the "Holocaust" query; (3) scrolling down the search result page to load at least 50 images; and, (4) cleaning data accessible by the browser (browsing history and cache) and the search engine's JavaScript (local storage, session storage, and cookies) to prevent earlier searches affecting the subsequent ones.
The length of the simulated browsing session was kept under three minutes for all search engines. The next browsing session always started seven minutes after the beginning of the previous session to guarantee that the agents would always be synchronized. Before starting the experiment, the browsers were cleaned to prevent the search history from affecting the search outputs.
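The timing rule above can be illustrated with a short sketch. It assumes that every agent derives its schedule from a shared first start time; the start time used below and the function names are hypothetical, while the three-minute and seven-minute values come from the text.

```python
from datetime import datetime, timedelta

SESSION_LIMIT = timedelta(minutes=3)     # upper bound on one session (from the text)
SESSION_INTERVAL = timedelta(minutes=7)  # time between session starts (from the text)


def session_starts(first_start, n_sessions):
    """Start times shared by all agents, spaced by the fixed interval."""
    return [first_start + i * SESSION_INTERVAL for i in range(n_sessions)]


def idle_buffer():
    """Minimum idle time left between the end of one session and the next start."""
    return SESSION_INTERVAL - SESSION_LIMIT


# Hypothetical first start; the article gives the date but not the time of day.
starts = session_starts(datetime(2020, 2, 27, 12, 0), 3)
```

Because the interval is longer than the longest possible session, a slow agent still finishes before the next shared start, which is what keeps the agents aligned.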
The study was conducted on 27 February 2020. We distributed 200 agents among the six most popular search engines by market share: Google, Bing, Yahoo, Baidu, Yandex, and DuckDuckGo (Statcounter, 2020a). For all engines, the ".com" version of the engine was used.
The agents were equally distributed between the engines, but because of technical issues (e.g., bot detection mechanisms), some agents did not manage to complete their routine. The overall number of agents per engine that completed the full simulation routine and returned the search results was as follows: Baidu (31), Bing (31), DuckDuckGo (34), Google (33), Yahoo (31), and Yandex (15).
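The arithmetic behind these counts can be checked with a few lines. The per-engine numbers are taken from the paragraph above; the assumption that the 200 agents were split as evenly as possible is ours, and the variable names are illustrative.

```python
# Completed agents per engine, as reported in the text above.
completed = {
    "Baidu": 31, "Bing": 31, "DuckDuckGo": 34,
    "Google": 33, "Yahoo": 31, "Yandex": 15,
}
DEPLOYED = 200  # 100 virtual machines x 2 browsers

# Assumption: an even split of 200 agents over six engines, which leaves
# a remainder, so some engines would receive one agent more than others.
base, extra = divmod(DEPLOYED, len(completed))

total_completed = sum(completed.values())
completion_share = total_completed / DEPLOYED
lowest_engine = min(completed, key=completed.get)
```

Under this assumption, each engine was assigned 33 or 34 agents, and the shortfall is concentrated in one engine rather than spread evenly.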

Data analysis
For our analysis, we extracted URLs of image search results for each agent and detected the 50 most frequent search results per search engine. These images were manually examined by one of the authors, who is a trained historian with experience of working with Holocaust-related archival and digital content. The purpose of this examination was to detect the location and time at which the image was produced, so it would be possible to identify whether it was related to the Holocaust (and not another historical event), and facilitate the identification of which aspect of the Holocaust the image showed.
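The selection of the most frequent results can be sketched as below. This is an assumed reading of the procedure: frequency is taken to mean the number of agents whose output contained a given URL, which the text does not specify, and `top_results` and the toy data are hypothetical.

```python
from collections import Counter


def top_results(agent_urls, k=50):
    """Return the k URLs that appear in the outputs of the most agents."""
    counts = Counter()
    for urls in agent_urls:
        counts.update(set(urls))  # count each URL at most once per agent
    return [url for url, _ in counts.most_common(k)]


# Hypothetical outputs of three agents for two engines.
outputs_by_engine = {
    "engine_a": [["a", "b"], ["a", "c"], ["a", "b", "b"]],
    "engine_b": [["x"], ["x", "y"], ["x", "y"]],
}
top_by_engine = {
    engine: top_results(agent_urls, k=2)
    for engine, agent_urls in outputs_by_engine.items()
}
```

Counting per agent rather than per occurrence keeps a URL that is repeated within one agent's page from outweighing a URL seen by many agents.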
During the examination, the historian retrieved all the images from their respective URLs and viewed them one by one in the browser. Many of the reviewed Holocaust-related images were rather well known, and thus were easy to connect to a specific location or event (e.g., the Warsaw Ghetto boy photo; Zelizer, 2015). However, in order to minimize the probability of an error, each image was searched for in the online collections of the U.S. Holocaust Memorial Museum (USHMM) to verify image attribution.
The search was facilitated by the historian's existing knowledge of Holocaust materials that allowed them to narrow down the process of searching to specific locations/episodes and then verify the initial attribution. In these cases, when no direct match was found (or the image was obviously not related to the Holocaust, as in the case of the exploitation movie posters), the reverse search in Google and Yandex was used to locate the source of the image.
As part of this initial examination, URLs that were no longer accessible (e.g., because of a change in address) were dropped. A few images for which it was not possible to identify reliably whether they were related to the Holocaust were ignored. The result was the set of images described in Table 1 and used for the rest of the analysis.
Following the examination, the resulting images were examined by two coders to detect possible instances of search malperformance. Each image was classified according to the following categories: 1) Revisionism: does the image deny that the Holocaust has happened or downplay its scale?
To measure intercoder reliability, 30 percent of the sample was coded by both coders. Based on this, we calculated Krippendorff's alpha values for each of the categories, which showed a high level of reliability: 0.85 (revisionism), 0.85 (trivialization), 0.93 (antisemitism), 0.89 (misattribution), and 0.84 (Holocaust theme). Following the reliability assessment, the identified disagreements were resolved using consensus coding.

Findings: Visual representation of the Holocaust via algorithmic curation
Figures 1 and 2 show that algorithmic curation of Holocaust content prioritizes images related to two Holocaust themes. These two themes are represented by images depicting the liberation of the camps by the Allies at the end of the war (liberation) and the postwar commemoration of the Holocaust (memory). The latter theme is represented primarily by the images of Holocaust museums and memorials, in particular the Auschwitz-Birkenau Museum (Oświęcim) and Memorial to the Murdered Jews of Europe (Berlin).
The visibility of several other themes varied depending on the query language. For English, search engines prioritized images showing Jews being arrested (arrest) and deported to concentration and death camps (deportation). For Russian, images of murdered camp inmates (post-murder) were more visible, particularly when compared with the outputs for the English queries, where their presence was marginal.
The language-based difference was particularly pronounced in the case of the Chinese search engine Baidu, where for the Russian query, there were no outputs related to the Holocaust. Instead, there were only misattributed images that could be explained by Baidu's search algorithms being prone to malperformance in the case of Russophone requests.
In addition to the query language, the choice of search engine also influenced Holocaust representation. For instance, images of Jews being tortured (torture) appeared only on smaller Western engines (Bing, DuckDuckGo, and Yahoo). The same engines, together with Google and Baidu, showcased images of actual murder (murder), whereas such content did not appear on Yandex.
Several themes appeared only on one or two engines, thus stressing cross-engine differences in historical content curation. Images of prewar Jewish life (prewar), Holocaust survivors seeking revenge (retribution), life during Nazi occupation outside ghettos (occupation), and Holocaust perpetrators (perpetrators) appeared only on Google. Similarly, only Baidu and Yandex showed images of Jewish children being rescued before the war (rescue).
In terms of geography, the majority of images came from Holocaust sites located in contemporary territories of Germany and Poland (Figures 3 and 4). Their prevalence across all engines and languages can be explained by major Holocaust camps-turned-museums being located there (Auschwitz, Treblinka, and Belzec for Poland and Dachau, Bergen-Belsen, and Buchenwald for Germany). The only exception was Baidu, where the second largest number of outputs was related to commemorative sites in the U.S. (e.g., U.S. Holocaust Memorial Museum).

Figure 3:
The proportion of outputs related to Holocaust sites by country (per engine; English query).

Figure 4:
The proportion of outputs related to Holocaust sites by country (per engine; Russian query).
The visibility of sites in other countries varied depending on the query language. For the English query, there were more outputs related to Austria (Mauthausen), Hungary (Budapest), and Israel (Yad Vashem). There were relatively few outputs related to post-Soviet countries, such as Ukraine, Belarus, or Russia, despite major acts of extermination happening there (in particular, in Ukraine, where more than 1.5 million Jews were murdered as part of the so-called "Holocaust by bullets," namely the acts of extermination carried out outside specialized areas; Desbois, 2008).
A reverse situation was observed for outputs in response to the Russian query, where images coming from post-Soviet countries gained prominence, and outputs associated with Central European countries (Austria and Hungary) were less visible. This discrepancy can be explained by memory spillover, particularly Russophone memory institutions focusing on Nazi crimes committed in Eastern Europe and providing more visual content for them.
In terms of specific Holocaust camps, we found that, independent of language, images related to Auschwitz were prioritized (Figures 5 and 6). A few other camps that frequently appeared in outputs were Bergen-Belsen (Germany), Buchenwald (Germany), Ebensee (Austria), and Dachau (Germany). The exact proportion of images related to the respective camps varied by engine (e.g., DuckDuckGo and Yahoo prioritized content from Bergen-Belsen, Google and Yandex from Buchenwald).
The visibility of camps other than Auschwitz was also affected by the language of the query. Images from Ravensbrück, Sachsenhausen, and Nordhausen (all from Germany) appeared only for Russian queries. However, with the exception of Auschwitz, images of concentration camps from Western Europe, where inmates were incarcerated, prevailed over images of extermination camps from Eastern Europe, where inmates were murdered.
Such a distribution can be explained by content availability (e.g., Western European camps being liberated by Allies, who produced more visual content for Western Web sites) and fewer graphic images coming from concentration camps, particularly considering that extermination camp inmates were usually murdered before their liberation. However, it also resulted in the omission of major centers of extermination in the east, which either appeared in search outputs only a few times (Majdanek, Treblinka) or never appeared at all (Sobibor, Chełmno).

Revisionism, trivialization, and antisemitism
Our analysis shows that some search outputs promoted revisionism, trivialization, and antisemitism. These outputs were distributed unequally between the search engines, and their presence was higher for two Western engines: DuckDuckGo and Yahoo. In both cases, these forms of malperformance were more present in response to the Russian query. This indicates that algorithmic curation was more prone to errors in non-English content.
Figure 7 shows that antisemitic content appeared in nine percent and eight percent of the search outputs for DuckDuckGo and Yahoo, respectively. It usually consisted of images attacking Jews in the context of the Holocaust (e.g., by claiming that it was caused by the need for Christians to protect themselves against Jews) or reproducing antisemitic tropes (e.g., by showing caricature images of Jewish puppeteers).
The presence of such images aligns with recent observations of the presence of antisemitic content in image search outputs (Nguyen, 2020). However, unlike other cases, where it was explained by data voids, namely, the absence of proper results for rare queries, the "Holocaust" query is not a niche one. This observation questions the data void argument and suggests that the retrieval of antisemitic content can be caused by search algorithms and not the absence of relevant content.
A few outputs (Figure 8) also promoted Holocaust trivialization, usually in the form of Internet memes, such as the one showing a Russian-Jewish TV anchor, Igor Kvasha, with a caption that said "Burn me" (a reference to the TV program called "Wait for me" hosted by Kvasha). Another common example of trivialization dealt with images of people behaving improperly at Holocaust sites (e.g., sitting at the Memorial to the Murdered Jews of Europe in Berlin). While generally less obtrusive than antisemitic images, trivialization content can still be treated as an attack against the dignity of Holocaust victims. It can also promote inappropriate behavior at Holocaust sites, and in some cases, like the one with the Kvasha meme, gets close to antisemitism. The appearance of such content in the outputs for the "Holocaust" query can be attributed to the malperformance of filtering mechanisms.
Compared with antisemitism and trivialization, revisionism was represented by only a few outputs (Figure 9). Its low visibility can be attributed to both the difficulty of expressing revisionist arguments via visual means only and the solid performance of filtering mechanisms that prevent such outputs from appearing in response to Holocaust-related queries.
Revisionist content varied among the search engines. For Google, it was represented by an image referring to the Roger Hallam scandal, during which the Extinction Rebellion founder claimed that the Holocaust was "almost a normal event" (Connolly and Taylor, 2019). While not propagating revisionism per se, its appearance in Holocaust-related search outputs can amplify the visibility of revisionist arguments.
In contrast, the outputs for Russian queries on Yahoo and DuckDuckGo contained images explicitly promoting revisionist views. Usually, these were memes claiming that Jews benefit from the Holocaust and persecute people who try to tell the "truth" about the event. An example of such a meme is an image of Soviet war veterans being humiliated and forced to enter the Holocaust museum. Similar to antisemitic content, the prioritization of such images for the "Holocaust" query can hardly be attributed to data voids and is more plausibly explained by filtering malperformance.
Figure 10 shows that many search outputs were unrelated to the Holocaust as a historical event. The degree of misattribution varied across search engines and was higher for the images retrieved for Russian queries. The only exception was Yandex, which can be explained by its algorithms being more likely to be trained on Russophone data. However, considering that Yandex outputs contained the lowest number of misattributed images, the observed difference could also be related to the better design of the image retrieval algorithm.

Figure 10: The proportion of outputs prone to misattribution (per engine).
The importance of query language for this type of malperformance is reflected in the different forms it took between English and Russian queries. For English, misattributed content (except on Baidu) differed little from authentic Holocaust images and was often difficult to detect for anyone who is not a trained historian. One example is the image of starving children behind wire, which looks similar to images from liberated Nazi camps but is actually a photo taken in a Finnish prisoner-of-war camp for the Russian population in Karelia.
A few other misattributed examples for the English query included black-and-white photos showing the suppression of the Mau Mau rebellion in the 1950s and the expulsion of Rohingya from Myanmar in 2017. While these outputs also deal with mass atrocities, their retrieval for the "Holocaust" query can result in them being erroneously perceived as either authentic representations of the Holocaust or events similar to the Holocaust in nature or scale, which can be misleading.
In contrast, misattributed images retrieved for the Russian query, particularly on DuckDuckGo, were easy to differentiate from authentic Holocaust content. Most of these outputs had little to do with historical images and showed content that was remotely related to the Second World War (e.g., images of toy soldiers) or not related to history (e.g., images of dead pigs or caricaturist images of Jews).
The Chinese search engine Baidu is a special case in terms of misattribution, with 36 percent of its outputs for the English query and 100 percent for the Russian query being irrelevant to the Holocaust. In the case of the English query, search outputs were mostly made of entertainment content, including posters of exploitation movies and death metal groups with the word "Holocaust" in their names.
For the Russian query, all the search outputs were images dealing with the tourist industry (e.g., photos of guest houses and hotel rooms) that were not related to the Holocaust. The lack of relevant outputs is most likely due to Baidu's poor performance for Russian-language queries, which made it impossible to retrieve any Holocaust-related content in Russian via this engine.

Overrepresentation
Compared with other forms of search malperformance, overrepresentation is harder to define. Unlike cases with a clear baseline against which the distribution of specific features in outputs can be compared (e.g., in the case of gender distribution for occupations in the U.S.; Kay, et al., 2015), for historical content, including the Holocaust, there are no clear guidelines determining the desired proportion of certain features in the outputs.
At the same time, the relative prevalence of outputs related to specific Holocaust themes and sites can be treated as an indicator of overrepresentation by itself. Such unequal retrievability (Traub, et al., 2016) of information associated with certain aspects of the Holocaust creates a skewed perception of the phenomenon, where some aspects are highlighted and others downgraded.
In the case of Holocaust themes, there was a profound imbalance between one or two top themes, which constituted 50-60 percent of all retrieved content per engine, and the other themes. This imbalance leads to a skewed representation of the Holocaust, where the focus is on its final stage (i.e., the liberation of camps) and the aftermath. It results in a situation where the user is exposed to a rather simplified narrative of Jews being deported, liberated, and then commemorated, while other aspects of the Holocaust are omitted.
While some of these other aspects (e.g., the consequences of mass murder for the Russian query on Bing) occasionally appeared, a number of Holocaust themes were consistently underrepresented. These themes included not only torture and murder, but also life in ghettos and Jewish resistance. Similarly downplayed were matters of prewar life, which are important for contextualizing the Holocaust (Holtschneider, 2007), as well as the postwar life of survivors.
It is hard to determine the ideal proportion of images showing different aspects of the Holocaust. However, there was a clear imbalance in the representation of the phenomenon when, for instance, on DuckDuckGo (Figure 1), 40 percent of outputs showed liberation of camps and only four percent showed ghettos. Such inequalities can be viewed as unfair, considering that both aspects are important for understanding and remembering the Holocaust.
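The kind of imbalance described here can be quantified with a simple share calculation. The following is a minimal sketch rather than the procedure used in the study: the function name `theme_shares`, the label strings, and the toy counts are all assumptions, chosen only to mirror the 40 percent versus four percent example.

```python
from collections import Counter


def theme_shares(labels):
    """Return each theme's share of the coded outputs as a percentage."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {theme: 100 * n / total for theme, n in counts.items()}


# Hypothetical coding of 50 image outputs (illustrative, not the study's data).
labels = (
    ["liberation"] * 20
    + ["commemoration"] * 10
    + ["ghettos"] * 2
    + ["other"] * 18
)

shares = theme_shares(labels)
# How many times more visible one theme is than another.
ratio = shares["liberation"] / shares["ghettos"]
print(shares["liberation"], shares["ghettos"], ratio)
```

A ratio of this kind does not say what the distribution ought to be. It only makes the gap between two themes explicit and comparable across engines.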
Unequal retrievability was also observed for content from specific countries and sites. Images from Poland and Germany constituted more than 70 percent of Western engine outputs. Such distribution reflects the fact that many camps were located in what is now the territory of these two countries, but it also omits ghettos in the post-Soviet states and extermination sites used for the "Holocaust by bullets." Similarly, transit camps used to move Jews from Western Europe (e.g., Westerbork) remained underrepresented.
Overrepresentation was even higher when comparing images related to individual Holocaust camps. The largest proportion of images (in some cases, up to 80 percent) was associated with a single camp: Auschwitz. Other camps, particularly the ones in Eastern Europe, remained mostly ignored. Some of them (e.g., Majdanek and Treblinka) appeared occasionally, whereas others (e.g., Sobibor and Chelmno) were absent in search outputs.
The focus on just a few Holocaust sites resulted in major episodes of the Holocaust being omitted by the algorithmic curation. It has created a situation where the visibility of victims' suffering and, to a certain degree, its recognition by the public is very unequal. By giving priority to images of Auschwitz, search engines highlight the tragedy of more than a million of its victims, but obscure deaths of those who perished during the "Holocaust by bullets" (1.5 million victims) or, for instance, in Sobibor (more than 150 thousand victims).
The rationale behind this overrepresentation is also concerning. Together with the high visibility of Holocaust museums and memorials (i.e., memory themes), the disproportionate retrievability of content from Auschwitz, which is a major tourist destination, could indicate that search engines prioritized images based on the commercial value of the respective sites. This commodification-based curation logic can be unethical, especially in the case of content dealing with crimes against humanity, and prompts the need for further investigation of overrepresentation malperformance in relation to other historical events.

Discussion and conclusion
Our analysis indicated that the visual representation of the Holocaust varied depending on the search engine and the language of the query. Western search engines prioritized images showing deportation of Jews and liberation of the camps in English but shifted toward more graphic content in Russian. In contrast, non-Western engines focused on images of Holocaust memorials and, in the case of Baidu, content that was unrelated to the Holocaust.
The differences in algorithmic curation are not surprising per se, considering that search engines rely on different algorithms and databases that might result in different ontologies, namely, hierarchically structured sets of items used to characterize a specific phenomenon (Ramkumar and Poorna, 2014). However, fundamentally different interpretations of the Holocaust in different search engines call into question the status of the Holocaust as a global memory event (Levy and Sznaider, 2006). These differences also raise concerns about the ability of search engines to inform their users about historical events in a comprehensive manner, particularly considering that the logic behind their prioritization of specific aspects of the past is unclear.
These concerns are amplified by the instances of search malperformance that we observed. The most noticeable were search outputs promoting revisionism, trivialization, and antisemitism. While the number of such outputs was low, and they were mostly confined to Russian query results, their very presence is concerning. Not only does it misinform the public and attack the dignity of Holocaust victims, it also confirms these views by promoting them via Web services that are trusted by users (Pan, et al., 2007).
The occurrence of these forms of inappropriate content also calls into question the "data void" (Nguyen, 2020) argument used to justify erroneous search outputs. Unlike niche queries, for which the retrieval of irrelevant or offensive content can potentially be explained by the lack of appropriate outputs, this is not the case with the "Holocaust" query in either English or Russian. This observation suggests that, both for the Holocaust and in other cases, malperformance can be attributed to the algorithm itself and not to the lack of data.
This suggestion finds support in earlier research criticizing the image search mechanisms used by search engines for relying not on semantic analysis of the image itself, but on the presence or absence of specific text terms in its vicinity (Cui, et al., 2008; Etzioni, et al., 2007). Such implementation can be viewed as counterintuitive to the universal nature of visual content (Etzioni, et al., 2007) and also limits the number of potential outputs in response to queries in languages for which less Web content is available, particularly as it is often difficult to translate ontologies into different languages (Embley, et al., 2011). The latter factor could also explain the higher degree of malperformance for the Russian query, which we observed in several search engines.
Instances of malperformance associated with misattribution and overrepresentation were less obvious, but they had profound consequences for Holocaust representation. A number of misattributed images mixed evidence of the Holocaust with other episodes of the Second World War, as well as postwar atrocities. This undermines the authenticity of Holocaust representation and can distort the way the public perceives the event.
The overrepresentation of a few Holocaust themes and sites can have a similarly distortive effect by simplifying the complex nature of the event and reiterating its stereotypical representation. Such systematic malperformance, which can be viewed as a form of bias similar to the skewed representation of gender and race (Noble, 2018) by search engines, raises ethical concerns, particularly as it seems to be related to the commodification of Holocaust memory.
These observations raise the question of what can be done to counter these instances of malperformance. Some of them (in particular, revisionism, trivialization, and antisemitism) can be addressed by more robust filtering mechanisms that filter out both "junk" (Bradshaw, 2019) sources and inappropriate images such as memes. While memes can be legitimate search outputs for many queries, they are not the most fitting option for information requests dealing with mass atrocities, particularly when users do not explicitly search for atrocities-related entertainment content.
The implementation of such filtering mechanisms is not a trivial task, considering the constantly changing nature of inappropriate content. This problem is not unique to Holocaust-related content and, similar to the different forms of hate speech and misinformation, requires constant monitoring and updating of content curation systems. Because of its complexity, the implementation of the task might require the deployment of additional oversight-centered algorithmic systems (e.g., artificial intelligence [AI] guardians; Etzioni and Etzioni, 2017) to ensure that the performance of curation systems stays within a certain set of parameters, such as the omission of denialist content in the top search outputs.
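To make the idea of keeping curation "within a certain set of parameters" more concrete, the following sketch shows one very simple form such a check could take. It is not the design proposed by Etzioni and Etzioni (2017). The function `audit_top_k`, the `label` field, and the sample outputs are hypothetical, and the sketch assumes that some upstream step (a human coder or a classifier) has already labeled each output.

```python
def audit_top_k(outputs, is_flagged, k=10, max_flagged=0):
    """Check whether the top-k outputs stay within the allowed number of flagged items.

    Returns (ok, positions), where positions holds the 1-based ranks
    of the flagged outputs found among the first k results.
    """
    positions = [
        rank
        for rank, output in enumerate(outputs[:k], start=1)
        if is_flagged(output)
    ]
    return len(positions) <= max_flagged, positions


# Hypothetical ranked outputs with labels assigned upstream (not study data).
outputs = [
    {"id": "a", "label": "liberation"},
    {"id": "b", "label": "denialist"},
    {"id": "c", "label": "commemoration"},
]

ok, positions = audit_top_k(outputs, lambda o: o["label"] == "denialist", k=3)
print(ok, positions)
```

The hard part of such a system is the labeling step, not the check itself, which is why constant monitoring and updating remain necessary.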
Similar to the way search engines shifted toward prioritizing authoritative health information sources during COVID-19, alternative media promoting Holocaust denialism (in particular, in Russian) can be demoted in favor of established heritage institutions, such as Yad Vashem or the U.S. Holocaust Memorial Museum. While it might increase the systematic overrepresentation of certain aspects of the event, which can be present in the content produced by these institutions, it will still allow countering of the most obtrusive forms of malperformance in relation to the Holocaust.
In addition to prioritizing more authoritative sources of information, recent developments in the field of content-based image retrieval can facilitate the adoption of new forms of image search, relying on semantic principles instead of visual similarity or keyword matches in the text accompanying images (Barz and Denzler, 2019; Zhou, et al., 2017). Together with earlier work on multilingual ontology construction (Embley, et al., 2011; Etzioni, et al., 2007), these developments can improve the quality of the algorithmic curation of visual information in response to both English and non-English queries.
Besides filtering out non-authoritative sources, the increasing availability of authentic historical content (e.g., through the digitization of museum collections) and advanced mechanisms of visual information retrieval can also address misattribution-related malperformance. However, these measures would not necessarily address overrepresentation, particularly considering the unequal distribution of authentic content itself, as well as the varying degrees of its popularization.
One potential solution could be to adapt to historical content the diversity metrics that are already used by search engines (Zheng, et al., 2017). To diversify its outputs, Google does not include more than one output from the same domain in its top search (Schwartz, 2019). Similar logic can be used, for instance, to show no more than one image from a specific Holocaust site or related to a certain Holocaust theme in the top outputs.
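The "no more than one per site" logic can be sketched as a greedy re-ranking pass over an already ranked list. This is an illustration only and does not describe how any search engine implements diversification. The function `cap_per_group`, the tuple layout, and the sample ranking are hypothetical, and the sketch assumes that each image carries a reliable site label.

```python
def cap_per_group(ranked, group_of, k, max_per_group=1):
    """Greedy re-ranking: keep items in rank order, skipping any item
    whose group has already reached the cap, until k items are selected."""
    seen = {}
    top = []
    for item in ranked:
        if len(top) >= k:
            break
        group = group_of(item)
        if seen.get(group, 0) < max_per_group:
            top.append(item)
            seen[group] = seen.get(group, 0) + 1
    return top


# Hypothetical ranked outputs as (image id, site label) pairs.
ranked = [
    ("img1", "Auschwitz"),
    ("img2", "Auschwitz"),
    ("img3", "Treblinka"),
    ("img4", "Auschwitz"),
    ("img5", "Sobibor"),
]

top3 = cap_per_group(ranked, group_of=lambda r: r[1], k=3)
print([image_id for image_id, _ in top3])
```

The same function can cap themes instead of sites by passing a different `group_of`. A cap of one is the strictest setting, and a larger `max_per_group` trades diversity for fidelity to the original ranking.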
Implementing such a solution is a challenging technical task because it requires reliable metadata that can be used for diversification. It also raises ethical questions related to the normative role of algorithmic curation of information, such as whether a more equal representation of suffering is more desirable than the current focus on a few sites and themes embedded in popular culture and the tourist industry.
The latter point also emphasizes the importance of considering the ethical aspects of algorithmic curation mechanisms that deal with historical information. Similar to other areas, where the deployment of algorithm-driven systems is subjected to increased scrutiny and calls for regulation (Etzioni and Etzioni, 2016; Etzioni, 2018; Helberger, et al., 2020), the possibility of search malperformance leading to unfair representation of the traumatic past has to be recognized and addressed by intensifying the dialogue between heritage institutions and industry. Such dialogue is essential for improving the ways algorithms curate historical content and can potentially encourage more memory-sensitive design of curation and oversight mechanisms.
It is also important to mention the limitations of this study. First, it relies on data collected during a single experiment, whereas a more longitudinal approach is required to validate the consistency of the findings. Second, because of the post hoc extraction of search results, some of them became unavailable. Third, in a few cases, it was not possible to reliably identify whether the image was related to the Holocaust, which decreased the sample used for analysis.
Finally, for this study, we relied on a single search term, that is, "Holocaust," in two languages, whereas there are a number of synonyms (e.g., "Shoa") and related terms (e.g., "ghetto" or "einsatzgruppen") or personalities (e.g., "Anne Frank" or "Adolf Eichmann") that are important in the context of algorithmic curation of Holocaust-related content. In future research, we aim to expand the selection of search queries to integrate these terms and compare whether search outputs returned in response to them are different from those retrieved for general queries such as "Holocaust."
Despite these limitations, our study highlights the urgent need to expand the research on algorithmic curation to historical information. The fact that the visual representation of the Holocaust, the memory of which is protected by legislative mechanisms, is subject to malperformance is concerning. It stresses the importance of further research on the representation of the past by search engines, as well as the more active involvement of memory scholars and curators in the ongoing debate about algorithmic fairness and diversity.