Cultures of Digitization: A Historiographic Perspective on Digital Art History

Art museums began using computers to help organize, catalogue, and coordinate their collections as early as the 1960s. In more recent times, art historians have consolidated the use of digital tools in the discipline within the emerging field of Digital Art History (DAH). In this historiographic study, we set out to understand DAH through an analysis of existing scholarship in the field. Our method combined both text mining and close reading of three datasets of art history journal articles published in the last decade: DAH (International Journal of Digital Art History, special issues of Visual Resources), Art History, and Art Journal. We studied the topical focus of these journals, looking at which agents, materials, and methods dominate and how they are contextualized. Based on this, we found that the subject matter and topical focus of scholarship in DAH differs significantly from scholarship in Art History or Art Journal. More specifically, the historical concerns of museums with regard to digitization still dominate DAH compared to other scholarship in the field. We argue that there are a number of historical and practical reasons for this, including early adoption of computers within museums, the need for simplicity in digitization projects, and issues of copyright. The persistence of this affiliation, in turn, raises critical questions for the future of the field of art history, including who can access art historical datasets, and how and by whom they are created.

In this article, we set out to understand and map the emerging field of DAH through an analysis of existing scholarship from the last decade. What are the major concerns of DAH? Who are its main agents, what is its material focus, and how does it approach art? Up to now, DAH methods and concerns have not been integrated into the mainstream of the field of art history. We therefore also wanted to understand what the relationship is between the concerns of DAH and other art historical scholarship today. In order to answer these questions, we performed a text mining analysis to uncover key themes and focal points in both DAH and "mainstream" art history literature. In short, this is a historiographic study of the emerging field of DAH performed with the help of digital tools. Drawing from the word frequency lists generated by text mining, we investigated the specific contexts of these words and their use through close reading of the articles in our datasets. We found that the subject matter and topical focus of scholarship in DAH differs significantly from scholarship in Art History or Art Journal. More specifically, the historical concerns of museums with regard to digitization still dominate DAH compared to other scholarship in the field. We argue that there are a number of historical and practical reasons for this, including early adoption of computers within museums, the need for simplicity in digitization projects, and issues of copyright. The persistence of this affiliation, in turn, raises critical questions for the future of the field of art history, including who can access art historical datasets, and how and by whom they are created.

Quantifying and Contextualizing Digital Art History
For this study we combined digital and non-digital methods: text mining and close reading. Our initial step was to prepare three datasets of art history articles and perform a text mining operation on these datasets. 3 We first assembled a dataset of DAH journal articles that represent the concerns and development of the field over the last 10 years. This dataset includes all of the articles published in the International Journal of Digital Art History (DAHJ) between 2015 and 2019 and two special issues of Visual Resources on the topic of DAH published in 2013 and 2019. In total, there were 63 articles in our DAH dataset. We chose these examples and excluded older work that combines computational methods with art historical subject matter because we wanted to understand DAH-branded research in particular, as an offshoot of DH scholarship. While publications are still limited in this area, we felt that these were some of the most often-cited and clearly defined exemplars.
For the other two datasets, we selected two journals that we would characterize as publishing "mainstream" art history research: Art Journal and Art History. These two journals represent the broad scope of the field of art history. Publications in Art History are not limited by time period or geographical area, and Art Journal focuses primarily on modern and contemporary art and theory, often with a global perspective. We chose to compare DAH to these two journals, as they represent a diverse view of art historical scholarship. They are also representative of two of the largest scholarly organizations for the field: the Association for Art History (AAH) in the UK, which publishes Art History, and the College Art Association (CAA) in the USA, which publishes Art Journal. We collected all of the articles published in these journals between 2010 and 2019 into two separate datasets. 4 The Art History dataset contained 373 articles and the Art Journal dataset included 215 articles. We chose to maintain the disparity in number of articles in order to preserve, as much as possible, a collection that spanned the decade.
The major challenge for the quantitative portion of our methodology was converting these articles, which are available only in PDF format, to machine-readable text for text mining. While many other scholarly fields have a standard format for articles across different journals, art history journals have a wide range of different formats for both text and citations that vary even among different decades of publication for one journal. Some articles have footnotes along the side margins, some have them at the bottom of each page, and others at the end of the article. Likewise, publications vary in their organization of text into one or two columns. Furthermore, art history journals rarely publish a bibliography at the end of the text, but instead prefer longform footnotes.
In order to extract text and citations from PDFs, we used an open-source Java library called CERMINE (Content ExtRactor and MINEr). 5 While this tool was developed to extract structured text from scientific articles with a reference list at the end of the article, structuring citation content by author, title, and other reference components, it satisfactorily demarcated body text and whole citations, if not their component parts, for our art history articles. We used the resultant XML to extract plain text of either the body text or the citations. Once we had done this, our datasets were ready for text mining. We chose a standard text mining method using the tm (text mining) package in R. Our methods for PDF extraction and text mining are explained in greater detail in a separate publication and our full datasets are available online. 6
We used two different techniques to determine the top 100 most frequently mentioned words in each dataset, minus stopwords. The first (Method A) analyzed all the text of each dataset combined together, and the second (Method B) produced a list of words that were mentioned in the highest number of articles in each dataset. In other words, for Method A, if only one article mentioned the word "technologist" many times (an actual example from our sample), all of those instances were counted, so the results seem to show that the word "technologist" is far more frequent across the dataset than it actually is. 7 However, for Method B, an article was counted as 1 no matter whether it contained 1 or 200 mentions of the word "technologist," and words were ranked according to the number of articles containing them. Using both of these techniques helped us understand the distribution of word frequency across datasets and allowed us to disregard extremes, like the example above, and better pinpoint what quantitative frequency might mean in the context of the material. In this article we primarily reference the combined total word list (Method A). However, we have cross-checked this with both the lists produced by Method B and close reading of the texts in the sample.
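The difference between the two counting techniques can be illustrated with a minimal sketch. It is written in Python rather than with the R tm package used in the study, and it does not reproduce the study's preprocessing (for example, any stemming). The function names, the very short stopword list, and the three sample sentences are hypothetical and serve only to show how a word concentrated in one article ranks under each method.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real analysis would use a much fuller list.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def method_a(articles, k=100):
    """Method A: count every occurrence of a word across the combined dataset."""
    counts = Counter()
    for text in articles:
        counts.update(tokenize(text))
    return counts.most_common(k)


def method_b(articles, k=100):
    """Method B: count each word at most once per article (article frequency)."""
    counts = Counter()
    for text in articles:
        counts.update(set(tokenize(text)))
    return counts.most_common(k)


# Hypothetical sample "articles" for illustration only.
articles = [
    "The technologist met a technologist and another technologist.",
    "The museum catalogue lists a painting.",
    "A painting in the museum.",
]

total = dict(method_a(articles))
per_article = dict(method_b(articles))
print(total["technologist"], per_article["technologist"])
print(total["museum"], per_article["museum"])
```

In this toy sample, "technologist" leads the Method A list because of repeated mentions in a single text, whereas under Method B it falls behind "museum" and "painting," which each appear in two texts. Comparing the two lists is what makes such outliers visible.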
The results of the text mining gave us a sense of the rate at which different agents, methods, and materials are mentioned in a given article and a given dataset, but they did not give us any sense of why they appear so often or in what context they appear.
In order to address these questions, we performed a close reading of the articles in question. We could not have ascertained the results of the text mining through close reading, nor could the "distant reading" of text mining have answered any deeper questions about what the presence and prevalence of certain words mean. 8 Thus, we do not simply want to confirm possible disproportion in numbers but rather discuss why these biases exist. In effect, we seek to detail some of the historical, institutional, and organizational factors which may explain the restrained use, in both quantity and diversity, of computational methods within the discipline of art history. This historiographic approach, which seeks to understand both the historical and contextual origin of DAH and its position in the discipline of art history at present, combines digital and analogue methods, distant and close readings. In sum, this is a case study in how mixed methods can complement each other for art historiography and art history research more broadly to delineate disciplinary discourses as expressed in agents, methods, and materials studied.

Text Mining Art Historical Scholarship
Using the top word lists we produced through text mining as a guide, it was possible to discern patterns regarding the topical focus of the articles. Table 1 lists a ranking of the top 25 words using Method A, and Figure 1 is a representation of the overlap of the top 100 words in each dataset compared using a Venn diagram. We can see, first of all, that nouns and verbs related to computers and digital or computational methods were more frequent in DAH than in Art Journal and Art History. This is to be expected, of course. Among the top 100 words, those unique to the DAH sample included, for example, "compute," "algorithm," "metric," "dataset," and "cluster." Another evident contrast was between which agents were most often cited. 9 High on the list for Art Journal are "artist," "critic," and "audience," while Art History cites "artist," "painter," and "viewer." DAH on the other hand cites "historian," "scholar," and "human" more often. The latter is evidently due to contrasts made between machine computation and manual analysis. These differences suggest that DAH scholarship is primarily concerned with methodological and practical considerations that concern researchers rather than discussions of artists or reaction to their work from critics and viewers/audience.
A typical example of this in the DAH dataset is an article describing the features of a database of the work of Caravaggio and his followers at the Bibliotheca Hertziana – Max Planck Institute for Art History. Marco Cardinali discloses that the database "covers the entire context of Caravaggism" and thereby "provide[s] a broader study context" including information on "patrons, social contexts, documents, iconographic sources, and the fortune of the works themselves." The same database is also said to include "multispectral images" such as X-rays and infrared images in addition to the "scientific data." As is evident, this text is not about the artworks or artists per se but rather about the infrastructure for the scholarly work or, more precisely, a very particular type of scholarly work attuned to technical aspects of the artwork. The focus is on what information or data is provided and how it may be accessed. 10 Mentions of Caravaggio in Art History or Art Journal, by contrast, typically focus on the content and meaning of the artwork or the artist's life and philosophical outlook. For example, in an article on mirrors in Renaissance art, Genevieve Warwick points to Caravaggio's work which "stand[s] as an emblem of this painter's art of realism, advancing the claim of this artist who famously purported to 'paint what I see'." 11 In this short passage, both the content of the artwork and the voice of the artist are brought out. In Art Journal, Caravaggio is only mentioned in one article, which cites him as one of many "LGBTQ practitioners" in the history of art as well as one of many artists who "engaged with homoeroticism or negotiated homophobia in their work." In this case, the focus is on the artist's persona and how that is expressed in the work. 12 Both of these examples are typical in that they concern artists and their work.
Some of the other frequently used words produced by our text mining methods indicate who or what the articles were concerned with studying, that is, the research material. In Art History and Art Journal, "paint(ing)" was the most frequently used word that indicated an object of study. 13 "Photograph," "sculpture," and "exhibit" also ranked among the top 10 words. While "paint(ing)" was also among the most common words in DAH, none of the other objects of study shared by Art History and Art Journal were among the top 10 in DAH.
Much of DAH scholarship seems to equate art with traditional Western painting and rarely addresses other artistic media. Architecture, craft, and sculpture are mentioned in discussions of 3D reconstruction techniques, but the DAH dataset in this study overwhelmingly revolves around painting (Figure 2). Because we selected articles that are specifically labeled as DAH, the broader field of digital humanities research, including archaeology and heritage studies, where 3D reconstruction is pivotal, is somewhat excluded from the sample. 14 Simultaneously, installation art, performance art, video art, and photo-based art, to give some examples of media in contemporary art, are rare to non-existent within DAH. 15 This is understandable, given that two-dimensional works can be captured in a more comprehensive, unitary way as digital images in large datasets than, for instance, installations or performances. Nevertheless, by not acknowledging how digital tools and methods privilege two-dimensional paintings over other media, DAH scholarship disregards the broad scope of art historical study.
After doing a close reading of some of the articles in the sample to ascertain the context of the most frequently used words, we found some key differences that the frequency lists alone do not indicate. For example, the word "viewer" appeared in all three datasets but it was used in a very different way in DAH compared to Art History and Art Journal. Although "viewer" is not among the top 100 frequently used words in the DAH sample, it can be found in some of the articles. We found that, when it appears, it does not typically refer to the viewer of an artwork, as is the case in Art History and Art Journal, but rather the viewer of a data visualization or a collection navigation interface. For example, one article states, "The viewer can examine sketches in the same year by performing vertical movements and shift in time between different year columns by moving horizontally." 16 Another concludes, "The effect of sorting on direct visualization is so important, in fact, that any feature not used for sorting is essentially invisible to the viewer." 17 These examples demonstrate that the focus of DAH is on the experience of a meta-layer of interpretive output rather than the works of art themselves.
In other words, DAH literature is often not concerned with individual artworks but rather with descriptions of digital tools and methodologies (Figures 3 and 4). This means that very little of DAH scholarship provides the kind of analysis found in mainstream art historical journals. One might assume that, given time, there would be a shift from discussion of tools/methods to applying them in the analysis of artworks, producing new insights of art historical import. However, looking back at the history of publications in this area from CHArt to DAH over the past 30 years, it seems that writing on the use of computers in art history has not moved much beyond description of new methods or tools. The focus is typically on sorting and presenting art images in institutional and educational contexts rather than reflecting on or engaging with individual artworks themselves. It is a detached, distant viewpoint, where artworks and artists are treated as objects for management and are assessed as aggregates rather than individual objects for interpretation and perceptual experience.

The Collection as Point of Reference
Another difference between the datasets was that words associated with museum and collection management were more common in DAH than in Art History and Art Journal. Words like "catalogue," "conserve," "classify," "access," "dataset," and "metadata" as well as the word "museum" itself are among the top 100 words in DAH, but none of these words appears in the top 100 word lists of Art History and Art Journal. Thus, there seem to be strong ties between the scholarly writings within the field of DAH and the museum sector compared to Art Journal or Art History. Yet even more interesting to see, zooming into the articles themselves, is how and in what context the word "museum" occurs. 18 In DAHJ and the special issues of Visual Resources, the word "museum" is used in conjunction with the insider's perspective of the day-to-day business of the museum and collecting institutions. Frequent word combinations are "museum curators," "museum catalogues," "museum cataloguer," "museum record," and "museum collection." A case in point is an article in DAHJ by a curator at the Williams College Museum of Art. The overall aim of the article is to discuss digital tools and queer theory in relation to curatorial practices. However, the arguments regarding benefits and use of these tools are directed toward the museum personnel, which is the recurring "we" throughout the text. Horace D. Ballard writes, "We have input all our accession data into a visualization module that allows our curatorial and engagement teams immediate access to which areas of our collection have been growing and by what rates" and continues, " … we've been using our collection as a discrete data set that, with the help of various digital tools, we could mobilize, quantify, and visualize into a digital infrastructure to share with our students and with the world." 19 These mentions of museums and museum practices are tied to hands-on work within the institutional and organizational context of the museum.
In addition to texts that discuss the organization of collections and issues around access, articles in the DAH sample also frequently comment on exhibition practices, i.e., how digital techniques can be used to display museum collections in situ and online. In other words, DAH writing on museums is far more practical in its focus. However, viewer reception and the interpretation of particular exhibitions in museums, which are popular topics in Art Journal and Art History, are not frequently addressed in the DAH literature. While words implying art's contact points outside the museum such as "exhibit," "audience," "viewer," and "critic" are on the top 100 list of both Art History and Art Journal, none of these words is on the top 100 word list of DAH.
It is also apparent, when comparing the mentions of museums in each corpus, that a large proportion of the writers/authors in DAH are associated with, or employees of, museums and archives rather than instructors or professors at universities, as is the case in the other two samples. In the list of institutions that frequently appear in DAH, the Getty (including the Getty Research Institute, Getty Foundation, and J. Paul Getty Museum) truly stands out. "Getty" is the only named institution among the top 100 words in DAH, which is interesting given that no single museum or institution is named on the top 100 list in the two mainstream art history samples. Getty has played a central role in the field of DAH and digital humanities research in general due to its efforts in developing standards for cataloguing, metadata, and so on. In the articles we consulted, Getty is mentioned primarily as a producer or collaborative research partner (Getty Museum, the Getty Research Institute, J. Paul Getty Trust, Getty Conservation Institute), yet also as a funder or enabler of research (Getty Foundation, Getty Art History Information Program (AHIP)), and as an information source and publisher (online databases and tutorials on getty.edu and vocabularies like ULAN and Art & Architecture Thesaurus).
Although "museum" is not included among the top 100 words in the body text of the articles in our mainstream art history samples, the word can most readily be found in image credits and citations for the articles, which were not included in the text corpuses we used. While it may seem obvious that museums would be listed in the credits for images reproduced in an art history journal, it is important to note because it demonstrates another key difference between mainstream art history and DAH literature. While art history journals typically include reproductions of artworks or exhibitions, DAH articles often include data visualizations instead.
Museum practices such as cataloguing, classifying, and displaying artifacts are addressed in Art History and Art Journal, but they are typically discussed in the context of an artist's or external scholar's critical engagement with the museum system. There is a recognition of the ways in which such practices entail biases and assumptions on the part of those who carry them out. Where museums are discussed in the body text of Art History and Art Journal, they are often referred to as historically and ideologically constructed institutions or agents. The museum as an institution is, for example, discussed as an "interpretative frame," "institutional framework," or an "epistemological structure" that can steer how an artifact or artwork is perceived. 20 Both the benefits and drawbacks of this formative capacity of the museum (how museums shape meaning) are highlighted in articles that appear in Art History and Art Journal.
The museum's unique potential to showcase and display particular types of art is just one topic of discussion in these mainstream journals. 21 More commonly, however, the museum as a historical and institutional construct, upholding "problematic" canons and categorizations, is the subject of critical analysis. One pressing topic, for example, is the origin of objects, a central concept in the organization of museum displays and catalogues. Among the articles in our Art Journal sample is a review of an exhibition at the Nationalgalerie in Berlin in 2010, in which art historian Chika Okeke-Agulu writes: "what precisely constituted German art and the museum's relationship to it was never a settled question, given that its collection included classical Greek and Roman art, and the work of French Impressionist and other modern European artists. This disconnect in status between the museum as a nationalist emblem and as a repository of hallmarks of (modern) Western art inevitably led, during the Nazi years, to an ideological and discursive logjam on the question of the place of art and avant-garde practices in national imaginaries." 22 Articles such as this question not only the historical construction of the museum but the way such constructions reflect the ideologies of society at large.
A similar example can be found in Art History, in a discussion of the Priuli wine cup in the collection of the Victoria & Albert Museum (Figure 5). This Venetian cup, displayed in the European galleries of the museum, is simultaneously listed in the museum's online records as from "Syria (possibly, made); Damascus, Syria (probably, decorated); Egypt (possibly, made)." Elizabeth Rodini writes that this type of hybrid object is common among museum collections. As a consequence, she argues, the "geographic paradigm of the museum as the dominant interpretative frame" needs to be replaced as "the effort to parse out and pin-point provenance, or to declare something an authentic import or a local imitation, perhaps in order to locate it in the right museum gallery, steers scholars away from some key contemporary considerations." 23 The core conceits of the museum as a reflection of a particular society's political and ideological biases are not similarly questioned in DAH writings. On the contrary, the museum and professional collection of artworks are the given, and neutral, starting point for much DAH writing.

Art Collections as Datasets
On the basis of our distant and close reading of writings in Digital Art History between 2013 and 2019, we contend that DAH takes what can be called a managerial approach to artworks and artists and is based predominantly in a museum and collection context. Given the long history of using digital tools and methods in these institutional settings, this is not unexpected. While art history scholars, like other humanities scholars, are sometimes accused of being dilatory in the uptake and development of digital methods and tools, the museum sector was in fact an early adopter. Initiatives to implement computers in museum collections, such as the Museum Computer Network (US) and the Museum Group (UK), were formed as early as the mid-1960s, and a landmark conference was held in 1968, co-sponsored by IBM and the Metropolitan Museum of Art in New York, that explored the potential applications of computer technology in museums. 24 At that time, the primary concern for museum professionals was the need for clear, simple, and reliable sorting and retrieval methods to maintain control over their growing collections and catalogues. Using computers to digitize museum records and other textual material provided a welcome solution to the deluge of institutional information. By the 1980s, computers were much more affordable and accessible for individual researchers. In light of this development, a group of researchers in the UK founded the organization Computers and Art History Group (CHArt) in October 1985. Their explicit goal was "to show that it was perfectly feasible to use modern computer technology to carry out individual research projects." 25 As evidenced by these early conferences and organizations, art historians and museums took an active interest in computers as soon as the technology was available to them. 26
Although CHArt was first organized by art historians at Birkbeck University, representatives of museums like the National Gallery and the British Museum were part of the core organization from early on. 27 The articles published in the CHArt journal show that the museum, specifically museum collections and management, was central to the journal and the organization (Figure 6). Most of the articles address the operation of museum and institutional collections, even though CHArt was initiated in a university context. The idea of working with digitized images at Birkbeck stemmed from the deficiencies of available color slides, which did not provide the accurate reproduction of color or sharpness required for teaching and research purposes. 28 During its years in publication, however, museum-related issues appear to have dominated CHArt.
As our distant and close reading of DAH scholarship in the 2010s shows, art historians' use of digital methods and tools continues to be closely tied to museological practice and the concerns of collecting institutions. Typically, DAH encompasses both digitization projects, like those begun by institutions in the late 1960s, and individual research projects using computational methods, such as those pioneered by CHArt members. In digital humanities today, computational methods include many different techniques for automatic text and image sorting/analysis, such as pattern and image recognition, text mining, and network analysis. Consequently, contemporary digital tools are not solely used to sort and retrieve items in collections but also to visualize collection contents for museum audiences and scholars. They are used to find patterns as regards exhibition practices, diversity in representation of artists' gender and nationality in collections, and so on. 29 The museum, as an institution, is a historical product of the general encyclopedic mindset of the nineteenth century, when scholars, state representatives, and museums alike strove to map, collect, and categorize the entire world. 30 As pointed out by art historian Michael Ann Holly, the nature of early art historical scholarship was a reflection of museum practices during the same era: "To a large extent, the development of art historical scholarship in the nineteenth century was determined by the organization of museums. […] At the turn of the century, for example, galleries and museums frequently identified their holdings only by the names of the artists and pertinent dates, seemingly avoiding any information extrinsic to the experience of art qua art. Titles, which by definition overtly signify subject matter, were most frequently omitted." 31
The historical roots of the museum as a cultural phenomenon and the historical roots of DAH scholarship in museum practices may thus explain the strong focus in DAH on collecting, mapping, and managing large datasets. Accordingly, DAH can be described as a restoration of nineteenth-century art historical scholarship, as outlined above by Holly.
In effect, the collections that were constituted in the nineteenth century, along with their organization by categories like period, style, and artist, are the same collections being used in computational analysis. Large collections organized according to fixed categories such as these are highly compatible with digital methods. It is well established that many digitization projects are based on "simplicity," i.e., easily accessible, large, pre-defined collections of related items are typically digitized first. 32 The extensive analogue catalogues that have historically been produced within a single museum or research institute are already labeled with information/metadata and can therefore more easily be migrated to digital counterparts than, for example, contemporary art scattered over several museums, private archives, and galleries, or broader visual culture material which is not under the care of one but many, if any, collecting institutions. Examples such as these are much more complicated to digitize and therefore to turn into usable datasets for art historical scholarship. In practice, this means that collections of the old masters of art history, whose artworks are well documented and clearly delimited, have been low-hanging fruit for digitization projects. Their datasets are clear-cut compared to the works and images of less well-known artists and image producers, whose work is less documented, less well-organized, and which may even lack basic information such as attribution and year of production. In sum, the digitized art collections currently available for computational analysis tend to consist of objects that are already held by and demarcated as datasets by museums. It goes without saying that this bias in digitization risks reinforcing the canons of art history. 33
Another bias in DAH datasets is the skew toward pre-modern art, which may stem from issues of copyright. As put forward by Victoria Szabo, "the question of copyright and ownership of the data being considered is [a] crucial consideration for our collective endeavors in bringing database analytics to art historical research." 34 Copyright issues are an obstacle for all types of computational analysis. Yet for art history, with its strong reliance on images and artworks, which are even more heavily circumscribed by copyright holders (whether they are based in museums or elsewhere), it is a serious impediment for scholarly work. 35
As the above analysis shows, there is a correlation between the production of digital repositories of art historical visual data and the development and use of computational methods for producing art historical scholarly work. If digitization of art historical data is produced within one particular type of collecting institution (i.e., museums), the needs and interests of those institutions will influence what is digitized and how that digitization process is designed.

Cultures of Digitization and the Future of Art History
Although we cannot determine cause and effect as regards the current datasets in the emerging field of digital art history using the text mining methods above, there is conceivably a historical and ideological connection between digital resources available and the institutional setting of digitization. Following this, we would like to return to Johanna Drucker's definition of digital art history as "analytic techniques enabled by computational technology," which does not include the use of online repositories and images (what she terms "digitized" art history). 36 However, we argue that there is a relationship between what is digitized and the methods and tools that are used to work on these digital datasets. This means that what can be done with the aid of computational methods is predetermined, often steered by what has already been digitized, how it has been digitized, and how that data is made available for research. Digital Art History is an effect of Digitized Art History and vice versa. In other words, no raw data exists as such; the data is always created. It is capta, a term Drucker uses in place of "data" to emphasize that data is actually constructed rather than "given." 37 As our close and distant reading of art historical literature has demonstrated, there is a strong affiliation between museums and digital methods/digitization projects within the field of art history. As we have suggested above, there are a number of historical and practical reasons for this affiliation, including early adoption of computers within the museum, the need for simplicity in digitization projects, and issues of copyright. Consequently, the persistence of this affiliation raises critical questions for the future of the field of art history.
One urgent question that arises in relation to the use of computational methods and digital tools is, who can access these datasets? The issue is not that large parts of the cultural heritage collections are still not digitized and made available digitally, as is pointed out recurrently. 38 The question is rather how they are made available. While the content of datasets appears to be shaped by the current practices, needs, and historical traditions of collecting institutions, full access to these datasets still remains, to a large extent, confined within them. Often, partial versions of these internal databases appear on portals like Europeana and Flickr, but full access to all of the data for collection items is restricted to museum professionals and scholars employed in the museums. 39 There are currently initiatives to create linked open data, data standards, and online repositories to increase access to art collections outside the boundaries of individual institutions. 40 However, the majority of art historical data is still governed by collecting institutions. For the discipline of art history, it is therefore necessary to add nuance to the notion of digital access. The degree of digitization in collections, whether it is 5% or 100% of the holdings, is not the principal question; the crucial question is rather how, by whom, and for what purpose these artifacts have been digitized. It is therefore imperative to discuss historical and contemporary cultures of digitization. This paradata is crucial for situating the datasets historically, organizationally, and politically and for understanding the paradigm of knowledge upon which they are based. 41 Another urgent concern for the development and use of computational methods and digital tools within art history is the question of who is producing vital datasets for the discipline.
As our text mining shows, there seems to be a gap or a disconnect in terms of scholarly interests between those who develop and use digital tools for art historical studies (DAH) and those who do not (Art History, Art Journal). The crucial issue for the discipline is therefore to open up a discussion on how datasets for art historical research could or should be produced and governed in order to fulfill the needs of a greater portion of art historical scholars. Art historian Paul Jaskot has rightly pointed out that "art history has institutional and market restrictions that make it structurally impossible for every subject, every position, and every methodology to be present and accounted for equally." 42 However, in the context of the science-inspired technical language often found in DAH literature, we need to constantly remind ourselves that no datasets are actually "given." Moreover, we need to address how the less celebrated or less well-documented artworks in museums and other collecting institutions may be digitized and subsequently studied through digital methods. As long as biases exist in what is digitized and how it is digitized, the majority of the agents, materials, and issues in the discipline will have to be tackled without the benefit of computational methods and big data. Indirectly, this may explain why databases and digital tools have not yet attracted art historical scholars more broadly.
In sum, we contend that the most pressing issue for art history in the wake of digital resources and tools is not the "technical roots in the digital tools" as put forward recently, but rather their historical, conceptual, and institutional roots and the biases contained therein. 43 The digital turn in art history cannot be understood solely as a question of methods or tools, whether new or old. It is also about mindsets, research paradigms, and the historical traditions of the discipline itself. 44 Digital resources are, of course, extremely valuable to the discipline of art history. In many ways, they make the work we do as art historians faster and easier, but in order to be truly relevant to the current state of the discipline, they must be developed with an eye to the diversity that continues to grow in the discipline at large.
In a recent stock-taking of the field, Jaskot asks, "What are the critical questions in art history that demand and are best suited to specific digital methods?" 45 One answer to this call, which we believe the present study shows, is that we continually need to remind ourselves that our approaches in art historical research, whether they are analogue or digital, are produced in a cultural context; they are neither neutral nor endowed with any inherent use. In short, we need to scrutinize the mindsets behind our datasets.