Digital Humanities in the Memory Institution: The Challenges of Encoding Sir Hans Sloane’s Early Modern Catalogues of His Collections

Catalogues are the core documents of museum structure and meaning. Yet no significant computational analysis has been made of how catalogues from the early modern period are constructed or of the way their structure and content relate to the world from which collections are assembled. The Leverhulme-funded ‘Enlightenment Architectures: Sir Hans Sloane’s catalogues of his collections’ (2016–19), a collaboration between the British Museum and University College London, with contributing expertise from the British Library and the Natural History Museum, seeks to change this. The Enlightenment Architectures project is analysing Sloane’s original manuscript catalogues of his collections to understand their highly complex information architecture and intellectual legacies. In this article we explore some of the challenges of seeking to integrate the methods of digital humanities with those of cataloguing, inventory, curatorial and historical studies and of bringing such interdisciplinary approaches to bear on early modern documentary sources. We do this through two case studies that highlight the approaches to encoding Sloane’s catalogues in TEI that Enlightenment Architectures has employed and the major challenges that these have brought to the fore.


Introduction
What are the challenges of seeking to integrate the methods of digital humanities with those of cataloguing, inventory, curatorial and historical studies and of bringing such interdisciplinary approaches to bear on early modern documentary sources? What kinds of productive collisions and misalignments occur when these fields' understandings of early modern documents and information meet the technical demands of present-day computational modelling? What barriers must be overcome in order to use the methods of the digital humanities to develop new understandings of the early modern period? We will explore these questions by drawing on our work on the Enlightenment Architectures project.
Sloane's manuscript catalogues are 'paper tools' that allowed him and his amanuenses to classify, cross-reference and document his collections and library. They were also instruments through which Enlightenment knowledge was produced and circulated, and reifications of how Sloane and others understood the world in the early modern period. The Enlightenment Architectures project is analysing Sloane's original manuscript catalogues of his collections to understand their highly complex information architecture and intellectual legacies. We place particular focus on the informational units of which the catalogues are composed and wish to understand the structural relations between these informational units. To support this, we are encoding, or marking up, Sloane's catalogues in line with the Guidelines of the Text Encoding Initiative (TEI). The TEI Guidelines set out XML-based encoding methods for making texts of the humanities, social sciences and linguistics machine readable. TEI is a de facto standard (Jannidis, 2009: 258). It has been described as being among the 'most significant intellectual advances that have been made in [digital humanities] and [TEI] has influenced the markup community as a whole' (Hockey, 2004: 16).
In recent years, a number of projects with strong digital humanities elements have focused on catalogue descriptions of manuscripts, for example, FIHRIST (Union Catalogue of Manuscripts from the Islamicate World) (FIHRIST, n.d.). Evolving out of the pilot Islamic Manuscript Catalogue on-line (OCIMCO, n.d.), which built 'a sustainable data format using a tailored schema for the open source TEI/XML metadata standard and incorporating established library standards for description', FIHRIST is now a UK-wide union catalogue, whose schema is open for use by other TEI catalogues (FIHRIST, n.d.). Important work has also focused on harvesting bibliographical information about geographically dispersed manuscripts and federating this information in new catalogues and databases. Manuscriptorium, for instance, is a freely accessible digital library of manuscripts, old printed books and other documents. The catalogue assembles descriptive metadata about these works in XML and directs users to their respective complex digital documents (CDD) (Manuscriptorium, n.d.).
Though Enlightenment Architectures seeks to contribute to the conversations opened by such projects, it differs from them fundamentally. In this project we do not view the creation of a new digital representation of Sloane's catalogues as an end in itself; rather, our focus is on identifying and analysing the information architectures of Sloane's catalogues and the cataloguing practices of Sloane and his amanuenses. We are modelling the catalogues and encoding them in TEI in order to study this. Our focus is therefore as much on the act of modelling as it is on the resulting computational model, and we view Sloane's catalogues as 'bifocal data': a window frame that we must concurrently look 'at' and 'through' (Sperberg-McQueen, 2018). This requires us, as far as possible, to privilege a historically accurate representation of the informational entities of Sloane's catalogues over conformance with the views of information that are implicit in 21st-century encoding specifications like TEI. As a result, we believe that the difficulties that we are encountering are productive and meaningful and that our work is casting new light on epistemologies [...]
This article discusses the specific example of Sloane's early modern catalogues and the challenges that we have encountered when seeking to use TEI to encode this material. Our work nevertheless has resonance for the institutions, individuals and communities across the globe who manage, research, curate, archive and even simply browse the many and extensive digital heritage collections that are available online (see Poole, 2016).
Simple digitisation and publication of unstructured text is rarely sufficient for allowing online collections to be used and transformed in the ways suggested above (Stork et al., 2018). Rather, it is necessary to make machine readable the information that a computer usually cannot decipher unaided. This can include information in and about a digitised object; for example, that the string of letters 'red' in a catalogue is actually a colour name or that a given catalogue was written by Hans Sloane. It is necessary to do this to support the sophisticated search, interlinking, remixing and other actions that collections online can ideally support. Languages like the XML-based TEI, which is the focus of this article, are thus crucially important pillars of digital collections because they can:
[make] it possible for people to embed additional knowledge in the text, including interpretative material. The purpose of text tagging is to facilitate retrieval and representation through applying what is essentially a controlled vocabulary of tags. A collection with an interpretative level of tagging is one where information is included in the tags that is otherwise not available in the text. (Ruecker, Radzikowska & Sinclair, 2016: 111)
Thus, a reader may ask 'why is TEI important? Why is it important to understand the benefits and complexities of applying TEI to early modern catalogues like that of Sloane?'. Our response is that TEI plays a crucial role in allowing sophisticated research questions to be asked of Sloane's catalogues and, in turn, it shapes the extent to which Sloane's catalogues can be intermeshed with the wider digital cultural heritage ecosystem that is discussed above.
Through the following case studies, we will discuss the approaches to knowledge representation that Enlightenment Architectures has employed and the major challenges we have encountered when seeking to apply TEI to early modern catalogues. In Case Study 1 we present examples of how we have customised and extended TEI so that it can better represent our historically-sensitive readings of Sloane's catalogues. In Case Study 2 we discuss the difficulties that we faced when seeking to model object names and encode them in TEI. At stake, we argue, is not only how we can best use the methods of digital humanities to represent early modern catalogues but also the current limits of humanities and curatorial knowledge about such catalogues.
Ortolja-Baird et al: Digital Humanities in the Memory Institution

About Sloane
The globally significant collections of Sir Hans Sloane (1660-1753) were the foundations of three of the United Kingdom's national institutions: the British Museum, the Natural History Museum and the British Library. Sloane's collections of books, manuscripts, natural history, art, antiquities and ethnographic materials from around the world were a pivotal site of knowledge production and circulation during a period from the 1680s to 1750s and indeed in the British Museum after his death. Representing possibly the largest and most extensively documented of such collections, Sloane's handwritten catalogues are arguably among the first sustained attempts at collection management and information studies in the western world: as such, their intellectual legacies are unparalleled. As a royal physician, natural philosopher of apparently unlimited curiosity and both Secretary and President of the Royal Society, Sloane attempted to encompass the world and its knowledge through the creation of an encyclopaedic collection that would be left to the Nation upon his death in 1753.
Catalogues of collections were characteristic of early modern natural philosophy and went beyond a simple list or record of museum content (Findlen, 1996). Catalogues like Sloane's contain the origins of modern methods for managing scholarly information (Blair, 2010). They can be conceptualised as 'human search engines' (Delbourgo, 2011). Sloane's catalogues are vital keys to unlocking not only his collection but also a greater understanding of the way knowledge was developed and produced. The differing priorities, rhetorics and documentation conventions of these catalogues provide rich information about the contribution of collecting to systems of 'rationality' that emerged in Sloane's era (Greenhill, 1992). Blakeway's research exemplifies the new knowledge that can be created through a close study of even a single subset of Sloane's catalogues (in this instance his library) (Blakeway, 2011). She demonstrates how much work was involved in the act of cataloguing, by identifying through their handwriting the multiple authors of Sloane's library catalogues over time, the numbering and shelving systems adopted, and both contemporaneous and later chronology and re-ordering of the catalogue entries. Most recently, Kusukawa [...]
Inventory and cataloguing studies have been developed by historians of collecting for well over a century. Yet this research has often been carried out within a single discipline, such as art history (Keating and Markey, 2011). Sloane's 17th- and 18th-century manuscript catalogues present considerable research challenges to existing paradigms. No longer directly or consistently connected to his widely dispersed physical collections, they are far too extensive and complex to be studied without computational assistance. They also require broader disciplinary reach due to the encyclopaedic knowledge they represent (MacGregor, 1994).
Sloane's catalogues and their complex, heavily annotated and indexical structures have consequently remained little understood and unanalysed; this project aims to change that. Enlightenment Architectures focuses on five of Sloane's catalogues: two volumes of 'fossils', one volume of printed books and ephemera, one of 'miscellanies', and one of his collection of manuscripts.2 All have been transcribed and we use this sub-set as a lens through which to best understand how collections and their documentation together formed a cornerstone of the laboratories of the emergent Enlightenment. To achieve this we are bringing cataloguing, inventory and curatorial studies into conversation with digital humanities and aim to devise and implement an interdisciplinary method bundle or bricolage that can create new knowledge about how Sloane's catalogues were written, organised, annotated and used. A cornerstone of our approach is the computational modelling of the catalogues, including making cataloguing-, inventory- and curatorially-informed readings of the catalogues machine readable, in line with the Guidelines of the TEI. We will now describe this process in greater detail.
2 The titling of Sloane's manuscript catalogues is complex and current reference titles are not always those that were assigned by Sloane himself (see Caygill, 2012).

Digital Humanities approaches
Data modelling is emblematic of computing. This is because: models provide formalized perspectives on their subjects, expressed in a way that makes it possible to gather specific information about the subject. In short, the formalized model determines which aspects of the subject will be computable and in what form (Flanders and Jannidis, 2016: 229).
Modelling is accordingly a central activity of digital humanities and one of the main ways that it seeks to form and transform knowledge. In the main, it is 'modelling for' that is undertaken in digital humanities and this analytic approach aims to 'figure out how something works by taking it apart' (McCarty, 2008: 256; 2014: 26-29).
Though such analytical work has a long history in the Humanities (Orlandi, 2002), the use of the computer as a partner in this process changes it substantially. When using a computer to model a catalogue such as that of Sloane, the model must be expressed within the constraints of computing technology: complete explicitness and consistency are required. In this way, computational modelling demands that humanities scholars identify and express interpretations of relevant textual features with an often-unprecedented degree of systematisation. Paradoxically, though, it has been argued that the greatest successes of modelling are to be found in its failures, or 'via negativa': '[modelling] gives us a tool for isolating that which will not compute and thus forces the epistemological question of how it is that we know what we really know in the humanities' (McCarty, 2008: 256). This will be exemplified in Case Study 2 below with respect to the 'problem of the object' in Sloane's catalogues. Consequently: models of whatever kind are far less important to the digital humanities than modelling. Modelling is crucial. If you only remember a single sentence from this brief essay, remember this one: the word 'computing' is a participle – a verbal adjective that turns things into algorithmic performances (McCarty, 2008: 254-5).
Thus, the ideal role of the computer and the purpose of computing in digital humanities is not to make research better, faster and/or cheaper. On the contrary, as a number of writers have argued, computing should be about making problems more difficult, more complex, more thrilling: computing is, or can be, 'a telescope for the mind' (Masterman, 1962). Different approaches to the use of such a 'telescope' tend to be pursued in the institutions that specialise in cultural heritage and text-based humanities, or in the memory institution and the university. As a result, various studies (Eide, 2014; Ore and Eide, 2009) of how to bridge their modelling activities have been conducted: Computer based modelling in cultural heritage has focused on database development, generalised as data standards and, since the 1990s, also formal ontologies. Modelling in digital humanities has had its core in textual scholarship, including close reading and text encoding of literary and historical sources as well as models of text corpora, usually relying on statistical methods (Ciula and Eide, 2014: 35).
As stated above, we are implementing the digital humanities modelling of Sloane's catalogues largely in line with TEI. This is an authoritative set of guidelines for making Humanities texts machine readable and is endorsed by agencies such as the NEH, AHRC and the EU's Expert Advisory Group for Language engineering (TEI Consortium, n.d.). Given that the primary location of the Enlightenment Architectures project (the British Museum) and the locations of the objects that are described in the catalogues are memory institutions, we carefully considered using a formal ontology such as CIDOC CRM rather than TEI as the basis of our work. CIDOC CRM is a conceptual model designed to provide 'definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation' (CIDOC, n.d.). However, we concluded that TEI would be better for maintaining the integrity of Sloane's formally unstructured and continuous handwritten text, whilst exploring its information structures and its discursive development over time. This is because the aim of our modelling is to accurately represent the information architecture of Sloane's catalogues rather than to reconcile those information architectures with the concepts set out in 21st-century encoding languages or ontologies. In CIDOC CRM, for example, '[t]he central idea is that the notion of historical context can be abstracted as things, people and ideas meeting in space-time' (Ore and Eide, 2009: 163). This is not a view of the catalogues that we were happy to commit to at the beginning of this work. Of course, implicit conceptual models underpin TEI too, but arguably not to the same extent and they are not articulated as such: 'The TEI guidelines are focused on how to annotate texts and do not prescribe any specific conceptual model' (Ore and Eide, 2009: 165). After careful consideration, we decided to use TEI as the master format for the project and integrate our annotations with those of an appropriate ontology at a later date, as other projects have done (e.g. Ciula, Spence & Vieira, 2008). Nevertheless, the process of adapting and extending TEI to encode our material has presented significant challenges.
Though internationally recognised, TEI has been criticised from various angles.
Earlier debates often centred on the theories of textuality that underpin it (e.g. DeRose et al., 1997; Renear, 1997; Renear, Mylonas & Durand, 1993) and concerns of postmodern criticism, like performativity, that it poorly accommodates (Caton, 2000; McGann, 2007; 2004: 193-207). The appropriateness of embedded markup for cultural heritage texts has been questioned (see Schmidt, 2010). A recurrent point of concern is the complexity of TEI (e.g. Burghart and Rehbein, 2012; Dalmau and Hawkins, 2014; Dee, 2014), and the need for more user-friendly, TEI-compatible tools. Communities like EpiDoc have developed specialist subsets of the overall guidelines for the encoding of epigraphic documents (EpiDoc, n.d.). In addition to the transcription and editorial treatment of texts, EpiDoc also addresses the history and materiality of the objects on which the texts appear (i.e., manuscripts, monuments [...]). [...] The Map of Early Modern London (MoEML) ([...] London, n.d.) comprises four distinct, interoperable projects (map, gazetteer, library and survey) whose databases share a common TEI tagset, thus enabling users to 'visualize, overlay, combine, and query the information in the MoEML databases' (Jenstad, 2018).
A number of projects have also sought to use TEI to interrogate historic or current catalogues. Adopted by libraries in particular, for example the Bodleian in Oxford, TEI has become a common framework for exploiting digitised catalogues.
Another project, The Digital Ark, is: a web-delivered virtual museum of collections of rarities and curiosities in England and Scotland from 1580 to 1700, comprising documentary and graphical representation of up to 10,000 specimens and artifacts collected in that period, some of them surviving in museums in England today (Nelson, 2016).
Similarly, the ASCH Project aimed to develop a metadata model to allow the contextualisation of different types of digitised resources (ASCH, n.d.). Using objects from the von Asch collection at the University of Göttingen, the project used TEI to encode the documents, such as letters and inventories, which referenced the objects, which were then linked to the metadata descriptions of the objects themselves.
However, Enlightenment Architectures fundamentally differs from these projects as it does not seek primarily to address questions of provenance, but rather the organisation of the information that is recorded in Sloane's catalogues. In the next sections we will discuss the difficulties that we have encountered when trying to apply TEI to Sloane's catalogues.

Case Study 1: applying and extending TEI
Upon his death in 1753, Sloane's library was estimated to contain some 50,000 volumes, over 400 of which were books and albums of prints and drawings and 2,666 of which were volumes of manuscripts; the rest were printed books (see [...]).
MS. 3972 C vol. VI is one of eight original volumes that contain the catalogue of Sloane's books and printed ephemera, now held at the British Library. Comprising 530 folio pages, in a variety of Sloane's and amanuenses' hands, it captures some of the richness of his library. It comprises catalogue entries for monographs, atlases and bound volumes of printed materials such as dissertations, treatises, proposals, letters, accounts and ephemera.
The majority of the catalogue's pages follow the entry layout in Figure 1. In the left-hand margin is the alphanumeric catalogue number, often crossed out, sometimes more than once, and replaced with a new number. To the right of this is the catalogue entry, which contains purely bibliographic detail: author, title [...]
Early modern handwritten and printed library catalogues have been extensively studied by historians of science, the book and bibliography (see Walsby and Constantinidou, 2013). More importantly for this project, the general history of 17th- and 18th-century private library practices, book collections and the collecting habits of Sloane's contemporaries has been richly documented (Loveman, 2015; Edgington, 2016; Poole, 2015). [...]
This tag (<ea:catent>) serves to group all the information, be it descriptive, graphical or spatial, that corresponds to each catalogue number. This includes such varied information as the description of the object listed (its size, shape or condition); its provenance (person and place); the price paid for it; and many other details. Importantly, it can also contain the descriptions of multiple objects, all of which have been purposefully documented by Sloane and his amanuenses under one catalogue number. Although individual elements such as <place> (as in a geographic location) or <ref> (a reference to a location of any kind) are also tagged within the catalogue entry, we group these elements together within the <ea:catent> in order to convey the original cataloguer's choice to include this particular information when describing the object at hand. The information recorded (and not recorded) in the catalogue entry reflects how the individual perceived the object and the knowledge that they had about it.
Various possibilities exist for encoding this information in line with TEI. For example, we considered using a generic <div> element (a '(text division) [that] contains a subdivision of the front, body, or back of a text' (TEI Consortium, 2018b)) with an attribute to specify which kind of division is being referred to, for example, <div type="catEnt">. Yet we rejected this for a number of reasons.
Firstly, the information that is supplied in the type attribute is crucial: first order information rather than qualifying information. Though the question about when to use attributes versus elements is one that is contested by XML experts (Cover, 2008) there is some consensus that, where possible, first order information should be recorded as an element (e.g. w3schools, n.d.). This verdict is also linked to the semantic limitations of XML, where relationships can be deduced from the nesting of elements but not from the order of attributes (Antoniou and van Harmelen, 2008: 32). So too, attributes can be more difficult to process than elements. As we do not see the encoding of the catalogues as an end in itself, but rather as something that can support the further interrogation of the catalogues, we concluded that it was therefore appropriate to devise a specialist element to encode this data.
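The practical difference between the two encodings can be illustrated with a minimal Python sketch. It is not the project's own code: the namespace URI bound to the `ea` prefix, the sample markup and the function names are assumptions made purely for illustration, and only the forms `<div type="catEnt">` and `<ea:catent>` come from the discussion above. With the generic encoding, every `<div>` must be visited and filtered on its attribute value; with the dedicated element, the element name alone identifies a catalogue entry.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Hypothetical namespace URI: the article does not state the one used by the project.
EA_NS = "http://example.org/ns/ea"

# Option considered and rejected: a generic <div> qualified by an attribute.
GENERIC = f"""
<body xmlns="{TEI_NS}">
  <div type="chapter"><p>Not a catalogue entry.</p></div>
  <div type="catEnt"><p>Red corall growing on a rock</p></div>
</body>
"""

# Option adopted: a dedicated element in the project's own namespace.
CUSTOM = f"""
<body xmlns="{TEI_NS}" xmlns:ea="{EA_NS}">
  <div type="chapter"><p>Not a catalogue entry.</p></div>
  <ea:catent><p>Red corall growing on a rock</p></ea:catent>
</body>
"""


def entries_from_generic(xml_text):
    """Visit every <div> and keep those whose type attribute marks an entry."""
    root = ET.fromstring(xml_text)
    return [d for d in root.iter(f"{{{TEI_NS}}}div") if d.get("type") == "catEnt"]


def entries_from_custom(xml_text):
    """Select catalogue entries directly by element name."""
    root = ET.fromstring(xml_text)
    return list(root.iter(f"{{{EA_NS}}}catent"))


if __name__ == "__main__":
    print(len(entries_from_generic(GENERIC)), len(entries_from_custom(CUSTOM)))
```

Both functions retrieve the same entry here; the sketch only shows that the second route does not depend on an attribute value being supplied, and spelled consistently, throughout the encoded text.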
Another option could have been to adapt elements currently found in the TEI header to apply to the content of the manuscript catalogue. For example, the TEI guidelines provide <msContents> ('describes the intellectual content of a manuscript or manuscript part, either as a series of paragraphs or as a series of structured manuscript items' (TEI Consortium, 2018d)) and <msItem> ('describes an individual work or item within the intellectual content of a manuscript or manuscript part' (TEI Consortium, 2018e)). However, not only are the decisions of the original cataloguer lost through the use of these <ms> elements, which do not strictly include the entire content of the catalogue entry, but crucially, the semantic import of an 'individual work or item' would not produce the necessary dataset for understanding the structure of the catalogue and its entries. The <ea:catent> element is thus a vital innovation for those who are not only seeking to extract data from catalogues but also attempting to understand the internal structure of the catalogue itself.
Related to <ea:catent> is the <ea:catnum> or 'catalogue number' element that we have also created. This element serves to identify each catalogue number listed in the catalogues. The TEI guidelines suggest that <idno> ('supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way' (TEI Consortium, 2018c)) or <altIdentifier> ('contains an alternative or former structured identifier used for a manuscript, such as a former catalogue number' (TEI Consortium, 2018a)) would suffice for this purpose. Figure 2 demonstrates how another catalogue could be encoded with the <idno> and <altIdentifier> tags.
The same logic could be applied to Sloane's catalogues. However, without any certainty as to whether the objects he listed have a new <idno> since being dispersed from his collection, the original catalogue number is the only identifier available, even though it is theoretically an <altIdentifier>. Curators at the British Museum who can match an object with a Sloane catalogue entry give it a Sloane registration number, which includes this catalogue number and becomes its primary unique identifier. By contrast, Sloane's printed books at the British Library have been given a new shelfmark, which has resulted in the effective 'loss' of his collection 'in plain sight' within the library itself (Walker, 2016). We have therefore created the element <ea:catnum> to mark these catalogue numbers as not just one identifier among several, but rather the only record which connects these objects to both the catalogue and one another. This is particularly important as it is speculated that these numbers can aid our understanding of how Sloane acquired materials chronologically, how he catalogued and grouped his items, and perhaps even how they were physically arranged in his house, both visually and for access and use, thereby allowing us to consider how Sloane and his contemporaries ordered and understood the world around them.
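How a dedicated catalogue-number element can key each entry back to its description may be sketched as follows. This is a minimal illustration under stated assumptions rather than the project's implementation: the `ea` namespace URI and the function name are hypothetical, and entry 12 is invented (entry 11 reuses the description quoted later in this article from the catalogue of 'Coralls, Sponges, & some other submarines').

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Hypothetical namespace URI: the article does not state the one used by the project.
EA_NS = "http://example.org/ns/ea"

# Illustrative sample only: entry 12 is invented for this sketch.
SAMPLE = f"""
<body xmlns="{TEI_NS}" xmlns:ea="{EA_NS}">
  <ea:catent><ea:catnum>11</ea:catnum> Red corall growing on a rock</ea:catent>
  <ea:catent><ea:catnum>12</ea:catnum> A second, invented description</ea:catent>
</body>
"""


def index_by_catnum(xml_text):
    """Map each catalogue number to the remaining text of its entry."""
    root = ET.fromstring(xml_text)
    index = {}
    for entry in root.iter(f"{{{EA_NS}}}catent"):
        num = entry.find(f"{{{EA_NS}}}catnum")
        if num is None or not (num.text or "").strip():
            continue  # an entry without a usable number cannot be keyed
        # Collect all text in the entry except the number itself.
        parts = [entry.text or ""]
        for child in entry:
            if child is not num:
                parts.append("".join(child.itertext()))
            parts.append(child.tail or "")
        index[num.text.strip()] = " ".join("".join(parts).split())
    return index


if __name__ == "__main__":
    for number, description in index_by_catnum(SAMPLE).items():
        print(number, description)
```

An index of this kind is one possible starting point for studying sequences and gaps in the numbering; how crossed-out and replaced numbers should be represented is left open here.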
In other instances, we have made changes to TEI that have a more limited scope.
The markup of Catalogue 3972C vol. VI, for instance, includes the element <textName>, which builds upon <msName> ('contains any form of unstructured alternative name used for a manuscript, such as an 'ocellus nominum', or nickname' (TEI Consortium, 2018f)) to indicate those printed texts listed in the catalogue under a title which cannot be found by the same name elsewhere. As with <ea:catnum>, in these instances the <textName> is the only identifier that exists. We therefore wish to be able to record that these are published texts, albeit currently untraceable under the titles that Sloane gave them.3

The markup of MS 3972C vol. VI
The TEI markup of MS 3972C vol. VI captures the most important bibliographic content of the catalogue entries, as well as key physical and graphical elements (see Figures 3 and 4). Table 1 shows the elements that are included in the markup (generic TEI structural elements such as <p> and <lb> are not listed here).
This markup effectively captures the core details of the catalogue in line with our historically-informed readings of it. At present, though, we are not encoding the particular language in which a title or other information is given;4 neither time nor resources allow this. There is one exception to the decision to disregard foreign languages, which is the tagging of multi-language place names. The network of places from and through which Sloane's collection reached him stands to be a profitable line of enquiry and for this reason the names of publication places in different languages, such as London/Londra/Londres/Londinium, will be linked to one single georeferenced location. While this will not reflect the language composition of Sloane's library more broadly, it does ensure that this crucial information, which links Sloane to the wider world around him, is made identifiable and analysable.
3 Some of these instances might well be published items which have been incorrectly recorded by the original cataloguer. The potential reasons why a text was recorded in an unstandardised way raise bibliographic questions regarding the editions, pirated copies, printed commentaries on texts, short titles and other sources which may have been available in Sloane's time. These original entries may contribute to our understanding of contemporary book cataloguing practices, the availability of texts during Sloane's lifetime and possibly even the book trade more generally.
4 While most of Sloane's catalogues are predominantly written in English and Latin, the many languages found in 3972C vol. VI reflect the general composition of his library, only a quarter of which, it is speculated, comprised English-language books.
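The linking of variant place names to a single georeferenced location, as described above for London/Londra/Londres/Londinium, could be sketched as follows. The article does not specify the mechanism, so everything here is an assumption made for illustration: the lookup tables, the use of a `ref` attribute on `<placeName>` pointing to a local identifier, the function name and the approximate coordinates given for London.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical authority record; the coordinates are approximate and illustrative.
GAZETTEER = {"london": {"label": "London", "lat": 51.5074, "lon": -0.1278}}

# Hypothetical table mapping the variant spellings named in the article to one key.
VARIANTS = {
    "london": "london",
    "londra": "london",
    "londres": "london",
    "londinium": "london",
}

# Illustrative sample only.
SAMPLE = f"""
<body xmlns="{TEI_NS}">
  <p><placeName>Londra</placeName></p>
  <p><placeName>Londinium</placeName></p>
  <p><placeName>Londres</placeName></p>
  <p><placeName>Paris</placeName></p>
</body>
"""


def link_places(xml_text):
    """Add a ref attribute to every place name whose spelling is in the table."""
    root = ET.fromstring(xml_text)
    for place in root.iter(f"{{{TEI_NS}}}placeName"):
        spelling = "".join(place.itertext()).strip().lower()
        key = VARIANTS.get(spelling)
        if key is not None:
            place.set("ref", f"#{key}")  # unknown spellings are left untouched
    return root


if __name__ == "__main__":
    for place in link_places(SAMPLE).iter(f"{{{TEI_NS}}}placeName"):
        key = (place.get("ref") or "").lstrip("#")
        print(place.text, GAZETTEER.get(key))
```

Because the original spelling stays in the text and only a pointer is added, the form used by the cataloguer is preserved while the variants become analysable as one location.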
The natural history catalogues of Sloane and his contemporaries have been described as interpretive 'repositories of multiple intersecting stories that textualized and contextualized each object' (Findlen, 1996: 36, note 61). The crux of these stories was the lengthy object descriptions given in each entry. They appear to have been part of a method of 'verbal description' (Wragge-Morley, 2010) that was used by Sloane and others to make the ever-expanding world knowable (not just by possessing the object itself, but by creating and retaining written information about its source and use). Early modern techniques for understanding the natural world included close observation of differences between specimens and the rendering of these differences in text form. Producing or reading these descriptions had a particular cognitive value that was central to understanding what was being described, especially for those without access to a collection. As Descartes enthused, the purpose of a worthy description is to cause a sensory impression and create correspondent images in the imagination: something that can be effectively done by words alone (Wragge-Morley, 2010). This is implicit in the writings of Nehemiah Grew, a contemporary of Sloane As discussed above, Sloane's catalogue entries consist of a catalogue number and an object description along with various annotations. TEI markup has been used by Enlightenment Architectures to encode a wide range of this information. For example, the standard element <name> is used to identify names and to distinguish them from additional information about a person. The element <addname> allows for references to nicknames and aliases, and is particularly useful for variant spellings of names. In addition, TEI offers various options for encoding the provenance of an object such as <placeName>, which identifies an absolute place name, and <geogName> for the identification of more specific geographical features. 
Similarly, the element <date> allows a date (in any form) to be tagged, which is useful for establishing the timelines of Sloane's catalogues. Additional information such as pencil location codes, monetary values, brackets, drawings and much later curators' comments can also be marked up. For example, <add rend="pencil"> and <add rend="red"> encode 'additional' comments appearing in pencil and red ink.
Capturing these (and what are thought to be location codes) in the margins of the catalogues is crucial to understanding Sloane's methods of arranging objects in his own home whether by theme, use, material or size, for example (see Caygill, 2012).
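To make the markup described above concrete, the following sketch parses a single entry and lists its tagged spans. The entry is a placeholder and not a transcription: the bracketed values stand in for real content, and the use of <item> as the wrapper for an entry is an assumption, since the project's actual entry element is not specified here. Only Python's standard library is used.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"  # the TEI P5 namespace

# Placeholder entry: every bracketed value is hypothetical, and <item> as the
# entry wrapper is an assumption made for this sketch.
SAMPLE_ENTRY = f"""
<item xmlns="{TEI}" n="[catalogue number]">
  [object description] given by <name>[donor name]</name>
  from <placeName>[place]</placeName>, <date>[date]</date>
  <add rend="pencil">[location code]</add>
  <add rend="red">[later comment]</add>
</item>
""".strip()


def tagged_spans(xml_text):
    """Return (element name, rend attribute, text) for each tagged span."""
    root = ET.fromstring(xml_text)
    spans = []
    for el in root.iter():
        if el is root:
            continue  # skip the entry wrapper itself
        local_name = el.tag.split("}", 1)[-1]  # drop the namespace prefix
        spans.append((local_name, el.get("rend"), (el.text or "").strip()))
    return spans


for name, rend, text in tagged_spans(SAMPLE_ENTRY):
    print(name, rend, text)
```

Separating pencil from red-ink additions by their rend value is what would allow marginal location codes to be analysed independently of later curatorial comments.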
At an early stage of the project we identified the benefits that would result from encoding the objects that are described in Sloane's catalogues. The encoding of an object name in historical sources enables both humans and machines to identify and manipulate such names and, for example, to identify patterns in their descriptions and expand the potential for reuniting objects in the memory institution. Indeed, identifying a string of text that can function as a verbal signifier of an object is central to treating a historical document like a manuscript catalogue with 'curatorial sensibility' (Nelson, 2016). As we will show in this case study though, what has proved most difficult about attempting to encode object descriptions is not the application of TEI but a more fundamental issue, namely the problem of how to consistently and reliably identify references to individual objects and to identify the boundaries that exist between 'object names' and their qualifying descriptions. We asked ourselves, might it be better to side-step the issue of identifying the boundaries of the object and instead encode only the catalogue number, which keys the entry back to the physical object?
Take, for example, 'Red corall growing on a rock wt. shells' (Catalogue of 'Coralls, Sponges, & some other submarines', n.d.: Entry no. 11, f. 2). In this example, the boundary between the object that is being described and additional descriptive information about that object is difficult to identify. Which, if any, of the following suggestions specifies the object name?
corall
Red corall
Red corall growing on a rock
Red corall growing on a rock with shells
This problem has also been discussed by Nelson, who has argued:
We must be able to determine and define the limits of what constitutes a mention of an object. This is crucial so that we retain all information that is immediately relevant to that object, but (ideally) no more than is pertinent. … We must be able to define the relevant and relative contexts of each mention of an object: … To enable an articulation of an object's place in a hierarchy; … To enable identification of relevant and related information; and … To enable the articulation of events involving other entities (e.g., people, places, and other objects) (2016).
In his discussion of the difficulties of identifying the boundaries of object names and their descriptions, Nelson (2016)
We asked them to identify the 'object' in a longer description and to indicate which words should be marked up as the object. These entries were chosen because they highlight the complexities of trying to disambiguate and encode an object in Sloane's catalogues.
Respondent 1, a digital humanities expert, saw the solution to tagging the object as relatively simple: 'each of these descriptions is grammatically a noun phrase, and as a general rule I think the head noun of the phrase is as good a candidate for identifying the object [that Hans Sloane] (or his assistants) saw as the object being described'. The head noun in each example, then, is the one on which everything else in the noun phrase grammatically depends, and this would change depending on how the description is phrased. In example 1, 'corall' is the object and in example 2, 'claw' is the object. In those instances where the head noun denotes a measure, as with example 4, 'the head noun and the prepositional phrase identifying the whole from which the part is taken is referred to as the object'. Even with this rationale it remains challenging to be consistent in the identification of the object name. With regard to example 4, we remain unsure whether it is the 'keel' or the 'piece' that should be tagged.
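The head-noun rule can be approximated, very roughly, in code. The sketch below is a toy heuristic and not a grammatical parser: it assumes that the head noun is the word immediately before the first post-modifier, and the set of words treated as opening a post-modifier is a hand-picked assumption. It does not attempt the measure case (example 4), which is precisely where the rule becomes hard to apply consistently.

```python
# Hand-picked, hypothetical set of words assumed to open a post-modifier
# (prepositions, the abbreviation "wt." for "with", and one participle).
POSTMODIFIER_STARTERS = {"of", "on", "in", "with", "wt.", "from", "growing"}


def head_noun(description: str) -> str:
    """Guess the head noun: the token just before the first post-modifier."""
    tokens = description.split()
    for i, token in enumerate(tokens):
        if i > 0 and token.lower() in POSTMODIFIER_STARTERS:
            return tokens[i - 1].strip(",;")
    # No post-modifier found: fall back to the last token of the phrase.
    return tokens[-1].strip(",;")


print(head_noun("Red corall growing on a rock wt. shells"))  # corall
```

Even for this one entry the result depends entirely on which words are listed as post-modifier starters, which suggests why a purely grammatical rule is difficult to apply uniformly across a whole catalogue.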
However, the historians of science, collecting and natural history whom we consulted reiterated an earlier argument, one of context. They consider grammar alone insufficient for identifying an object. As Respondent 2, a historian of science, argued, 'if we want to be as "historically sensitive" as possible, then for Sloane, the entire entry is the object'. This means that all of the words that detail colour, material and size were chosen specifically by Sloane to describe the details of the object. Their order and place within the catalogue had meaning for Sloane. Respondent 3, a historian and philosopher of science, likewise argued that 'in all four cases it is the whole phrase that designates the object'. This is because the phrases are descriptive: they describe the objects but do not contain names in a technical sense that could be considered as labelling or indexing the object in question. What we find instead are generic nouns like 'corall'.
But no analysis of what the object is can stop here, at the juncture between marking up one head noun or marking up the entire description. Indeed, what becomes clear is that differences in interpretation depend not only on domain knowledge but also on the potential end use of the data. Take, for example, the topic of species and taxonomy. Respondent 3 argued that, if the object 'is the kind or species that the specimen is supposed to represent, [then the] "red coral" is the object, because corals were often distinguished by their colour'. Likewise, Respondent 4, a botanist, noted the importance of capturing 'red coral' for the purpose of indexing: 'an index entry might read "corall, red" because of the need to alphabetise the index entries'. Here the interpretive analysis takes into consideration the object as well as its features, such as colour, or its ecology.
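One way to picture these competing readings is to hold them side by side for the same entry instead of choosing between them. The sketch below does this for entry no. 11 and derives the inverted index form that Respondent 4 describes. The class `ObjectReading`, its field names and the function `index_form` are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectReading:
    """One interpretation of which words in an entry name the object."""
    purpose: str
    span: str


ENTRY = "Red corall growing on a rock wt. shells"

# The three readings discussed above, recorded together for one entry.
READINGS = [
    ObjectReading("grammatical head noun", "corall"),
    ObjectReading("kind or species", "Red corall"),
    ObjectReading("historically sensitive reading", ENTRY),
]


def index_form(span: str) -> str:
    """Invert 'modifier head' into 'head, modifier' for alphabetisation."""
    *modifiers, head = span.split()
    if not modifiers:
        return head
    return f"{head}, {' '.join(m.lower() for m in modifiers)}"


for reading in READINGS:
    print(f"{reading.purpose}: {reading.span}")

print(index_form("Red corall"))  # corall, red
```

Keeping the purpose alongside each span makes explicit that the choice of boundary is tied to an intended use of the data and is not a property of the entry alone.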
If the information found in these catalogues is to form part of an institution's

Conclusion
Collections documentation has often been described as making the difference between a museum and a junk shop. Catalogues are the core documents of museum structure and meaning, yet no significant computational analysis has been made to date of how catalogues from the early modern period are constructed or of how their structure and content relate either to the world from which collections are assembled or to the museums they form. Enlightenment Architectures is undertaking this task on some of the oldest, most detailed and most significant museum catalogues in the English-speaking world. The interdisciplinary approach that we are devising to pursue this in the context of Sloane has the potential to enable us to model the information structures of Sloane's catalogues computationally and interrogate them in ways that would otherwise be impossible.
Our research, and the new directions opened by it, will profit the communities of researchers, curators and information professionals who are addressed in this article.
We expect that historians and curators will benefit from the ability to digitally search the catalogues in ways that would be impossible using paper catalogues alone, such as being able to search according to the colours, materials, weights and sizes that are mentioned in respective entries. It will also be possible for researchers to download the TEI-encoded versions of Sloane's catalogues and to extract, map and visualise information that is included in them. The TEI extensions and customizations that we have proposed can also be taken up by other projects working on early modern archival materials. Finally, the more fundamental questions that we have raised about digital approaches to the modelling of early modern information may open new conversations between curators and digital humanists about current approaches to cataloguing and placing collections online.
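As a minimal sketch of the kind of search envisaged here, the following code filters entries by a term such as a colour. It assumes, hypothetically, that entries are encoded as <item> elements carrying the catalogue number in an n attribute. The first entry reuses the description quoted earlier in this article and the second is a placeholder.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"  # the TEI P5 namespace

# Hypothetical catalogue fragment; <list>/<item> as the entry structure
# is an assumption made for this sketch.
SAMPLE_CATALOGUE = f"""
<list xmlns="{TEI}">
  <item n="11">Red corall growing on a rock wt. shells</item>
  <item n="[other number]">[another description]</item>
</list>
""".strip()


def search_entries(xml_text, term):
    """Return (catalogue number, text) for entries whose text contains term."""
    root = ET.fromstring(xml_text)
    hits = []
    for item in root.iter(f"{{{TEI}}}item"):
        # Join text from nested elements and normalise whitespace.
        text = " ".join("".join(item.itertext()).split())
        if term.lower() in text.lower():
            hits.append((item.get("n"), text))
    return hits


print(search_entries(SAMPLE_CATALOGUE, "red"))
```

A plain substring match of this kind would also return words that merely contain the search term, so a fuller implementation would presumably search explicitly tagged colours or materials instead of raw text.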
It is widely understood that early modern catalogues such as those of Sloane are 'not simply lists that can be taken at face value' (Keating and Markey, 2011: 211).
Rather, they are 'authored documents compiled under particular temporal, legal, political, and social constraints that affected their organization and the ways in which the objects they list were described' (Keating and Markey, 2011: 211). As such, they are sites where narratives of power and knowledge are made, unmade, silenced and sometimes imagined anew. This is no less true of the digital models of Sloane's catalogues on which the Enlightenment Architectures project is at work. The object problem that we have discussed above points to parallels between early modern and current classification systems. We still struggle, despite our "sophisticated" systems of classification, to determine what things are, where they belong and how to classify them. While we do not necessarily understand Sloane's cataloguing epistemology fully, we share his struggle regarding how best to use words to describe objects. The digital representations that Enlightenment Architectures is creating will contribute to what Poole described as the 'rich, complex and interwoven cultural experience on the World Wide Web' (2016). Therefore, it is crucial to interrogate the potential and limits of encoding languages like TEI for representing early modern catalogue materials, as we have done in this article.
Bowker has written of the totalizing imperatives of data-driven fields such as biodiversity and of how such efforts to homogenise and standardise data can make it incompatible with the user-generated datasets organised by 'local data cultures' (2000). We understand the work that we have undertaken in the Enlightenment Architectures project, and digitally-mediated research on the early modern period more widely, as indicative of a 'local data culture' and this article has demonstrated the nuance this research can contribute to a digital humanities that is sometimes portrayed as totalising and elitist (e.g. Grusin, 2014; Pannapacker, 2013). Of the black-boxing effect of technology more generally, Latour has written of:
the way scientific and technical work is made invisible by its own success. When a machine runs efficiently, when a matter of fact is settled, one need focus only on its inputs and outputs and not on its internal complexity. Thus, paradoxically, the more science and technology succeed, the more opaque and obscure they become (1999: 304).
In this article, we have pushed against the black box to reveal some of the 'internal complexity' of the Enlightenment Architectures project. We have shown the importance of attention to 'internal complexity' when thinking about the potential of interdisciplinary research across the digital humanities, history of knowledge, the library and the museum, especially in terms of the digital collections that such work can give rise to.

Ethics and consent
The aspects of this research that have involved human subjects have been carried out in accordance with the Declaration of Helsinki. Our work was given ethical clearance by the research director of the British Museum, in line with institutional norms. Informed consent was sought from participants and their identities have been anonymised.