Digital Corpora and Scholarly Editions of Latin Texts: Features and Requirements of Textual Criticism


Introduction
Digital philology has produced a wide range of new methods and formats for editing and analyzing medieval texts. The provision of digital facsimiles has put the manuscripts, the very material base of any editorial endeavor, into focus again. Several editions have been created that engage primarily with individual manuscripts; others have posited a wide range of variance as a central characteristic of medieval literature instead of relegating variants to the footnotes of ahistorically normalized and regularized texts or speculative reconstructions of archetypes and authorities. 1 Nevertheless, the idea of a critical text, especially of nonvernacular medieval works, does not yet seem to be obsolete. Quite the opposite: the number of digital facsimiles of manuscripts and early print books and the quantity of document-oriented transcriptions available online is growing continually, and with it the need for critically examined and edited texts increases. 2 Like a medieval reader having little choice but to rely on the only manuscript copy available at her or his library, without a critical text the modern reader is at a loss to adjudicate on the quality of the textual version picked up randomly on the internet. Moreover, digital technologies, methods, and standards have steadily improved, creating possibilities for digital critical editions the quality of which former generations of editors could only imagine. As of yet only a relatively small number of born-digital critical editions of Greek and Latin texts exists. 3 Even so, the (albeit slowly) growing number of digital critical editions increases the demand for assembling and providing critical texts that are in the form of a textual corpus, because only collections or corpora of texts that are otherwise dispersed on various websites allow for a systematic analysis and for efficient research across the works of a specific author, genre, subject, period, or language as a whole. 
4 In this article, some features and requirements for a digital corpus of critical texts are proposed and discussed in order to realize the heuristic, explorative, and interpretative potential of integrated historical texts from the classicist and postclassicist tradition of Greek and Latin works. Generally speaking, when corpora of classical or medieval Latin or Greek texts are compiled and published, they are stripped of their critical features, namely the accompanying introduction, commentary, and apparatus notes. One reason for this omission might be economic: if the texts are published by a traditional publishing house (such as Brepols, with its Library of Latin Texts 5 ), the digital text versions of the corpus are considered an additional means of entry to the printed version in order to give access to a large variety of texts and promote the canonical print products, which remain indispensable for accurate citation and reference.
If the texts are published by an academic institution not primarily driven by economic interests (such as, most notably, the Perseus Digital Library 6 or the Digital Library of Late-Antique Latin Texts 7), the reason for skipping the critical features of a printed scholarly edition might be more practical in nature. While it is rather easy to digitize plain texts, it is very hard to encode the complex and often idiosyncratic reference system of apparatus notes (lines, lemmata, variant readings, sigla, etc.). This task requires both a lot of time and a high degree of skill on the part of the digitizing person. 8
4 On the general aspects and purposes of digital corpora see the catalog of "Criteria for Reviewing Digital Text Collections," by Ulrike Henny and Frederike Neuber in collaboration with the members of the Institut für Dokumentologie und Editorik (IDE), version 1.0, February 2017, http://www.i-d-e.de/publikationen/weitereschriften/criteria-text-collections-version-1-0/: "A few examples for collection design principles are completeness (e.g. if the corpus aims to represent the work of an author as a whole), representativeness (if the corpus claims to be representative for a specific subject domain and functions as a reference for that domain) and balance (e.g. if the corpus is built to allow for contrastive analyses between its components such as different text genres or regional language varieties)."
5 Library of Latin
There are other causes for the omission of text-critical features, such as copyright issues 9 or a predominant interest in simple text analytics and computational methods, such as stylometry, topic modeling, computational semantics, text mining, or search and retrieval applied to plain text versions. 10 Be that as it may, one might ask whether it would be sufficient simply to add the information as given in the apparatus criticus and in the philological introduction to make these texts "truly" digital critical editions.
A "truly" and fully fledged digital scholarly edition is surely something more than, or at least something different from, a traditional scholarly edition in a digital format. 11 But if that is the case, how does this fit into a corpus of digital scholarly editions?
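To make concrete what encoding the reference system of apparatus notes involves, the following minimal sketch models a single apparatus entry as structured data and renders it back into a conventional print-style note. The class names, field names, the sample reading, and the sigla A, B, and C are illustrative assumptions, not taken from any of the editions discussed here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reading:
    """A variant reading together with the sigla of its witnesses."""
    text: str
    witnesses: List[str]


@dataclass
class ApparatusEntry:
    """One apparatus note: line reference, lemma, and variant readings."""
    line: int                      # line number in the edited text
    lemma: str                     # wording adopted in the critical text
    lemma_witnesses: List[str]     # sigla supporting the lemma
    variants: List[Reading] = field(default_factory=list)

    def to_print_note(self) -> str:
        """Render the entry in a simple print-style apparatus notation."""
        head = f"{self.line} {self.lemma}] {' '.join(self.lemma_witnesses)}"
        parts = [f"{r.text} {' '.join(r.witnesses)}" for r in self.variants]
        return head + (" : " + " : ".join(parts) if parts else "")


# Hypothetical entry: witnesses A and B read "gratiam", witness C reads "gratia".
entry = ApparatusEntry(
    line=12,
    lemma="gratiam",
    lemma_witnesses=["A", "B"],
    variants=[Reading("gratia", ["C"])],
)
print(entry.to_print_note())  # 12 gratiam] A B : gratia C
```

Going from the structured entry to the printed note is trivial. The reverse direction, recovering line, lemma, readings, and sigla from a printed note, is the laborious part of digitization described above.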

Digital Critical Editions: Six Case Studies
In the following analysis, six editions will be presented. They are all critical and digital editions of Latin or Greek works. They have been or are being created in connection with my personal and institutional involvement under very specific conditions, at a certain place and time, with very specific aims and scope. They serve here as case studies to identify some general characteristics of digital critical editions. On the basis of these examples, four proposals will be made for how to create a digital corpus of critical editions.

First Study: Historians from Late Antiquity
The collection and edition of fragments and testimonies of historians from late antiquity is a long-term project carried out at the University of Düsseldorf. It has been conceived as a traditional critical print edition with a parallel online presence. The edition comprises a critical text furnished with an apparatus criticus and a philological introduction. A commentary, German translation, and bibliography are planned to be published exclusively in print, as a concession to the business model of the publisher. The online version is being realized by the Cologne Center for eHumanities (CCeH) of the University of Cologne. The critical texts are edited with Classical Text Editor (CTE), 12 a software tool widely used by traditional philologists for creating multiple apparatus in printable format, namely PDF. The tool also provides an HTML and even TEI-XML output, marking up all relevant layout information of the print version: sections, fonts, italics, borders, spaces, and so on. Semantic information (such as readings, witnesses, lemmata, quotes, sigla, and references) is not marked up explicitly. As a consequence, the digital version is a mere reproduction of the print, lacking any additional features except for basic browse and search. For this reason, it can be labeled a critical edition, as it provides a philological introduction and critical annotations (even if based on the work of previous editors), descriptive information, and indices, as well as commentary and translation after a so-called moving wall, that is, after a certain period of time.
9 The copyright status of edited ancient or medieval texts varies according to national legislation. For instance, under German law, a critical text of an edition (created by an author deceased centuries ago) might not be copyrighted, while the introduction, commentary, and apparatus are. Otherwise there is legal uncertainty, and uniform international guidelines or legal assistance are missing.
In essence, the edition follows the print paradigm. Digital methods or functionalities have not been applied. Its usability does not significantly differ from the usability of a printed book. Even if critically annotated and digitally presented, from a technological perspective the established texts are plain and single-dimensional (Fig. 1). 13

Second Study: Saint Patrick's "Confessio"
The digital edition of Saint Patrick's Confessio, a fifth-century open letter by Ireland's patron saint, is based on a critical print edition from 1950 that includes a critical apparatus, apparatus fontium, apparatus biblicus, and commentary. To these it adds various text layers (facsimiles, translations) and features (paratexts, bibliography, scholarly articles, fiction, and more), all of which are closely interlinked and furnished with user-friendly functionalities (hyperlinks from sigla to facsimile, from lemma to text, from reference to bibliography, and so on). 14 The realization of the edition entailed a wide range of tasks and actions: OCR cleanup; the acquisition of facsimiles; copyright negotiations; encoding of the canonical work structure and alignment with the structure of manuscript witnesses, prints, and translations; and, last but not least, a detailed encoding of the apparatus entries and the editor's commentary. The presentation of various textual layers, versions, and annotations relies heavily on the application of hypertext technology and is suitably labeled a hypertext stack edition (Fig. 2). 15

Third Study: Guillelmus Autissiodorensis
The digital editio princeps of William of Auxerre's treatise on liturgy, the Summa de officiis ecclesiasticis, 16 has been generated from a detailed transcription of the principal manuscript witness, includes variant readings from a selection of other witnesses, and is enriched with critical editorial markup. Published in 2007, it is the first of its kind in medieval Latin philology, as it follows a pluralistic textual paradigm and provides a critical text with a threefold apparatus, links to all facsimiles on the page level, extensive descriptions of the manuscripts, a detailed transcript of the principal manuscript witness, a reading text of an almost-contemporary revision of the text, an introduction, indices, and so forth. Applying a digital methodology and addressing a wide range of notions of text, this edition might be labeled a born-digital, multi-dimensional, or pluralistic scholarly edition (Fig. 3).
15 A comparable edition (if on a slightly smaller scale) is the edition of the Schedula diversarum artium (http://schedula.uni-koeln.de/), providing all relevant texts and documents to assess and analyze the complex stages of editorial revision and textual transmission. In the form of a digital collection of three critical print editions, that edition might even be labeled a metaedition.
16 Magistri Guillelmi Autissiodorensis Summa de officiis ecclesiasticis, ed. Franz Fischer (Cologne, 2007-12); online: http://guillelmus.uni-koeln.de; Franz Fischer, "The Pluralistic Approach-The First

Fourth Study: Carolingian Capitularies
The Capitularia project provides transcriptions of important law texts from the Carolingian era: collections of decrees of Frankish rulers regulating political, military, ecclesiastical, social, economic, and cultural matters, usually drawn up and issued during the course of royal assemblies and distributed by so-called missi, counts and bishops. Previous critical editions published in print all failed to reflect adequately the diversity and complexity of the textual transmission. In a new editorial approach, all manuscript witnesses are being transcribed with a focus on structural information, such as rubrics, initials, and the order of chapters and capitularies. This serves the twofold aim of respecting the individual and regional characteristics of each of these historical documents and enabling a semiautomated comparison for detecting and highlighting differences and commonalities among the witnesses (Fig. 4).
These automated collations, made using the collation tool CollateX, 17 constitute the basis for a critical assessment of the textual tradition and for establishing a critical text version to be published both in print and online as part of the Monumenta Germaniae Historica (MGH and dMGH, respectively). 18 Aiming to document both the full textual transmission and a critical text and following a twofold publication strategy, this edition might be labeled a multiwitness hybrid edition (Fig. 5).
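The principle of such an automated comparison can be illustrated with a minimal sketch. It does not use CollateX and does not reproduce the project's workflow; it relies on Python's standard difflib module to align two witnesses word by word and report where they diverge. The two sample readings are invented for illustration and are not quoted from any capitulary.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def collate(base: List[str], other: List[str]) -> List[Tuple[str, str, str]]:
    """Return (operation, base words, other words) for each differing span."""
    matcher = SequenceMatcher(a=base, b=other, autojunk=False)
    differences = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # keep only replacements, insertions, and deletions
            differences.append(
                (op, " ".join(base[i1:i2]), " ".join(other[j1:j2]))
            )
    return differences


# Invented sample readings of two hypothetical witnesses.
witness_a = "ut missi nostri per singulos comitatus discurrant".split()
witness_b = "ut missi nostri per singulas ciuitates discurrant".split()

for operation, reading_a, reading_b in collate(witness_a, witness_b):
    print(operation, "|", reading_a, "|", reading_b)
```

A pairwise comparison of this kind only hints at the task. Dedicated collation tools align many witnesses at once and can take transpositions and orthographic normalization into account.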

Fifth Study: Monasterium.net
Monasterium.net is a collaborative and virtual digital archive, presently providing access to facsimiles and descriptions of more than six hundred thousand medieval and early modern charters from more than one hundred and fifty archives. The online platform allows for digital editing of the charters at all scholarly levels: in some instances, scans are provided, along with the most basic metadata, such as repository and shelf marks; in others, short descriptions and abstracts are included and, if available, retrodigitized print editions; whereas in others, veritable born-digital diplomatic editions are produced that include introductions or prefaces, diplomatic transcripts encoded according to the standard of the Charters Encoding Initiative (CEI), a diplomatic analysis, and bibliographies. Since charters usually survive as single documents, there is no critical annotation in the form of critical apparatus entries. The nature of these charter editions varies: there are digital diplomatic editions in their original sense, that is, focusing on dating, proof of authenticity, and the analysis of the content structure of a charter; 19 digital documentary editions, focusing on external features of the documents; and data-enriched editions, with information on historical persons, places, events, or decoration (for example, in the art historical subcollection of illuminated charters) 20 (Fig. 6).

Sixth Study: Digital Averroes Research Environment (DARE)
The Digital Averroes Research Environment (DARE) collects and edits the works of the Andalusian philosopher Averroes (Abū l-Walīd Muḥammad Ibn Aḥmad Ibn Rušd), born in Cordoba in 1126, died in Marrakesh in 1198. Through the portal, images of as many textual witnesses as possible, that is, manuscripts, incunabula, and early printed editions, are provided online. 21 At present, DARE includes only a small number of edited texts, most of them textual versions that have not yet been critically annotated. However, the portal is already a key resource for a long-term editorial project to create critical editions of the works of Averroes that reflect and analyze their extremely complex transmission back and forth through Latin, Greek, Arabic, and Hebrew, an enterprise that would have been considered impossible without digital methods and resources. The established critical-text versions will eventually be integrated into the DARE platform in order to complement a digital resource that can be labeled a knowledge site (Fig. 7). 22
Speculum 92/S1 (October 2017)

Variety of Editions versus Homogeneity of a Corpus
We have just presented six examples of critical approaches towards (mostly) Latin texts in a digital editorial format. They show a great variety with respect to the content and the notion of what the text is and what the respective edition actually should do. Some digital editions (1) provide a critical text following the Lachmannian paradigm, reconstructing some archetypal text version by following a strict methodology of recensio (transcription, collation, establishment of a stemma codicum), selectio, and emendatio. 23 Others (2) abide by the Leithandschrift principle and follow a principal manuscript witness. Accurate transcriptions (3) might focus on very different details and characteristics before being enriched with critical annotations. Nowadays most digital editions provide digital facsimiles of manuscripts and prints, all of which may vary in the quality of the digital scans and in the degree to which they are integrated into and interlinked with the critical text. Some editions are multidimensional, providing various versions or layers of text, parallel texts, and translations. All digital editions are labeled according to the material and the editorial method applied: critical, diplomatic, semidiplomatic, documentary, multiwitness, archive edition, and so on. Moreover, even editions with similar labels feature various differing functionalities and presentational modes, all of which are based on a large variety of encoding, since even within the de facto standard for text encoding, as provided by the guidelines of the Text Encoding Initiative (TEI), 24 there are various ways of modeling textual variance. More generally speaking, digital scholarly editions all differ with respect to the application and degree of both textual criticism and digitality (that is, the degree to which they employ and integrate digital technologies).
But if textual, or rather editorial, plurality seems to be one of the main characteristics of digital editions, how is a coherent digital corpus of scholarly editions to be constructed? How does such diversity fit into a corpus if the usefulness of a corpus is based largely on the homogeneity and representativeness of the texts that it includes? These texts are expected to be homogeneous in order to be detectable, comparable, and analyzable across the whole corpus. Texts that are part of a corpus are supposed to be representative of a specific work, genre, or period. Having a variety of versions or textual layers of one specific work clearly does not suit the idea of a corpus of texts. Even if it were possible to integrate complex digital resources into one portal, the amount of work and expertise needed to maintain a resource of such exponentially increased complexity would seem impracticable, given the pace of ongoing technological and methodological innovations.
21 For an overview of texts available, see http://dare.uni-koeln.de/?qpnode/32.

Four Proposals to Achieve a Compromise
In the following four proposals we shall explore how the two conflicting concepts and practices of idiosyncratic digital critical editing on the one hand and creating a homogeneous textual corpus on the other can be reconciled despite the apparent contradictions.

First Proposal: Digital in a Wide Sense, Critical in a Narrow Sense
The first proposal to resolve the conflict between variety of editions and homogeneity within a corpus is to create and provide editions that are both digital in a wider sense and scholarly in a narrow sense. This proposal can be divided into two strategic approaches: the first approach starts from the definition of "digital," the second from the definition of "critical."

1. Digital in a Wide Sense
As part of a digital corpus, each individual scholarly edition does not necessarily need to be digital in a strict sense. What does "digital edition in a strict sense" mean? According to the "Catalogue of Criteria for Reviewing Scholarly Digital Editions" as issued by the Institute for Documentology and Scholarly Editing (IDE), a scholarly edition is "an information resource which offers a critical representation of (normally) historical documents or texts. Scholarly digital editions are not merely publications in digital form; rather, they are information systems which follow a methodology determined by a digital paradigm, just as traditional print editions follow a methodology determined by the paradigms of print culture. Given this narrow understanding of SDEs, many digital resources cannot be considered digital editions in this strict sense." 25 And in an even more apodictic manner, in his most recent article on the subject, Sahle states what can be regarded as common sense among today's digital humanities scholars:
• "A digitized edition is not a digital edition."
• "A digital edition cannot be given in print without a significant loss of content and functionality."
• "A digital edition is guided by a digital paradigm in its theory, method, and practice." 26
Given these definitions, the point here is exactly the opposite: individual critical editions as part of a corpus need not strictly follow a digital paradigm, which, although desirable, is not a requirement.
As demonstrated above, textual plurality and the complexity of the editorial approach towards an edited work are main characteristics of a fully fledged digital scholarly edition. In contrast, the purpose of a corpus lies in its capacity to provide a large number of homogeneously edited texts, not only to ensure a high degree of usability but also to guarantee its feasibility and long-term maintainability. Therefore in principle these editions can be digitized critical editions. On the level of the individual text as part of a corpus, content and functionalities do not have to exceed significantly those of the print edition, although even here a certain minimum of requirements should be met (see below). However, additional digital value does need to be realized on the level of the entire corpus. What additional digital value across the entire corpus can mean will be discussed under proposal 4 below.

2. Critical in a Narrow Sense: Four Manifestations of Textual Criticism
The other half of the first proposal needs to be clarified: create and provide editions that are scholarly in a narrow sense. The term "critical" (even though often used as a synonym for "scholarly") qualifies the meaning of scholarly, but what precisely does critical mean?
Peter Robinson, with his notorious six essential aspects of electronic digital editions, refers with the first three criteria to an essential philological methodology and scholarly rigor. 27 According to Robinson, a digital critical edition is anchored in a historical analysis of the materials; presents hypotheses about creation and change; and supplies a record and classification of difference over time, in many dimensions and in appropriate detail. These points are widely accepted by most scholars. This definition and others brought forward by renowned scholars are supported by the wide range of digital scholarly editions currently seen. 28 Be this as it may, and whatever the material, methodology, or requirements of a community, in order to make critical editions fit into a digital corpus of homogeneous texts representing works of Latin literature, the various aspects of textual criticism can be broken down into four basic manifestations of criticism: (1) critical annotation, (2) markup, (3) metadata, and (4) documentation. These essential features of a critical text must be accommodated by any model of a digital corpus, a model defining indispensable requisites and requirements for a text to be incorporated into the corpus.
(1) The first manifestation of textual criticism is critical annotation to the text, more specifically, the presence of an apparatus criticus or other means of recording textual variants and all justifications for the state of the edited text. In addition, critical annotation might include an apparatus fontium, giving references to sources and paratexts; an apparatus biblicus, as a typical feature of patristic or medieval texts; a commentary with explanatory notes or historical and philological notes, and discursive notes with present-day relevance, such as references to gender issues and sociopolitical subject matter.
(2) The second manifestation comprises the potentially very deep and extensive markup of the text: structural markup (including identifiers); markup of internal and external references or named entities; linguistic and semantic markup, such as part-of-speech tagging, lemmatization, or syntactical markup; and markup of typical features of an apparatus entry, such as sigla, references, or quotes and readings. It might also include markup of the types of apparatus entries according to categories 29 such as textual, 30 intertextual, 31 exegetical, rhetorical, 32 and metrical. 33
27 The fourth criterion mentions the presentation of an "edited" text (only) as an option; the fifth and sixth criteria refer to digital usability: see Peter Robinson, "What Is an Electronic Critical Edition?,"
(3) The third manifestation of textual criticism comprises all kinds of metadata and structured information on the author, the work, and the edition itself, 34 that is, bibliographical information concerning the work itself, including its genre, dates, appropriate keywords, and so forth; as well as imaging parameters, responsibilities, licenses, and so on in regard to the edition; and contextual information in the form of a "critical bibliography." Ideally, all this information is given in a standardized format (such as TEI, METS, Dublin Core, or some other bibliographic standard) with references to authority files (such as GND, VIAF, Getty Thesaurus) for named entities and using taxonomies and ontologies (SKOS, CIDOC CRM) that are relevant for the respective field of research.
(4) The fourth manifestation comprises information traditionally provided in a philological introduction, paratexts, and other kinds of accompanying texts and materials, which can all be subsumed under the term "documentation." Ideally, the material basis of the edited text is documented by digital facsimiles of manuscript witnesses and relevant printed editions. These surrogates should be the result of what has been labeled "critical digitization" in the sense that information is provided about the decisions involved in setting up the parameters for digitizing. 35 The manuscripts should then be described thoroughly according to scholarly practice. Where transcriptions have been created, these should be included as well as the source code of all manuscript descriptions, transcripts, and the critical text itself. Moreover, it is essential to present a historical analysis, hypotheses about the creation of the text, and a record and classification of differences over time. 36 Most importantly, however, the editorial principles need to be made explicit.
Again, the viability and success of a digital corpus of critical texts depend on finding an appropriate and functional overarching data model that is able to accommodate these forms of critical annotation and information. To this end, it may be useful to reduce the force of the term "critical" to a rather prosaic meaning and to define an absolute minimum of requirements for the incorporation of a critical text into a digital corpus. Referring to the four manifestations of textual criticism described above, this minimum of requirements could be:
(Ad 1) The critically constituted text bears all critical information (for example, in the traditional annotation format of an apparatus) required to justify the linguistic or philological form of the edited text.
(Ad 2) The work structure is clearly defined: entities such as book, chapter, paragraph, and so on are marked up accordingly in order to fit in with a corpus-wide schema for addresses and the citation of the respective text entities.
(Ad 3) Metadata is provided on the author, work, and the edition itself.
(Ad 4) The text has sufficient material documentation (manuscript descriptions and facsimiles) and a philological introduction specifying the editorial principles.
Defining the texts that are to be included in the corpus as "digital in the wider sense" (that is, not necessarily following a digital paradigm) and as "critical in a narrow sense" (fulfilling the minimal requirements of critical textual scholarship) would allow for the inclusion of (a) printed critical editions created with a digitizing process that is not too demanding; (b) existing born-digital critical editions 37 with a transformation or spin-off process that is not too complicated; and (c) new born-digital critical editions created within the editorial framework provided by the corpus portal (as is currently planned for the Digital Latin Library). 38
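The four minimal requirements (Ad 1 to Ad 4) can be read as an admission check that a corpus applies to each candidate edition. The following sketch expresses that reading under simplifying assumptions: the record layout, the field names, and the dotted address format (for example "1.2" for book 1, chapter 2) are hypothetical and do not describe the data model of any existing corpus.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CorpusEdition:
    """Hypothetical minimal record for one critical text in a corpus."""
    # (Ad 2) work structure: citable address -> edited text of that unit
    text_units: Dict[str, str] = field(default_factory=dict)
    # (Ad 1) critical annotation justifying the form of the edited text
    apparatus_notes: List[str] = field(default_factory=list)
    # (Ad 3) metadata on author, work, and edition
    author: str = ""
    work_title: str = ""
    edition_statement: str = ""
    # (Ad 4) material documentation and editorial principles
    manuscript_descriptions: List[str] = field(default_factory=list)
    facsimile_links: List[str] = field(default_factory=list)
    editorial_principles: str = ""


def missing_requirements(edition: CorpusEdition) -> List[str]:
    """List which of the four minimal requirements are not yet met."""
    missing = []
    if not edition.apparatus_notes:
        missing.append("Ad 1: critical annotation")
    if not edition.text_units:
        missing.append("Ad 2: addressable work structure")
    if not (edition.author and edition.work_title and edition.edition_statement):
        missing.append("Ad 3: metadata on author, work, and edition")
    if not (edition.manuscript_descriptions and edition.facsimile_links
            and edition.editorial_principles):
        missing.append("Ad 4: documentation and editorial principles")
    return missing


# A plain digitized text without any critical features fails on three counts.
plain_text_only = CorpusEdition(text_units={"1.1": "..."})
print(missing_requirements(plain_text_only))
```

A check of this kind captures only the presence of each component. Whether an apparatus actually justifies the edited text remains a matter of scholarly judgment.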

Second Proposal: Works Rather Than Documents
The second proposal to resolve the conflict between variety of editions and homogeneity within a corpus is to focus on works rather than documents. A text corpus is not an archive. Digital editions tend to start from or grow into some sort of digital archive. 39 In order to provide texts that are to some extent homogeneous, the editorial features within a corpus should not focus on contingent and individual material aspects of the text or on paleographic or codicological details. Instead of accumulating textual evidence and transcriptions of witnesses, they should focus on critical value, i.e. critical annotation, deep markup, and the establishment of some kind of representative text version with a canonical work structure. This does not mean that transcriptions, facsimiles, and the like should be excluded; they should be included in some form. It is just a matter of prioritizing when creating a digital corpus. Individual scholarly editions will always have to define their own priorities and tend to emphasize particularities of the textual material and specificities of the individual research perspective. The challenge here for future corpora of critical texts is to establish a basic and interchangeable data format to which a required set of data components of complex editions as described above can be translated, transformed, or downgraded.
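What "downgrading" a complex edition might mean in practice can be sketched as follows, assuming the source edition encodes variance with the TEI elements app, lem, and rdg. The function flattens such a fragment into a plain reading text by keeping each lemma and dropping the variant readings. The sample fragment, including the variant spelling and the sigla A and B, is constructed for illustration and is not taken from an actual edition.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Constructed sample: one paragraph with a single apparatus entry.
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0" n="1">Ego
<app><lem wit="#A">Patricius</lem><rdg wit="#B">Patritius</rdg></app>
peccator</p>"""


def reading_text(element: ET.Element) -> str:
    """Collect the text of an element, skipping the content of <rdg>."""
    pieces = [element.text or ""]
    for child in element:
        if child.tag != TEI_NS + "rdg":   # variant readings are left out
            pieces.append(reading_text(child))
        pieces.append(child.tail or "")   # text following the child is kept
    return "".join(pieces)


def downgrade(xml_source: str) -> str:
    """Reduce a TEI fragment to a whitespace-normalized reading text."""
    root = ET.fromstring(xml_source)
    return " ".join(reading_text(root).split())


print(downgrade(SAMPLE))  # Ego Patricius peccator
```

The transformation is deliberately lossy. A corpus format along the lines proposed here would carry the apparatus in a simplified form alongside the reading text rather than discard it.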

Third Proposal: Leave to Others What Others Do Better
Digital editions as part of a corpus cannot and should not be all inclusive. On the contrary: a characteristic of digital editions is the overcoming of the limitations of the publication itself through integration of or, here even more importantly, through linkage to external resources. 40 The theory of digital scholarly editing envisions an all-encompassing model of highly complex, layered, rich information resources. Individual digital editions, however, do not need to provide and maintain the full range of possible modules, such as high-resolution facsimiles, translations in various languages, all sorts of visualizations, additional contextual material, and user-friendly tools within one clearly delimited and self-contained publication. All these features and information enriching the reading experience and supporting individual research can hardly be provided and maintained within a single corpus. Rather, any additional feature that is not required according to the criteria of the corpus should be outsourced and either referred to via hyperlink or, if possible, embedded from external resources. 41 This is especially reasonable with regard to authority files; encyclopedic knowledge, as part of online reference works and compendia; paratexts, as part of other digital corpora; and facsimiles. As for the latter, ideally cultural heritage institutions, such as archives and libraries, take care of their own material and provide descriptions, high-quality reproductions, and tools to engage with material in a standardized way so that it can be embedded and used by users and editors alike.
The embedding of external resources can be realized in two different ways, both of which have advantages and disadvantages. The easiest method from a technical point of view is simply to include a link out of the edition that targets the external resource. An example of the application of this method is the digital edition of the St. Gall Priscian, which links to manuscript images at the Codices Electronici Sangallenses (CESG) Virtual Library (Figs. 8 and 9). 42
The integration of external information into the edition itself might be more user-friendly. Images or texts can be either included from the external server or, if restrictions relating to technical infrastructure or copyrights do not prevent it, mirrored onto a dedicated server. A technically advanced publishing framework has been developed by Jeffrey C. Witt: the LombardPress Web application 43 is designed to understand and consume common interfaces (so-called IIIF application programming interfaces 44) as adopted by a growing number of leading research libraries with manuscript collections in order to allow for the possibility of querying images of manuscript folios directly from library servers across the world (Fig. 10). 45

Fourth Proposal: Create Additional Value across the Corpus
As pointed out under the first proposal, critical editions as part of a corpus need not be "truly digital" in the sense that they follow a digital paradigm and that they are created applying digital methods. Rather, the fourth proposal advocates the creation of additional value across the whole range of texts through the features and the technical framework of a "truly digital" corpus, based on an elementary data model for metadata, text, annotation, and paratexts.
As soon as a suitable and robust data model has been found to accommodate the various forms of textual criticism, additional value can be generated by enabling a full exploration of the data captured across the entire corpus. 46 This additional value cannot be provided in print editions, and it is characteristic of both individual digital editions and digital text corpora in general.
A set of generic and corpus-wide tools, features, and functionalities should address researchers' needs and expectations. 47
45 LombardPress-Web builds on the "Scholastic Commentaries and Texts Archive" (SCTA: see http://scta.info/). The SCTA database first points to the ID of a respective codex surface. If the holding library's image repository is IIIF compliant, the SCTA database will link out further to the ID of the IIIF canvas and from there to the URL of the image itself. For a draft proposal of this SCTA data model see http://lombardpress.org/2016/08/09/surfaces-canvases-and-zones/; about LombardPress in general, see http://lombardpress.org/about/.
46 In the area of linguistic corpora there have been attempts to address the issue of reconciling different formats. See, for example, Salt and Pepper at http://corpus-tools.org/. Salt and Pepper are not just methodological recommendations, they are functioning, extensible open source tools that support the integration of linguistic corpora created according to different principles into a larger framework.
47 Cf. Henny and Neuber, "Criteria for Reviewing Digital Text Collections." There should be also a set of tools, features, and functionalities for the wider public in order to extend the usability of critical editions beyond a scholarly audience. This, however, lies beyond the scope of this article.
(1) First, the search function is of the highest importance for any digital corpus. It should not only provide a full-text search over all textual material included in the corpus (edited texts, apparatus, introductions, etc.), but also advanced search options, such as searching by logical operators and connectors and allowing for truncation and wildcards. Needless to say, a fuzzy-search function is indispensable for finding words and strings with orthographic variance within one and the same text as well as across various texts. Ideally, each and every word of the corpus is lemmatized to allow queries to match different forms of words, which may include even synonyms. 48 In addition to this, metadata allows for faceted searching of all kinds. It could be used to search by geographical regions or places of origin or provenance; by specific centuries, decades, or years of creation; by genres (like the Thesaurus Linguae Graecae categories of historici, poetae, philosophi, theologi, oratores, etc.), or by a specific meter. 49 Based on the markup, searches could be limited to a certain type or content of apparatus entries (see above).
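The search options described under (1) can be illustrated with a minimal Python sketch. It is not tied to any existing corpus software: the two toy records, the facet names, and the spelling-folding rules in `normalize` are assumptions made for the example, and a real corpus would rely on a proper lemmatizer and a search index rather than on the standard-library modules `fnmatch` and `difflib` used here.

```python
import difflib
import fnmatch

# Toy corpus (hypothetical): metadata facets plus a tokenized text per work.
CORPUS = [
    {"id": "work-a", "genre": "historici", "century": 12,
     "tokens": ["caelum", "et", "terra", "michi", "nihil"]},
    {"id": "work-b", "genre": "poetae", "century": 9,
     "tokens": ["celum", "mihi", "nichil", "terram"]},
]


def normalize(token):
    """Lowercase a token and fold a few spelling variants (simplified, assumed rules)."""
    t = token.lower()
    for old, new in (("ae", "e"), ("oe", "e"), ("ch", "h"), ("j", "i"), ("v", "u")):
        t = t.replace(old, new)
    return t


def facet(docs, **criteria):
    """Keep only the documents whose metadata equal every given facet value."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def wildcard_search(pattern, docs):
    """Return (work id, token) pairs matching a shell-style pattern such as 'terr*'."""
    return [(d["id"], t) for d in docs for t in d["tokens"]
            if fnmatch.fnmatchcase(t.lower(), pattern.lower())]


def fuzzy_search(query, docs, cutoff=0.8):
    """Return (work id, token) pairs whose normalized form is similar to the query."""
    q = normalize(query)
    return [(d["id"], t) for d in docs for t in d["tokens"]
            if difflib.SequenceMatcher(None, q, normalize(t)).ratio() >= cutoff]


# Truncation across the whole corpus, then a fuzzy search limited by a facet.
print(wildcard_search("terr*", CORPUS))
print(fuzzy_search("caelum", facet(CORPUS, genre="poetae")))
```

Folding spelling variants before comparing strings is only one possible design; matching on lemmata, as envisaged above, would also bring inflected forms together, which this sketch does not attempt.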
(2) Another essential feature of a text corpus is an elaborate index function. Indices should be generated and interlinked both work-wide and corpus-wide from the metadata (as regards authors, works, genres, periods, keywords, etc.) and from the markup (depending on the encoding schema with respect to named entities, that is, marked-up persons, places, dates, events, etc.). Where the texts are lemmatized, word indices could be provided as well. Lists of manuscripts should be created according to the structured information given in the documentation.
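How an index might be generated from the markup can be sketched as follows. The example assumes TEI-style encoding in which persons and places are tagged with `persName` and `placeName` and carry a `ref` attribute pointing to an authority entry; the sample text, the work identifier, and the `ref` values are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample: two flat divisions with marked-up named entities.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div n="1"><p><persName ref="#augustinus">Augustinus</persName>
      <placeName ref="#roma">Romam</placeName> venit.</p></div>
    <div n="2"><p>Liber <persName ref="#augustinus">Augustini</persName>.</p></div>
  </body></text>
</TEI>"""


def build_index(works):
    """Map entity type -> entity key -> list of (work id, division, surface form).

    `works` maps a work id to its XML source. Divisions are assumed not to be
    nested; nested divisions would need a more careful traversal.
    """
    index = defaultdict(lambda: defaultdict(list))
    for work_id, xml_text in works.items():
        root = ET.fromstring(xml_text)
        for div in root.iter(f"{{{TEI_NS}}}div"):
            for tag in ("persName", "placeName"):
                for el in div.iter(f"{{{TEI_NS}}}{tag}"):
                    # Group inflected surface forms under the shared reference.
                    key = el.get("ref", el.text or "")
                    index[tag][key].append((work_id, div.get("n"), el.text))
    return index


index = build_index({"work-a": SAMPLE})
for key, hits in sorted(index["persName"].items()):
    print(key, hits)
```

Because the entries are keyed by reference rather than by surface form, the same function yields a work-wide index when given one work and a corpus-wide index when given all of them.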
(3) The third fundamental functionality of a digital corpus is the provision of hyperlinks generated from explicit references, pointers, and identifiers in the markup and metadata. Internal links are to be realized as text-wide (especially connecting text and critical annotations), as work-wide (connecting text, manuscript witnesses, translations, and accompanying material) and as corpus-wide (connecting intertextual references, dictionary entries, registers, and indices). External links might point to digital archives (providing manuscript facsimiles, catalog entries and descriptions, etc.), digital corpora (providing relevant texts and contextual material), digital encyclopedias and dictionaries, and to any outsourced or externalized material (forums, audios, videos, blogs, etc.; see above).
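The distinction between internal and external links in (3) can be made concrete with a short sketch that resolves pointers found in the markup against the address of the current work. The base address `https://corpus.example.org/works/work-a` and the pointer values are hypothetical; only the standard-library functions `urljoin` and `urlparse` are real.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical address of the work in which the pointers occur.
BASE = "https://corpus.example.org/works/work-a"


def resolve(pointer, base=BASE):
    """Turn a pointer from the markup into a (kind, absolute URL) pair.

    Pointers with an http(s) scheme count as external; everything else
    (fragment identifiers, relative paths) is resolved against the base.
    """
    kind = "external" if urlparse(pointer).scheme in ("http", "https") else "internal"
    return kind, urljoin(base, pointer)


for pointer in ("#app-12", "../indices/persons#augustinus", "http://scta.info/"):
    print(resolve(pointer))
```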
(4) The aptitude of a digital corpus for scholarly use then completely depends on addressability and citability of all its parts and components, namely of the critical text (according to books, chapters, paragraphs, stanzas, verses, lines, words, and the respective critical annotations) and of the documentation (manuscript descriptions, transcripts, and introduction) as well as on the addressability and citability of versions, in case changes have been carried out or a progressive publication mode has been established. If the editorial framework allows for progressive publications, updates, additions, corrections, and so on (which in open software development and in digital humanities research is generally recommended 50 ) this would have an enormous impact on all areas of the corpus. Keeping track of versions is an extremely challenging task, especially if the corpus is supposed to provide canonical text versions that do not change. 51 Be that as it may, the data model and publication framework need to make sure that every part, layer, and format of the critical edition is clearly addressable, according to a URN-naming convention as specified, for instance, by the Canonical Text Services (CTS) and used by the Perseus project and the Homer Multitext project; 52 or by something similar to the Documents, Entities, and Texts (DET) system as recently presented by Peter Robinson in his widely discussed draft article on academia.edu. 53
49 Cf. above, n. 33, on "Pede certo."
50 The "release early, release often" policy was originally applied in the Linux development community. Following the publication of the essay "The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary," by Eric S. Raymond (Beijing and Cambridge, MA, 1999); online: http://www.catb.org/~esr/writings/cathedral-bazaar/, this policy became increasingly popular among digital humanities scholars and has been adapted to publication strategies not only for tool development but also for the creation of digital scholarly editions ("progressive editions") in order to create a tight feedback loop between the editor and expert scholars in their respective fields of research: see Gunther Vashold, "Progressive Editionen als multidimensionale Informationsräume," in Digital Diplomatics: The Computer as a Tool for the Diplomatist?, ed. Antonella Ambrosio, Sébastien Barret, and Georg Vogeler
(5) No matter how user-friendly the interface of an edition or corpus may be, user scenarios and research questions cannot be anticipated always and everywhere. For this reason, it is imperative to provide as much raw data and material as possible via interfaces (APIs) and downloads in order to enable scholars to access and collect the data directly. The editorial framework should allow for an import of various formats (such as TEI/XML, plain text, docx, pdf, tiff, and jpg) specified by the editorial guidelines. Ingested text files would be converted into corpus-specific XML, ideally customized TEI, in order to be stored and provided in the same format as the files created within the framework directly.
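The ingestion described under (5) and the addressability demanded under (4) can be combined in a minimal sketch: a plain-text file is converted into a simple TEI-style division, and every line receives an address that follows the general shape of a CTS URN (`urn:cts:namespace:textgroup.work.version:passage`). The namespace `exampleLit` and the identifiers `tg001`, `w001`, and `ed1` are hypothetical, and a production framework would apply a customized TEI schema rather than the bare `div` and `l` elements used here.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI elements without a prefix


def cts_urn(namespace, textgroup, work, version, passage):
    """Build an address in the general shape of a CTS URN."""
    return f"urn:cts:{namespace}:{textgroup}.{work}.{version}:{passage}"


def ingest_plain_text(text, book, **ids):
    """Convert plain text into one TEI-style book division.

    Each non-empty line becomes an <l> element numbered within the book;
    the function also returns the list of addresses, one per line.
    """
    div = ET.Element(f"{{{TEI_NS}}}div", {"type": "book", "n": str(book)})
    addresses = []
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for n, line in enumerate(lines, start=1):
        verse = ET.SubElement(div, f"{{{TEI_NS}}}l", {"n": str(n)})
        verse.text = line
        addresses.append(cts_urn(passage=f"{book}.{n}", **ids))
    return div, addresses


div, urns = ingest_plain_text(
    "Arma virumque cano\n\nTroiae qui primus ab oris\n", 1,
    namespace="exampleLit", textgroup="tg001", work="w001", version="ed1",
)
print(ET.tostring(div, encoding="unicode"))
print(urns)
```

Because the version identifier is part of the address, a corrected release could be published under a new identifier (for instance a hypothetical `ed2`) while citations of the earlier version remain resolvable.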
(6) In connection with downloads and APIs there is the question of copyright and licenses. Digital humanities scholars and open-knowledge activists commonly agree today that a Creative Commons Attribution ShareAlike (CC BY-SA) license is the best way to make sure the editor's work is appropriately credited and to ensure that the data is openly accessible and remains open data. 54

Conclusion
Creating a digital corpus of critical editions is a complex task. It involves a wide range of strategic decisions to harmonize the heterogeneity of digital scholarly editions with the core feature of a corpus residing mainly in the homogeneity of the way the texts are prepared and presented. Several suggestions have been proposed to convey a maximum of textual criticism with a minimum of formal requirements in order to provide a suitable data model, a practical editing environment, and a maintainable publishing framework that is attractive to both critical editors and scholarly users. A technical and institutional framework for integrating and exploring critical editions on a large scale is a great desideratum. It also seems to be a possibility worth the effort to attain.