Querying Variants: Boccaccio’s ‘Commedia’ and Data-Models

This paper presents the methodology and the results of an analytical study of the three witnesses of Dante’s Commedia copied by Giovanni Boccaccio, focusing on the importance of their digital accessibility. These extraordinary materials allow us to further our knowledge of Boccaccio’s cultural trajectory as a scribe and as an author, and could be useful for the study of the textual tradition of Dante’s Commedia. In the first section of the paper, the manuscripts and their role in previous scholarship are introduced. A thorough analysis of a choice of variants is then offered, applying specific categories for organizing the varia lectio. This taxonomy shows how fundamental it is to combine the methodological tools for studying copies (as usual in medieval philology) and those for studying author’s manuscripts (as usual in modern philology) in dealing with the three manuscripts of Boccaccio’s Commedia: in fact, the comparative analysis of the three manuscripts has much to reveal not only of their genetic relationship but also of Boccaccio’s editorial practices. Furthermore, the analytic categories inform the computational model behind the web application ‘La Commedia di Boccaccio’, created for accessing and querying the variants. The model, implemented in a relational database, allows for the systematic management of different features of textual variations, distinguishing readings and their relationships, without setting a base text. The paper closes on a view to repurposing the model for handling other textual transmissions, working at the intersection between textual criticism and information technology.

1. Introduction

§1 This paper presents the methodology and the results of an analytical study of the three witnesses of Dante's Commedia copied by Giovanni Boccaccio, focusing on the importance of their digital accessibility. These materials allow us to further our knowledge of Boccaccio's editorial practices as well as of his cultural trajectory as a scribe and as an author, and could be useful for scholars who want to study the textual tradition of Dante's Commedia.

§2 In sections 2 and 3, we summarize the analysis of the varia lectio of the text of Dante's poem included in Boccaccio's autograph manuscripts: Toledo, Archivo y Biblioteca Capitulares, Zelada 104 6 (To); Florence, Biblioteca Riccardiana, 1035 (Ri) (Figures 1-2); Vatican City, Vatican Apostolic Library, Chigiano L VI 213 (Chig). The witnesses have been entirely collated. 1

§3 In section 4, we introduce 'La Commedia di Boccaccio', <http://boccaccio commedia.unil.ch/>, the web application created for accessing and querying the textual variants (cf. below, Figures 3-5). We focus on the conceptual model informing the database, with a view to its repurposing for managing other textual transmissions.

Giovanni Boccaccio, scribe and editor of Dante's Commedia

§4 Giovanni Boccaccio's works and autographs have seen a recent revival of interest following the scholarly yield of the author's 2013 centenary. Recent scholarship, spurred by new discoveries about Boccaccio's method, has sought to elucidate one of his much neglected facets: his activity as editor and his concern for […]

1 For sections 2 and 3, see the more in-depth study in Tempestini (2018a, 2018b); here they serve as a necessary but short introduction for the presentation of the web application and its underlying model, in section 4. The complete transcription of To has not yet been published; the transcription of Ri, edited by Società dantesca italiana, is available at <http://www.danteonline.it/italiano/codici_frames/codici.asp?idcod=321>; the digital facsimile of Chig is available at <https://digi.vatlib.it/view/MSS_Chig.L.VI.213> and its transcription is included in Tempestini […]

[…] Petoletti and Martinelli Tempesta (2013). The annotations of Boccaccio and Petrarca are also studied respectively by Cursi (2015) and Pontani (2002-2003). For more about the role that Boccaccio played as a scribe in the birth of the Italian literary tradition, see Eisner ([…]

[…] gemello" (Alighieri 1994, 42), referring to manuscript Vaticano Latino 3199 (Vat), probably gifted to Petrarca by Boccaccio (Traversari 1905; Billanovich 1947; Bertelli 2014, 35-38; Trovato et al. 2013 […]

[…] Boccaccio's editorial work on the Commedia did not rely on a single exemplar but was influenced by multiple sources.

§7 Finally, it is essential to recall the importance of the three manuscripts to the textual transmission of the Commedia as reconstructed by Giorgio Petrocchi: Boccaccio's editorial activity would have increased contamination in the tradition and represented the limit of the antica vulgata, whose terminus ante quem would then be around 1355. Petrocchi's edition is based on the complete collation of 27 manuscripts (referred to as antica vulgata; for the list see Alighieri 1994, 57-59) chosen among those dated, at the time of the edition, before 1355, that is, before Boccaccio's copies. However, Boschi Rotiroti's recent codicological and palaeographical studies have shown that only 22 of these 27 manuscripts are actually ante 1355. Besides that, according to Boschi Rotiroti's recensio codicum, a total of 85 manuscripts are datable to 1355 or earlier (Boschi Rotiroti 2004, 15-17).

§8 Today, the notion of an antica vulgata seems outdated (Tonello and Trovato 2013).
Mecca's studies, in particular, provide an incisive reflection on Boccaccio's influence on the later tradition: "[…] tutto sembra indicare che da un punto di vista testuale non esiste uno sbarramento cronologico del Boccaccio nella tradizione manoscritta della Commedia" (Mecca 2013, 182).

Tempestini and Spadini: Querying Variants
Art. 1, page 7 of 28

§9 Concerning the relationship among Boccaccio's manuscripts, in reference to To and Ri, Petrocchi concluded that, although they have been considered, "non s'è reso necessario l'analitico raffronto con Chig il quale si impone sugli altri con la qualifica di edizione ultima e definitiva del testo dantesco" (Alighieri 1994, 18-19). 4

§13 Where the texts of the three cantiche vary, it is most often the Toledano manuscript that has a different reading, while the Riccardiano and Chigiano manuscripts are typically in agreement. This is primarily because most of the variants that Boccaccio introduced in Ri are replicated in Chig, while To is usually consistent with readings already present in the antica vulgata (as well as with the family of the exemplar of the manuscripts themselves, to which, as mentioned, Vat belongs). The complete collation thus confirms Petrocchi's argument concerning the three manuscripts' progressive detachment from that family, as well as the chronological sequence To - Ri - Chig. It also appears possible to divide the codices into two groups: To on one side and the Ri - Chig pair on the other, since many of the innovations in Ri are taken up again in Chig. The innovations in Ri and Chig add new and very significant lectiones singulares, a testament to Boccaccio's continuous work on the text of the Commedia.

§14 Representing and organizing the varia lectio emerging from a complete collation is a complex and critical process, and this specific case is also unique for the renown of both the text and its scribe, as well as for the value of Boccaccio's copies in the context of his broader cultural and editorial project. Boccaccio worked on the Commedia for over twenty years, editing a text which, as demonstrated by the three copies and the Esposizioni (Boccaccio 1965), 6 would never acquire a stable form.
In order to study the varia lectio, not with a view to settling on a definitive text of Dante's Commedia, but rather in order to trace the evolution of Boccaccio's Commedia and to try to glean something more concerning his editorial practices, we should reconsider the tools and categories through which we analyse textual variants. It is fundamental to establish and define a vantage point on the matter of textual oscillation: given that our stress is on the modality of the transcription, rather than on the valeur ecdotique of the copied text, each reading should be examined singularly in order to theorise its origins and inner logic. Although some variants can easily be attributed to mechanical or palaeographic errors, the analysis of each reading will allow us to better appreciate the copyist's method.

6 This is the text of the public lecture and commentary for which Boccaccio was commissioned and which he began on 23 October 1373 in Santo Stefano in Badia. The project was suspended in early 1374, likely due to Boccaccio's illness. Boccaccio died in December of 1375; Esposizioni thus cuts off at the beginning of Inferno, XVII.

§15 The complete collation includes variants whose readings are absent in the antica vulgata, suggesting that they are Boccaccio's own innovations: about 240 cases, out of over 1500 variation sites emerging from the full collation. (The term "innovation" thus also includes erroneous readings, given that this is an analysis of places where the three manuscripts do not agree, which does not include innovations that passed on from To to Ri and to Chig.)

§16 In the attempt to analyse the qualitative aspects of the textual variants, we have had to settle on a limited number of categories reflecting the most common forms of divergence in the innovations. We propose an articulation and differentiation, even if provisional, of the varia lectio that includes categories used by the filologia d'autore (Stussi 2011, 182-183). Indeed, as mentioned before, the study of the variants for themselves, of the text of Boccaccio's Commedia in its oscillations and of Boccaccio's editorial behaviours, requires a different point of view on the copyist, who is himself the author of that textual mouvance.
Five categories describe the most significant forms of divergence: […] XXV, 13-15). Voce is easily explained: the reading works like a comment, with voce instead of voglia (di dimandar), and is very close to the reading of the same manuscript at Inf. 33, 59, namely fame instead of voglia (di manicar).

§18 Complete transcriptions and collations of long textual works are, in general, not easy to produce and manage. 8 Besides, it is unusual to convey the complexity of the varia lectio outside the critical apparatus of a scholarly edition.
Nevertheless, we consider it fundamental to secure the availability of this data for scholars. On paper, the search for specific variants and categories, according to any classification, is certainly complex and demanding. A digital representation, on the contrary, might provide readily accessible and searchable materials: with this aim we devised 'La Commedia di Boccaccio' (Figures 3-5).

8 An important endeavour in this direction is Shaw's digital edition of Dante's poem (Shaw 2010), which presents full transcription and collation of the seven witnesses; in particular, the VBase system to access variants allows for fine-grained queries (see Spadini 2015). Nevertheless, the role of IT in the project discussed in the present article differs from its role in Shaw's work, where it is far more complex. In Shaw's case, the collation and the analysis of the textual variants occupy a preliminary phase (recensio) in the editorial workflow and benefit from computer-assisted tools. Here, on the contrary, the collation has been performed manually (since the work was almost finished when the first attempts with automatic collation were made, and it would have been too time-consuming to redefine transcription and collation criteria for the software) and the analysis is not part of an editorial workflow; the IT contribution, here, is limited to the data-model informing the database schema and to the web application for querying the data.

§19 This web application, further presented in the next section, allows for the visualization and querying of the variants between the three manuscripts, including Petrocchi's critical text for reference. The unit, here, is the reading: the database contains over 4500 readings from around 1500 variation sites. Each reading belongs to a witness (To or Ri or Chig) and corresponds to a textual location (cantica, canto, verse). Possible queries concern the single reading (present or absent in the antica vulgata), or the relationship between readings: the latter is described with the combination of witnesses involved (To vs Ri + Chig or Chig vs To + Ri etc.) and the category of change indicating the different kinds of variation (inversion; addition/omission; inflection; echo/anticipation; lexical). As mentioned above, only those readings that can possibly be considered as Boccaccio's innovations, and not other variants, are marked with a category of change.
4. Textual variants and data-models

§20 As explained in the previous sections, scholars need to compare, classify and order textual variants in order to make sense of them. This is one of the areas where Textual Criticism meets Information Technology (IT). There are a number of benefits to the organization, storage and querying of the data in digital form, not least of which is the ease in retrieving, reusing and sharing them. In this section, we will demonstrate how theoretical modelling of the data and the consequent creation of a data-model are prerequisites to conceiving such classificatory systems, to be exploited by means of IT resources. In order to do so, we will further discuss […]

[…] she is working with: whether it is a string (sequence of characters), a date, a decimal number, vel sim. She also has to specify how to parse the data (tokenization), into words, for example, and provide guidelines for that segmentation, specifying, for instance, that each section of string between two white spaces is a word. (As this task can be accomplished by scholars who are not, strictly speaking, programmers, and to avoid setting a precise division of labour, which this kind of work challenges, we prefer the term 'user' taken in a broad sense.) Following this rule, two words separated by a hyphen or by an apostrophe will be considered a single unit. This specification may or may not be suitable for the end goal, be it the generation of a list of words or an analysis through complex algorithms. As this simple example demonstrates, the user has a fair degree of freedom; thus, it is crucial that she maintains a clear understanding of the data and of its eventual application. In the Humanities in […]

To summarize: understanding the data hinges on interpretation and goals, or better, on the interpretation of the data itself and the interpretation of the goals.
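The tokenization rule mentioned in this section (each stretch of string between two white spaces is a word) can be illustrated with a minimal Python sketch. The function names and the sample string are hypothetical and are not taken from the project; the second function shows one possible alternative rule, only to make visible how the choice of rule changes the resulting units.

```python
import re


def tokenize_on_whitespace(text: str) -> list[str]:
    """Rule described in the text: every stretch between white spaces is one word."""
    return text.split()


def tokenize_on_punctuation(text: str) -> list[str]:
    """Hypothetical alternative rule: apostrophes and hyphens also separate words."""
    return [tok for tok in re.split(r"[\s'’-]+", text) if tok]


# Illustrative string only; it is not a reading from the three manuscripts.
sample = "l'altra riva"

print(tokenize_on_whitespace(sample))   # the apostrophe does not split the unit
print(tokenize_on_punctuation(sample))  # the apostrophe produces two units
```

Under the first rule "l'altra" remains a single unit, while under the second it becomes two; neither is right in itself, and the choice depends on what the tokens are meant to be used for.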
In order to fully exploit the realm of possibilities granted to the user, it is not only important to understand the data, but also to convey this understanding to the machine: in other words, to control the machine rather than be controlled by it. For this purpose, knowledge must be organised into a model, in this case a data-model.
A data-model is nothing more than a formalization of our understanding in ways sufficiently internally consistent, logically coherent and explicit to be applied in one

§22 In addition to these features, for which a low level of interpretation is required, readings can be analysed, classified and interpreted in a number of other ways. The aims and the criteria of the analysis may vary, depending heavily on the work in question and on the methodological approach.

§23 In the case of textual variation, an initial distinction can be made between categories that concern (A) a single reading and those that describe (B) the relation among the readings, i.e. the variation. For example, the notion of hypermeter or of error can only be applied to a single reading, and not to the relation among the readings, while an addition or deletion involves the presence of two or more readings. (An error can be spotted by comparing readings; nevertheless, 'erroneous' is not a quality of the relation, but of the reading.) As far as the relation among the readings is concerned, the model does not set a base text: thus, the readings are all on the same level, each of them being a variant of the others. Addition and deletion can be considered together because no orientation is set for interpreting a deletion as an addition or vice versa. As we shall see below, the absence of a base text is also essential to defining the entities to be compared.

§24 The taxonomy used in 'La Commedia di Boccaccio', presented in the previous section, is recalled here from the point of view of data-modelling. As mentioned above, different texts and methodologies require different categories. Here, for each individual reading (A), it is specified whether it is present in the textual transmission prior to Boccaccio (the antica vulgata). The relation between readings (B) might demand a generic classification, such as linguistic categories or categories of change (adiectio, detractio, immutatio, transmutatio). The taxonomy in use here includes some of them, while others have been expressly designed to address specific issues relevant to this corpus. Thus, the relation between the readings (B) is described in terms of addition/omission, inversion, morphological inflection, echo/anticipation and lexical change. (The 'echo/anticipation' category is here considered as a form of substitution, but could also be regarded as a feature of the single reading.) Furthermore, it is specified whether or not the variation occurs in rhyme.
§25 As said above (section 3), the aim of this study is not the building of a stemma representing the textual transmission; rather, it is the analysis of the textual dynamics in Boccaccio's copies: the methodology adopted is in fact closer to genetic criticism than to stemmatology. Therefore, approaches such as those tested in Spencer, Mooney, Barbrook, Bordalejo, Howe and Robinson (2004) and Andrews (2016), exploring the use of weighted variants, have not been applied here.

§26 As far as the relation between the readings is concerned, the combination of witnesses to analyse can vary: the three witnesses can be considered together, or in varying pairings.
Consider the following hypothetical variants:
(1) A: cat | B: dog | C: cat | D: CATS | E: cat
It is impossible to give a unique and detailed description of the relations between all of them. These include, for instance, a lexical change, a morphological one and a typographical one. Further information is needed to specify among which witnesses these changes occur. A more thorough interpretation would be:
(2) ACE vs B, lexical change; ACE vs D, inflection change; ABCE vs D, typographical change.
Given that the combinations of witnesses may change for each variation site, the most consistent way to describe the variation is to examine the witnesses in pairs: 10
(3) A vs B, lexical; A vs C, no variation; A vs D, inflection and typographical; A vs E, no variation; B vs C, lexical; B vs D, lexical, inflection and typographical; B vs E, lexical; C vs D, inflection and typographical; C vs E, no variation; D vs E, inflection and typographical.
From this complete description (3), it is possible to return to the previous one (2), and vice versa. It is important to remember that no base text has been set.

§27 Given this theoretical framework, implementation may differ depending on the corpus. In the case of Boccaccio's copies of the Commedia, the pairs are replaced by combinations. This is because the variance never affects all three witnesses at once, but always opposes two witnesses to a third. Recording all the pairs would therefore be redundant.
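The pairwise description (3) can be sketched in Python as follows. This is only an illustration under stated assumptions: the data structure and the names `annotations`, `relation`, `pairs_with` and `two_against_one` are hypothetical and do not reproduce the project's implementation. Unordered pairs are used as keys so that neither witness of a pair acts as a base text.

```python
from itertools import combinations

# Readings of the hypothetical example (1); not taken from any manuscript.
readings = {"A": "cat", "B": "dog", "C": "cat", "D": "CATS", "E": "cat"}

# Pairwise description (3). A frozenset key is unordered, so "A vs D" and
# "D vs A" are the same entry. Pairs without an entry show no variation.
annotations = {
    frozenset("AB"): {"lexical"},
    frozenset("AD"): {"inflection", "typographical"},
    frozenset("BC"): {"lexical"},
    frozenset("BD"): {"lexical", "inflection", "typographical"},
    frozenset("BE"): {"lexical"},
    frozenset("CD"): {"inflection", "typographical"},
    frozenset("DE"): {"inflection", "typographical"},
}


def relation(w1, w2):
    """Categories of change between two witnesses, whatever their order."""
    return annotations.get(frozenset((w1, w2)), set())


# Every pair of witnesses at this variation site.
all_pairs = list(combinations(sorted(readings), 2))


def pairs_with(category):
    """Pairs of witnesses whose relation includes the given category."""
    return [pair for pair in all_pairs if category in relation(*pair)]


def two_against_one(witnesses):
    """For three witnesses: each way of opposing two of them to the third."""
    return [(tuple(w for w in witnesses if w != odd), odd) for odd in witnesses]


print(len(all_pairs))
print(pairs_with("lexical"))
print(two_against_one(("To", "Ri", "Chig")))
```

With five witnesses the sketch enumerates every pair, whereas with three witnesses that always split two against one, the three combinations returned by `two_against_one` carry the same information as the full list of pairs.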
To summarize, the model outlined here has a number of crucial characteristics:
• it distinguishes between the features of the reading and those of the relations between the readings;
• it allows us to append more than one feature to each reading and relation;
• it does not require a base witness to orient the variation;
• it permits the annotation of each pair of witnesses, or of a combination of them, for each variation site.
In the project 'La Commedia di Boccaccio', the model has been implemented in a relational database. Other possibilities such as XML (TEI) or a graph database might also have been suitable, 11 given that they meet the requirement of a widespread format that follows rules defined by standards: this is fundamental for facilitating preservation and data sharing, and for making use of existing frameworks.
As said, in this case data are organized in a relational database (Figure 6), which can be queried through SQL and published. Data are inserted into the database using spreadsheets automatically converted into SQL instructions.

§28 The two central tables, 'reading' and 'annotation', draw together most of the information held in the other tables. The table 'reading' carries the features of a single reading (in the taxonomy referred to as A), while the 'annotation' table contains descriptions of the relations between the readings (B) and the combination of witnesses involved.

11 A first attempt to encode the variants following the taxonomy just described with XML/TEI suggests the use of the elements <app>, <rdgGrp>, <rdg>, each of which may carry attributes such as @type and @ana. E.g.: <app><rdgGrp type="add_om"><rdg wit="TO RI">testa alta</rdg><rdg wit="CH" type="absent">testa</rdg></rdgGrp></app>. A standoff approach could be envisaged for storing information about all possible pairs of witnesses.
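As a rough illustration of how an encoding like the one in note 11 could be read back into reading-level and relation-level features, the following Python sketch parses that XML fragment with the standard library. The function name and the shape of the returned dictionary are assumptions made for this example; they are not part of the project.

```python
import xml.etree.ElementTree as ET

# The fragment from note 11. It carries no namespace here; a full TEI
# document would normally declare the TEI namespace on its elements.
snippet = (
    '<app><rdgGrp type="add_om">'
    '<rdg wit="TO RI">testa alta</rdg>'
    '<rdg wit="CH" type="absent">testa</rdg>'
    '</rdgGrp></app>'
)


def readings_by_witness(app_xml):
    """Map each witness siglum to its reading and to the annotated features."""
    app = ET.fromstring(app_xml)
    result = {}
    for group in app.iter("rdgGrp"):
        for rdg in group.iter("rdg"):
            # @wit may list several sigla separated by white space.
            for wit in rdg.get("wit", "").split():
                result[wit] = {
                    "text": rdg.text or "",
                    "relation": group.get("type"),    # feature of the relation (B)
                    "reading_type": rdg.get("type"),  # feature of the reading (A)
                }
    return result


for siglum, info in sorted(readings_by_witness(snippet).items()):
    print(siglum, info)
```

The sketch keeps the two levels of the taxonomy apart: the attribute on <rdgGrp> describes the relation, while the attribute on <rdg> describes the single reading.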

§29 As with any other row in a table, each reading has a unique identifier. While each reading has features such as location and witness, these do not suffice for its identification, since two readings can belong to the same verse and to the same witness (e.g. 'nella sua' and 'grameça' in Chigiano at Inf. I, 50).
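The need for an identifier that is independent of location and witness can be shown with a small sketch using Python's built-in sqlite3 module. The schema below is hypothetical: the table and column names are assumptions made for illustration and do not reproduce the actual schema of Figure 6. Only the two Chigiano readings mentioned above are inserted, and the annotation tables are left empty rather than filled with invented data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema, for illustration only.
CREATE TABLE reading (
    id                INTEGER PRIMARY KEY,  -- identifier independent of location
    witness           TEXT NOT NULL CHECK (witness IN ('To', 'Ri', 'Chig')),
    cantica           TEXT NOT NULL,
    canto             INTEGER NOT NULL,
    verse             INTEGER NOT NULL,
    text              TEXT NOT NULL,
    in_antica_vulgata INTEGER               -- feature of the reading (A); NULL = not recorded here
);
CREATE TABLE annotation (
    id          INTEGER PRIMARY KEY,
    combination TEXT NOT NULL,              -- e.g. 'To vs Ri + Chig'
    category    TEXT NOT NULL,              -- feature of the relation (B)
    in_rhyme    INTEGER
);
CREATE TABLE annotation_reading (
    annotation_id INTEGER NOT NULL REFERENCES annotation(id),
    reading_id    INTEGER NOT NULL REFERENCES reading(id),
    PRIMARY KEY (annotation_id, reading_id)
);
""")

# The two readings cited in the text: same witness, same verse.
conn.executemany(
    "INSERT INTO reading (witness, cantica, canto, verse, text) VALUES (?, ?, ?, ?, ?)",
    [("Chig", "Inf", 1, 50, "nella sua"), ("Chig", "Inf", 1, 50, "grameça")],
)


def readings_at(connection, witness, cantica, canto, verse):
    """Return (id, text) for every reading of a witness at a textual location."""
    return connection.execute(
        "SELECT id, text FROM reading "
        "WHERE witness = ? AND cantica = ? AND canto = ? AND verse = ? ORDER BY id",
        (witness, cantica, canto, verse),
    ).fetchall()


for row in readings_at(conn, "Chig", "Inf", 1, 50):
    print(row)
```

Both rows share witness and location, so a key built from those columns alone would collide; the separate `id` column keeps the two readings distinct and lets the linking table refer to each of them individually.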

Conclusions

§30 In this study and the related web application, we aim to provide resources for a better understanding of the Commedia copied by Giovanni Boccaccio, of his role as a copyist who was himself an author, and of the relations between the three witnesses.

§31 As mentioned in the first part of the article, it is not our intention to sponsor a cult of textual variants. Rather, our aim is to give voice to the variants in the text in order to analyse the editorial practices of a sui generis scribe. The data collected highlight not only a much greater tendency for innovation in mss. Riccardiano and Chigiano, but also the considerable importance of some of these innovations. Ms. Chigiano in particular does not appear as Boccaccio's final and definitive edition of Dante's Commedia, as Petrocchi suggests (Alighieri 1994, 19), but as an experimental text in which the copyist manifests his freedom; ms. Chigiano can thus be defined as Boccaccio's own edition. This likely explains Boccaccio's choice of ms. Toledano, the least original and customized of the manuscripts, for reference in his Esposizioni sopra la Comedia, his public reading and commentary. 12 Finally, in discussing variants, it is not entirely possible to judge whether some of them are due to mnemonic errors or to a conscious choice without entering into a discussion that goes beyond the limits of empirical data. Nevertheless, in analysing the trajectory from To to Chig, it seems evident that the development of the work did not aim at the most authoritative ante litteram edition of Dante's Commedia but rather at a simplification of the text in which the boundary between exegete and scribe/copyist comes increasingly to be blurred.

§32 In the second part of the article, we propose a conceptual data-model for textual variants, which can be tested on other corpora beyond the project presented here and its specific taxonomy. A model is inherently selective in its features, and describing it requires providing an answer to the question, "what attributes of the original does it capture and make explicit, and which does it omit?" (Unsworth 2002, now in Terras et al. 2014). When considered as the result of selection and abstraction, the model resembles a map: as the expression has it, it cannot be the territory.
Modelling is also a fundamental step for any attempt at automation. This is not to say that our long-term goal is to have machines that recognize and classify textual variants, but that structuring our understanding of these phenomena is a first, fundamental step towards leaving to computers some of the most time-consuming work and keeping for scholars the tasks that require all their knowledge and experience: for instance, choosing the categories and making sense of the output.

§33 In theory, most of the categories used in the model could already be automatically identified. This is the case for additions, deletions and transpositions,