Cartography and Code: Incorporating Automation in the Exploration of Medieval Mappaemundi


Wacha and Levernier: Cartography and Code, Art. 4

1. Introduction

§1 Over the last three decades, scholars of medieval cartography have sought to reclaim the reputation of medieval mapmakers. The maps and their makers have, at times, been overlooked, due in part to nineteenth- and twentieth-century perceptions of the lack of geographical knowledge reflected in the perceived "inaccuracy" of medieval maps. More recently, however, scholarly attention has shifted away from lacunae and towards the recognition and appreciation of the calculated rationales and purposeful intentions that guided medieval mapmakers' cartographic representations (Kupfer 2016; Talbert and Unger 2008; Edson 1997; Woodward 1987).

§2 Since individual medieval cartographers have left little discursive evidence to explain the design and content of their maps, scholars turn to the implicit evidence contained within the maps themselves, as well as contemporary contextual evidence, in order to better understand medieval conceptions of space, place, and time. Most world maps dating c. 1000-1400 fall into four typological categories: Tripartite (also known as T-O maps), Zonal, Transitional, and Quadripartite maps (Woodward 1987, 294-299). T-O maps depict the world with an orientation rotated 90 degrees to the left; the cardinal direction of East is located at the top of the map. A "T" situated inside an "O" divides the world into three known continents, with Asia occupying the upper half, and Europe and Africa the bottom left and right quadrants respectively (Figure 1a and b). Of the ten maps that comprise the dataset for this study, eight are traditional T-O maps, with seven of these in pictorial format and one in list format (for a complete list of the ten maps, see Works Cited, Primary Sources).

The [...] (Figure 1a and b). The two maps appear on the recto and verso of the same folio, but scholars have noted that they do not seem to be linked directly through toponym content.
Veccompare has made it possible to visualize the data from a different perspective, one that suggests that the Psalter maps' close physical proximity (they appear on the same folio in the same manuscript) is perhaps due not to a direct, one-to-one, model-derivative type of toponymic relationship, but instead to an indirect relationship stemming from a common textual source used by both mapmakers.

Description of veccompare
2.1. The dataset

§9 Medieval data is "fuzzy". Making it operable for, and accessible to, computers requires an intermediary step, namely turning it into twenty-first-century-ready data, a back-and-forth process between a scholar with domain expertise and a scholar with expertise in the system within which that data has to fit. In order to create a twenty-first-century dataset of relevant medieval toponyms, the authors worked with original sources and followed the traditional practice of creating two categories for each place name: a diplomatic transcription, which reflects the spelling of the toponym as it appears on the map or in the narrative source, and a normalized transcription, a version of the place name that can be compared across multiple maps. Because the place names have been drawn from pictorial maps, list maps, and narrative map sources, the textual toponym serves as the common denominator for both diplomatic and normalized transcriptions. (The full dataset appears at the beginning of the appendix.)

§10 One of the benefits and responsibilities of working on digital humanities projects is the opportunity to articulate editorial decisions, making them explicit and transparent. To this end, the normalization decisions made for the medieval place name dataset required careful thought and clear reasoning.
To begin, the authors addressed the selection of place names to include in the dataset, as well as those to omit. Not all of the text on the ten mappaemundi in the dataset denotes what one might consider a traditional place name or geographical feature. For this dataset, a "place name" is defined as the textual label of a specific geographical feature on the map, and includes such features as cities, provinces, monuments, oceans, mountains, and winds, but not cardinal directions, even if they were labelled on the map. Because medieval cartographers used the names of animals and mythic creatures to depict the geographical areas in which they lived, these too were included in the dataset.
For example, the Psalter Pictorial map, the Hereford Map, and Hugh of Saint-Victor's Descriptio mappe mundi all illustrate or mention the blemmyae, creatures who had no head, but had eyes in their chests, and lived in the southern regions of Africa; the phoenix, a mythical bird, appears only in the Hereford Map and Lambert de Saint-Omer's world map in the Liber Floridus, both in Africa, but in different parts of the continent. Moreover, if three geographical features (the Red Sea, the Persian Gulf, and the Mosaic Crossing) were unlabelled, but clearly depicted and unambiguous, the authors inserted their names into the dataset, since their easily recognizable nature and universality can be taken as a label unto itself. The diplomatic place names from the sample of ten medieval mappaemundi were then transcribed into an Excel spreadsheet.

§11 The process of transforming the diplomatic place names into normalized ones involved a number of unexpected decisions. When normalizing medieval place names, expertise is needed in order to know whether two seemingly related place names categorically refer to the same place, and to be able to label them as such, or whether two similar place names refer to the city, river, or region of the same name.
Moreover, when confronted with multiple spellings, decisions had to be made as to which one should be used as the standard normalized spelling. For this project, the foundational normalizing principle was to use the most common spelling across the dataset of maps. For example, the dataset contained several variants of [...] and one that denotes Columne Erculis only. In this way, one can compare how many maps include the Pillars of Hercules, how many include the Altars of the Free, and how many include both.

§15 Overall, the authors took a conservative stance regarding normalization, using the diplomatic transcriptions and domain expertise as the basis for adding qualifiers to certain toponyms only when necessary. In the rare instance that a diplomatic transcription was ambiguous, a question mark was added at the end of the place name to indicate possibility and/or doubt. While it may have been tempting to add other qualifiers, like the name of the continent in which the place was located, the authors considered this to be an excessive intervention, since some of the maps do not delineate these boundaries with explicit labels or markers.

§16 The normalization of place names emerged from an ongoing and iterative process. As more maps were added to the dataset, new toponyms were added to the list of unique toponyms. This afforded an opportunity to revisit and refine the existing list of unique toponyms, rethinking and/or qualifying the normalized transcriptions, as well as adding new normalized place names. Throughout, constructing a dataset of normalized medieval place name transcriptions was a dynamic and collaborative effort, and it is our hope that this activity will continue.
While the dataset has been designed for broad comparisons of toponyms, researchers are welcome to tailor the current set of toponyms with alternative normalization approaches, such as adding qualifiers to answer research questions specific to their own work or commenting on the existing one.
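The normalization principle described above (adopt the most common spelling, and flag ambiguous readings with a question mark) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the variant spelling "Hybernia", the map names, and the place identifiers are hypothetical, and the identifiers stand in for the expert judgement that two spellings denote the same place. It is not the authors' workflow, which was carried out in a spreadsheet.

```python
from collections import Counter

# Hypothetical records: (map name, diplomatic transcription, place identifier).
# The place identifier encodes an expert's decision that two spellings refer
# to the same place; it is an assumption of this sketch, not part of the dataset.
records = [
    ("Map A", "Hibernia", "ireland"),
    ("Map B", "Hybernia", "ireland"),
    ("Map C", "Hibernia", "ireland"),
    ("Map A", "Roma", "rome"),
    ("Map B", "Roma", "rome"),
]


def most_common_spellings(records):
    """Map each place identifier to its most frequent diplomatic spelling."""
    counts = {}
    for _map_name, diplomatic, place_id in records:
        counts.setdefault(place_id, Counter())[diplomatic] += 1
    # most_common(1) returns [(spelling, count)]; ties keep first-seen order.
    return {place_id: c.most_common(1)[0][0] for place_id, c in counts.items()}


def normalize(records, ambiguous=frozenset()):
    """Return {map name: set of normalized toponyms}.

    Readings listed in `ambiguous` as (map name, diplomatic transcription)
    pairs receive a trailing question mark.
    """
    standard = most_common_spellings(records)
    by_map = {}
    for map_name, diplomatic, place_id in records:
        name = standard[place_id]
        if (map_name, diplomatic) in ambiguous:
            name += "?"
        by_map.setdefault(map_name, set()).add(name)
    return by_map
```

The output of `normalize` is one set of normalized toponyms per map, which is the shape of input that set-based comparison needs. Keeping the diplomatic form in the records means a later change of normalization policy only requires rerunning the function.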

The code §17
With a normalized dataset in place, an R package, veccompare, was created to facilitate data analysis by automatically computing overlap and non-overlap between toponyms across the maps held in the dataset (see Team R Core 2017). Veccompare is a "package" developed in R, similar to an "add-on" or "extension" in a web browser such as Mozilla Firefox or Google Chrome.
R is freely available and can be installed by any user; a package such as veccompare adds new functionality to the base program or makes existing functionality easier to use. The veccompare package consists of commands that perform "set operations," and is built around one primary command, compare.vectors, which takes a named list of vectors (in this case, one vector of normalized place names for each map in the dataset) and computes all possible comparisons between them.

§18 For each comparison, veccompare performs three "set operations." It finds 1) the total set of elements across maps (the "union"), 2) the elements shared by all of the maps involved in a comparison, reported together with the percent overlap (the "intersection"), and 3) the elements that are unique to each of the maps (the "relative complement" or "difference"). This section of the paper addresses the structure of the code, exactly how it applies to the maps in the dataset, the streamlined approach that a tool like this enables, and its potential for use with other datasets that would benefit from the same analysis. To be clear, researchers do not need to know much R to use veccompare. They can apply it knowing just a few syntax rules for specific commands. The Appendix provides step-by-step instructions and a more detailed explanation of RStudio.

[...] closely resembles the Hereford map in structure and content than any other surviving early map". In particular, Barber highlights similarities in illustration, the depiction of river systems, the outlines of certain islands, and toponymic content (see Figure 3).
Thus, one would expect veccompare to find a relatively high percentage of toponym overlap, and indeed it does (Figure 3). Veccompare shows that 68% of the Sawley toponyms also appear in the Hereford map.

§24 The overlap function is not limited, however, to comparing only two maps at a time. When veccompare calculates overlap for all ten maps, three common normalized toponyms emerge: Hibernia, Sicilia, and Roma. On the one hand, this small number of shared toponyms is rather surprising given that almost all of the ten maps in the dataset are of English origin, but this may be due, in part, to the normalization process.
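The three set operations and the directional percent overlap discussed in this section can be illustrated with a short Python sketch. It is a generic illustration of the technique, not veccompare's R code or its API: the function names (`compare`, `all_comparisons`, `percent_overlap`), the map names, and the toy toponym sets are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy data; the real dataset holds ten maps and many more toponyms.
maps = {
    "Map A": {"Hibernia", "Sicilia", "Roma", "Babylon"},
    "Map B": {"Hibernia", "Sicilia", "Roma", "Jerusalem", "Nilus"},
    "Map C": {"Hibernia", "Sicilia", "Roma", "Nilus"},
}


def compare(maps, names):
    """Union, intersection, and per-map unique elements for the chosen maps."""
    chosen = [maps[n] for n in names]
    union = set().union(*chosen)
    shared = set.intersection(*chosen)
    # Relative complement: what each map has that none of the others has.
    unique = {
        n: maps[n] - set().union(*(maps[m] for m in names if m != n))
        for n in names
    }
    return {"union": union, "intersection": shared, "unique": unique}


def all_comparisons(maps):
    """Run compare() on every combination of two or more maps."""
    names = sorted(maps)
    return {
        combo: compare(maps, combo)
        for size in range(2, len(names) + 1)
        for combo in combinations(names, size)
    }


def percent_overlap(a, b):
    """Percentage of the toponyms in `a` that also appear in `b` (directional)."""
    return 100 * len(a & b) / len(a) if a else 0.0
```

Because the denominator is the size of the first set, `percent_overlap` is not symmetric, which is why a statement such as "68% of the Sawley toponyms also appear in the Hereford map" names a direction. Counting only combinations of two or more sets, as this sketch does, ten maps yield 2^10 - 10 - 1 = 1,013 comparisons.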

Creating reports and visual representations of data

§26 Veccompare does in minutes what would have taken a lifetime with traditional methodological approaches performed by hand. The veccompare package formats its output in multiple variations, each of which serves a different purpose and allows for different points of access to process and analyze the data. The data report takes the output of the compare.vectors command, adds headings that explain the output, and returns a markdown-formatted text document that is easy to read. While the sheer amount of data can sometimes be overwhelming (a full PDF document for all ten maps runs to over 1,500 pages), veccompare facilitates a close-reading process that map scholars have always practiced, but which can now be completed in a much shorter time frame.

§27 A typical veccompare output report provides a table of contents and section headings to help researchers access basic data without having to read through the entire report. It also provides visual representations for easy access to relational data that may not be immediately apparent in the report. First, veccompare builds on the VennDiagram package for R, which can draw Venn diagrams for up to five-way comparisons (Figure 4a and b). Second, veccompare produces a graph in table format that provides a general overview of all specific two-way comparisons (Figure 5). Third, veccompare transforms the data from the table format into a network graph that shows the intensity of the comparisons. Figure 6a shows overlap connections whenever a map overlaps at least 20% with another map (i.e., when a given map comprising, for example, 100 toponyms shares at least 20 of its toponyms with a second given map). Overlap connections are drawn directionally: an arrow is drawn from a 100-toponym map that shares 20 of its toponyms with another map, but not from the second map to the first map, if the second [...] One can also make the graph less dense by increasing the overlap threshold between maps to 50%, as seen in Figure 6b (Figure 6a and b).

§28 Visualizing data in this way may prompt new research questions that were previously unimagined. In some cases, these questions can then be translated back into the code for further clarification. For example, in the case study that follows, once the code produced the number and names of the toponyms that overlapped between the Psalter Pictorial and List maps, it became interesting to refine the results one step further and ask for the set of toponyms that overlap between the two maps and no other maps in the dataset. This kind of iterative interaction moves scholars closer to understanding the unique interventions of medieval mapmakers and identifying specific map groupings or families. The graphs also provide a means of measuring relative comparisons between maps. No map in the current dataset overlaps 100% with another, nor does any pair of maps fail to overlap at all. Instead, what the graphs (especially Figure 5) show is a range of overlap between 20% and 90%, suggesting that while these percentages are relative, they vary significantly and can be used to demonstrate degrees of toponymic relationship between the maps.
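Two ideas from the preceding paragraphs lend themselves to a small sketch: the directional, thresholded connections of the network graph, and the refined query for toponyms shared by two maps and by no other map. The Python below is a hedged illustration only. The function names, map names, and toponym sets are hypothetical, and veccompare's own implementation in R may differ.

```python
# Hypothetical toy data: a small map, a large map, and a third map.
maps = {
    "Small": {"Roma", "Sicilia", "Place 1", "Place 2", "Place 3"},
    "Large": {"Roma", "Sicilia"} | {f"Place {i}" for i in range(10, 28)},
    "Third": {"Roma", "Place 1"},
}


def overlap_edges(maps, threshold=20.0):
    """Directed edges (source, target, percent).

    An edge is emitted when `source` shares at least `threshold` percent
    of its own toponyms with `target`.
    """
    edges = []
    for source, a in maps.items():
        for target, b in maps.items():
            if source == target or not a:
                continue
            percent = 100 * len(a & b) / len(a)
            if percent >= threshold:
                edges.append((source, target, round(percent, 1)))
    return edges


def exclusive_overlap(maps, first, second):
    """Toponyms shared by `first` and `second` that appear in no other map."""
    others = set().union(
        *(toponyms for name, toponyms in maps.items() if name not in (first, second))
    )
    return (maps[first] & maps[second]) - others
```

In this toy data the five-toponym map shares 40% of its content with the twenty-toponym map, so an arrow runs from "Small" to "Large" at the 20% threshold, while the reverse direction (10%) produces none. Raising the threshold to 50% removes weaker edges and thins the graph, mirroring the change from Figure 6a to Figure 6b.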
Notably, veccompare shows that the 68% overlap of toponyms between the Hereford and Sawley maps is one of the highest seen in the graph, and very different from the respective 20% and 26% overlap seen between the Cotton and Psalter Pictorial maps.

[...] argued that "the discrepancies between the map contents suggest that the Psalter map did not likely serve as a template for the List map."

§33 Indeed, veccompare's Venn diagram of the comparison between the two maps does not demonstrate the high percentage of overlap that one would expect if the List map had been used as the toponymic model for the Pictorial map, but instead shows that the two maps share only 44 toponyms: a 33% overlap for the Psalter List map and only a 26% overlap for the Psalter Pictorial map (see Figure 7).

§34 Thus, the paradox: the maps show a weak connection vis-à-vis toponym overlap, yet they are inextricably linked through their physicality, since they appear in the same manuscript, back to back on the same folio. Schöller (2015, 192) argues [...] (Figure 9a), and the overlap between the Psalter List map and the Descriptio is 76% (Figure 9b), rendering percentages higher than the 68% already noted between the Hereford and Sawley maps.

[...] (1988, 81-85), in his edition of the Descriptio mappe mundi, argues that the Munich Isidore map is the closest surviving relative to the Descriptio, going so far as to suggest that the Descriptio may have been used as a model for the Munich map (Figure 10).

Conclusions §42
Working on this project has created a space where two separate academic domains have come fully to bear on the topic: this has not been a digital humanities project, nor a digital humanities project. Looking beyond medieval maps specifically to non-standardized text more generally, this work requires a back-and-forth conversation between domain experts and analysts. It is an iterative and ongoing process that allows for deeper communication between disciplines, with an outcome that could not have been achieved without communication and collaboration.

[...] Inferior as two separate toponyms, would further refinement shape the answer to the question of what is shared across all the maps?

§45 Working iteratively on questions about free-text normalization has allowed us to make progress rather than being overwhelmed and never getting started. Further normalization approaches have been discussed and could include tagging aspects of the data with a lightweight markup language that could then be converted into more standard formats such as the TEI (for example, allowing all maps that use "Scythia,"