Digital Approaches to Manuscript Abbreviations: Where Are We at the Beginning of the 2020s?

Abbreviations have been an important qualitative means for dating and localising manuscripts. In digital scholarship, however, they have received less attention. Reasons for this range from digital resources inheriting editorial traditions from print to normalisation being a prerequisite for many research questions. The aim of this paper is to build bridges by giving an overview of scholarship into digital and quantitative approaches – taking into account English, French, Old Norse and, to a lesser extent, Dutch, German and Celtic scholarship. It also makes a theoretical contribution by placing abbreviations into a typology of writing systems and proposing that the terms conditioned and unconditioned variation in analogy with phonology could be useful for studying abbreviation.

1 Introduction
§1 In her entry for the Encyclopedia of Language and Linguistics, Lowe defines the aims of palaeography as follows:
The basic task of palaeography is to provide the means of dating and localizing manuscripts by establishing patterns in the development of characteristic letter forms and abbreviations. (Lowe 2006, 134)
As the entry recognises, abbreviations have been one of the most important means for dating and localising manuscripts available to palaeographers and book historians. In digital scholarship, however, they have been something of a sidetrack. Reasons range […]
§6 Making better use of "accidentals" or "trivial variation" is central to quantitative approaches to palaeography. It can also be linked to what has been termed "new philology" or "material philology" (cf. Driscoll 2010). Material philology has called attention to the uniqueness of each manuscript copy, including codicological features, such as writing support and level of decoration, but also all of the accidentals, such as punctuation, spelling variation, lexical variation and abbreviations. All of these are related: "orthography, palaeography, and codicology are overlapping realms of the archaeology of the book" (Thaisen 2011, 84). Nevertheless, decisions need to be made on how much of them to take into account.
§7 These decisions concern the level on which to focus when studying variation. The focus of this review article is mainly on the graphemic level, as my concern is with different types of signs and their referents, not with "allographic" variation between various shapes of the same sign (see e.g. Robinson and Solopova 1993, 20). In making this decision, my focus differs somewhat from much recent work, which has opted for differences on the "graphetic" level, recording allographic variation (the ORIFLAMMS project; Rogos 2011; see also studies of variant letter forms such as Thaisen 2011; Stokes 2014; Kwakkel 2012; Speed Kjeldsen 2013). The reason for this is that even though abbreviations are connected both to graphetic variation and to the codes on the page, they are also part of medieval writing systems and made the orthography of those systems more complex than purely alphabetic writing. This complexity deserves to be addressed as an important phenomenon in itself.
3 Abbreviations in the typology of writing systems
§8 What makes abbreviations unique, compared to many other features deemed accidentals by textual criticism, is the range of ways in which the written form corresponds to its lexical and phonological referents. Abbreviations belong to the "grey area" components of the medieval manuscript, which Traube called Halbgraphische Objekte ("half-graphic objects") (Benskin 1977, 506; Römer 1993 and 1997; Rogos-Hebda 2018, 58). However, while the non-alphabetic nature of abbreviation is noted by several scholars (LAEME 2.2.1; see also Benskin 1977, 506; Rogos 2011, 47; 2012, 7), it is often discussed in somewhat imprecise terms or with much variation in terminology. This can be partly explained by different emphases, as some approaches are concerned with allographic variation, others with codicological phenomena or morphophonemic variation. (For example, Mazziotta [2008] aims to describe abbreviations from a graphetic perspective, seeking to model the strokes made by the scribe. Rogos [2011, 2012], Rogos-Hebda [2018], Cottereau [2005] and Cottereau-Gabillet [2016] discuss abbreviation alongside a range of other bibliographic codes such as initials. LAEME [2013] is concerned with phonology and morphosyntactic description.)
§9 The situation is not helped by the use of different names for the same phenomena in different fields of scholarship. As Römer (1997, 11) notes, where linguists may speak of apocope, palaeographers speak of suspension. If, however, we focus precisely on the signs and practices used to abbreviate words in mediaeval writings and on their lexical and phonological referents, it is also possible to describe them in terms of writing systems research, using terminology developed by scholarship describing the emergence of writing systems in the ancient Middle East and on the shores of the Mediterranean.
§10 "The main issue" with classifications of writing systems "is how to reconcile the two levels of language that written symbols correspond to […] the lexicon, whether words or morphemes […], and on the other hand sounds in the phonology, whether syllables (Japanese kana), phonemes (Italian) or consonant phonemes only (Arabic)" (Cook 2016, 6). The following account uses the terminology of a pioneering monograph by the Polish-born Assyriologist I. J. Gelb (1952), which aimed to describe the development of writing systems from a typologically comprehensive perspective (for a very comprehensive resource, see Daniels and Bright 1996; for a clear, if somewhat contentious, introduction, see Powell 2009).¹ The study introduced a "tripartite scheme of logography, syllabary, alphabet" which has become the main typology used in writing systems research (Daniels and Bright 1996, 8). Abbreviations too can be described within this framework, as Gelb (1952, 12-13) already notes. (In Gelb's [1952] example, the following sentence contains all three: "Mr. Theodore Foxe, age 70, died to-day at the Grand Xing Station" [13]. As Gelb notes, both the numeral "70" and the abbreviation Mr. for "Mister" are logograms, whereas "the rebus-type symbol X plus alphabetic ing" stand for the word "crossing" [ibid.].)
The typology works well if one takes it as a somewhat flexible system and does not expect scripts to fall neatly under any category (Daniels and Bright 1996, 8). The placement of abbreviation symbols in this typology is illustrated by Figure 1.
1 The comparative study of writing systems is also a field that has suffered somewhat from a large amount of variation in terminology, especially since it is studied by scholars working in fields as diverse as palaeography, epigraphy, linguistics and philology, but also archaeology and Egyptology. "What we have, instead, are narrow fields of study of the type of Semitic epigraphy, Arabic paleography, Greek or/and Latin epigraphy or/and palaeography, Chinese palaeography, papyrology, etc., all limited to certain periods and geographic areas. In all cases these narrow fields of study form subdivisions of wider, but still specific, fields of study, such as Semitic or Arabic philology, classical philology, Assyriology, Sinology, and Egyptology" (Gelb 1952, 22).
2 I am not entirely satisfied with the term semasiography, but alternatives, such as pictogram, icon or symbol, are too narrow, too broad or come with additional theoretical baggage. Perhaps the most obvious term might be "pictogram," but unfortunately, as Powell (2009) notes, the term has been used so sloppily (it can refer to logograms like Egyptian hieroglyphs or to semasiograms) that it is practically useless as a theoretical distinction in serious scholarship. "Icon," on the other hand, has a wide range of meanings from religious art to graphical user interfaces. In linguistics, iconicity refers to similarity or analogy between a sign and its referent. This renders it partly incompatible with semasiography, which may also include uses in which the relation between a sign and its referent is conventional and non-iconic. The term "symbol" has an even broader range of uses and definitions than icon. It is used in various fields, including but not limited to semiotics, psychoanalysis and literary scholarship. This makes it too broad for the purposes of the distinction intended here, between writing systems tied to natural language and ones that are not. Consequently, I prefer semasiography as an umbrella term for writing in which the signs are not necessarily tied to forms of speech. The term may be unfamiliar to many, but it is more precise than pictogram and lacks the theoretical baggage of icon and symbol. It can also be taken as a tribute to I. J. Gelb, on whose pioneering account section 3 builds. Moreover, it provides a link with semiotics, a field that was very popular in the late 20th century and has recently fallen out of fashion, but which could provide useful subcategorisations within the very broad field of semasiography.
§11 Writing systems can be divided into phonography, in which signs are tied to sounds; logography, in which signs are tied to words; and semasiography, which is a broad umbrella term for symbolic systems that communicate meaning without being directly tied to natural language.² Phonographic writing systems can be further divided into syllabography and alphabetic writing. Syllabic systems refer to the basic building blocks of spoken utterance: syllables. They include Japanese kana or Korean hangul. Alphabetic systems link to vowel and consonant sounds; "a writing system in which each symbol corresponds to a particular sound of the language, and, vice versa, each sound corresponds to a symbol, is called 'transparent' or 'shallow'" (Cook 2016, 7). A system like the International Phonetic Alphabet (IPA) gets very close to complete transparency, and smaller European languages like Finnish or Icelandic have fairly shallow orthographies, but the spelling systems of languages like English and French are less transparent. For example, English orthography carries such historical baggage as the "silent" "gh" in eight, night, through, and there are homophones such as dear/deer, in which, Laing argues, the distinction indicated by spelling convention leans towards logography (LAEME 2.2.1). In logography, signs are attached to entire words. Chinese script or Egyptian hieroglyphics are more logographic, even though both have phonographic connections (Powell 2009, 188).
It is, however, also possible for written symbols to communicate meaning without having a direct phonetic or lexical referent.
§12 I call writing systems which are not connected to natural language semasiographic, following Gelb (1952) and Powell (2009). Semasiography can be defined as "material marks with a conventional reference" that "communicate information without the necessary intercession of forms of speech" (Powell 2009, 32). Semasiography is a broad and heterogeneous category which includes systems that predate phonographic and logographic writing, but which also still exist beside them.
Modern semasiographic signs include, for instance, emoticons, computer icons and traffic signs. Semasiographic systems can be complicated and precise, as they also include, for example, mathematical and musical notation. Yet it would be impossible to write this article in mathematical or musical notation without giving the numbers or notes some kind of phonographic or logographic referents.
§13 Western scripts are predominantly alphabetic, but they also routinely contain characters that are syllabographic or logographic. An important point about the classification, which Gelb (1952, 16-17) already notes, is that signs can be used flexibly depending on the context. For example, the heart shape (♥ or <3), which can carry meanings such as indicating one of the four suits in a standard 52-card deck (Figure 2) or communicating love or another strong emotion as an emoticon, falls under semasiography. In examples 1 and 2, however, it is used as a logogram, referring to two different words, which most English speakers have no problem parsing as "love" and "heart." What is more, example 2 contains two modern abbreviations categorised as letter/number homophones by Bieswanger (2007, 5), which are used outside their normal category (see also Anis 2007). The Hindu-Arabic numeral 2 is normally a logogram, but in the example it is used as a syllabogram owing to the phonological similarity of its reading /tʰuː/ to the English single-syllable preposition to /tʊ/. A second example is the letter "U," one of the letters of the Latin alphabet, which here serves as a syllabogram because its name, when read aloud in English, is homophonous with the single-syllable second person pronoun "you" /juː/. That these are syllabograms rather than logograms is evident from the fact that two signs are needed to write a two-syllable word such as "before": B4. It is thus very much possible to use this system to describe abbreviation, modern as well as medieval.
§14 Medieval abbreviations can be easily described using this same typology.
Some abbreviations (ꝰ "-er, -ir, -re" in aftꝰ) are syllabograms consisting of vowel and consonant combinations (often "s" or "r" in combination with a vowel). Others (S. for "Saint," or indeed modern acronyms) are logograms, as they correspond to the entire word. Within logograms, Laing singles out "impure" logograms, which contain some phonographic cue, such as ꝥ "that," S[aint], or the writing of common religious phrases with initials only (LAEME 2.2.1). On the other hand, Laing considers the use of Greek abbreviations in Latin such as xpc for "Christ" and iħc for "Jesus," or the ampersand, to be pure logograms, which tend towards abstraction as they are "not subject to phonological extrapolation" (LAEME 2.2.1). Even if they are made up of letters of the Greek alphabet and were often reproduced by the scribes using more familiar Latin shapes, they do not have these values (LAEME 2.2.1).
§15 An important point to make about abbreviations, though, is that even though they may function like syllabograms or logograms, they are additional and alternative signs to an alphabetic system, not a new form of syllabic or logographic script to represent spoken utterances. Kopaczyk (2011, 96) notes that abbreviations should be treated as a sequence of letters rather than syllables, which is evident from examples such as this: co [ur] […] to indicate the ordinal "second." In the medieval system, superscript abbreviations (wᵗ "with") typically function as phonetic complements. Semantic complements or determinatives, on the other hand, specify the meaning of a sign with an additional mark. Marks added to Latin letters, such as the bar through the descender of ꝑ ("crossed-p"), are semantic complements indicating that the letter stands for the abbreviation for "per, par, por" in Latin or English.
The term semantic complement is also useful for those mediaeval abbreviations which Cappelli (1990 [1899], xxiii-xli) calls abbreviation marks significant in context rather than abbreviation marks significant in themselves. (Cappelli's terms in Italian are Segni abbreviativi con significato proprio [xxiii] and Segni abbreviativi con significato relativo [xxix]. I am using the English translations by Heimann and Kay [1982].) Some medieval abbreviations, such as the punctus or the horizontal bar placed above the abbreviated word, simply indicate the presence of an abbreviation. These signs are best described as semantic complements rather than logograms or syllabograms.

Problems related to expanding syllabograms and logograms
§17 Some of the problems related to the expansion of abbreviations, practised in many fields, are caused by alphabetising texts written in a writing system whose orthography also allowed non-alphabetic characters. Orthography can be defined as "the rules for using a script in a particular writing system, that is to say how the […]
[…] Table 9.1.) found in her Older Scots data that the scribes abbreviate the plural of nouns with ꝭ 61% of the time (e.g., partꝭ "parts"), followed by "-is" (20%), "-es" (11%), "-ys" (5%) and "-s" (5%). This is important, as the practice of spelling the plural with "-is" or "-ys" (partis or partys) instead of "-es" (partes) is considered diagnostic of Older Scots in contrast to more southern dialects. An editorial approach that expands all abbreviated plurals following the most common spelling will show "-is" as the clearly dominant variant with 81% (the 61% abbreviated plus the 20% spelled out), when in reality there is no certainty which form the scribe intended (see also Czajkowski 2018, 96 […]
[…] variation is used to localise texts. For example, a bar over the word man may be interpreted as otiose, but may also indicate abbreviation. The form man̄ can be expanded as "man," "mane," "mann" or even "maun" (if the two minims are interpreted as a "u"). (All of these forms are recorded by the Linguistic Atlas of Late Mediaeval English [eLALME 2013 (1986)].) The expansion matters, because man is a very widely used form, whereas mane or mann are much more restricted and can be used as diagnostic forms (cf. Cruz-Cabanillas and Diego-Rodriquez 2018, 172). Consequently, there is often no certainty as to how a given abbreviation should be expanded; mixing writing systems that function on different levels can lead to problems like these.
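The distortion described for the Older Scots plurals can be made concrete with a little arithmetic. The following Python sketch is purely illustrative: it takes the percentages quoted above (which sum to 102, presumably through rounding) and applies a hypothetical editorial policy, here called expand_to_majority, that resolves every ꝭ as the most common spelled-out ending. The function names and the policy itself are assumptions made for the sake of the example, not a description of any particular edition's practice.

```python
# Shares of plural markers quoted in the text (per cent of tokens).
# "ꝭ" is the abbreviation sign; the other keys are spelled-out endings.
# The quoted figures sum to 102, presumably because of rounding.
observed = {"ꝭ": 61, "-is": 20, "-es": 11, "-ys": 5, "-s": 5}


def expand_to_majority(shares, abbreviation="ꝭ"):
    """Hypothetical policy: resolve every abbreviation as the most
    common spelled-out ending and return (that ending, new shares)."""
    spelled = {k: v for k, v in shares.items() if k != abbreviation}
    majority = max(spelled, key=spelled.get)
    expanded = dict(spelled)
    expanded[majority] += shares[abbreviation]
    return majority, expanded


def share_among_spelled(shares, ending, abbreviation="ꝭ"):
    """Proportion of an ending among the plurals actually spelled out."""
    spelled_total = sum(v for k, v in shares.items() if k != abbreviation)
    return shares[ending] / spelled_total


majority, expanded = expand_to_majority(observed)
print(majority, expanded[majority])
print(round(100 * share_among_spelled(observed, majority), 1))
```

With these figures, "-is" rises from 20 to 81 points after expansion, although it accounts for just under half (20 of 41 points) of the plurals that the scribes actually spelled out.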

Logography and language independence
§21 A further feature of logograms is that they are less tied to a single language than phonographic writing. For example, the Hindu-Arabic numeral 2 can be read aloud in any language. An English speaker would expand it as "two," a French speaker as "deux" and a Finnish speaker as "kaksi." Abbreviations, including less straightforwardly logographic ones, can sometimes also be expanded in several different languages. As Voigts (1989, 91) mentions, "[n]o educated reader is perplexed by e.g., i.e., or even viz., but it is by no means certain that even the latinate individual actually thinks exempli gratia, id est, or videlicet when he reads or writes those letters." There is some contemporary evidence that scribes could associate Latin abbreviations with the vernacular. The following examples are from three closely related copies of a plague treatise in medical manuscripts from the 1450s and 1460s (Honkapohja 2017, 136). The scribe of Sloane 3566 writes English "that is to say" where the other scribes use the Latin abbreviation s. for "scilicet."
§22 A number of studies have uncovered evidence that this language-independent quality was sometimes utilised in historical texts. Hector (1958, 37) mentions that English proper names in Latin documents could be "terminated by a mark of suspension to preserve the fiction that they were declinable Latin words." It has even been argued that the language-independent quality of abbreviations may have been used on purpose.
§23 Abbreviated words that can be expanded in several languages are called visual diamorphs by Ter Horst and Stam (2018, 234),³ who focus on Latin and Gaelic.
According to them, "[s]ome words are abbreviated ambiguously, and they can consequently be resolved as both Latin and Irish. One example is the title aps., which can stand for 'apostle' in Latin, apostolus, or Irish: apstal" (Ter Horst and Stam 2018, 234). Wright has studied abbreviations in English/Anglo-Norman/French mixed-language documents (see e.g. Wright 2002, 2011 and 2018). The author stresses the importance of abbreviations in the complicated contact situation during the late Middle English period, suggesting that the abbreviation system may be part of the reason, as it can be used to suppress the language-specific grammatical endings and highlight the stem. For example, a word such as argentꝰ can be read as Latin (e.g. argentem, argentis), Anglo-Norman French (argenté), or Middle English argent "(a) Silver, silver coin; (b) her. Silver-coloured, silver-gilt" (MED). Czajkowski (2018, 90) notes that abbreviated forms can be used to suppress differences between High German and Low German personal pronouns, as they can be expanded both to Low German "he," "we" and "unse" and to High German "er," "wir" and "unsere." Consequently, abbreviations have an important function in language-independent communication owing to their logographic nature, and there is evidence that this quality was sometimes exploited in pre-modern multilingual texts.
3 The existence of language-independent elements has been noted for spoken communication, in which they are referred to as homophonous diamorphs. Muysken (2000, 133) states that "when languages are similar or are perceived to be similar by bilingual speakers, switching is facilitated by specific […]
[…] including in short forms also shorter spellings (for example, do would be a short form, but doe "do" or doo would be classified as long forms). They too note that shorter forms are used as an alternative for the longer ones.
What all these authors agree upon is that the use of the shorter variant is subject to certain conditions.
§26 Variation that is conditioned by the surrounding environment could be seen as analogous to what in phonology is referred to as conditioned variation. […]
[…]-Yonah 1940, 9; Bozzolo et al. 1990; Cottereau 2005, 623), but it has the advantage of fitting the needed message into a smaller space, which could be important for reasons of conserving parchment or a carving surface of metal, stone or wood. Economy of time, on the other hand, was relevant throughout Antiquity and the Middle Ages, since "[t]he commonest way of committing words to writing was by dictating to a scribe" (Clanchy 1979, 97).⁴ In addition to these two, some authors, such as MacLean (2002), whose focus is on Greek epigraphy, mention saving labour. As he puts it, "abbreviations were used as a means of reducing labor and saving space on the stone's surface" (2002, 49). Whatever the reason, the question then becomes whether quantitative palaeography can be used to study whether abbreviation was conditioned by something we can measure.
§29 Saving time may be somewhat difficult to establish hundreds of years later. It is impossible to measure how much time the scribe used to write a particular […]
4 Bozzolo et al. (1990, 18) make a further distinction between three functions: economy of the time and space necessary for the transcription of a text (Economiser sur le temps et l'espace nécessaires à la transcription d'un texte); "augmenting the speed of transcription to try and match it to that of speech" (augmenter la vitesse de transcription pour tenter de l'adapter à celle de la parole); and, third, playing on the length of the word to fit it into the constraints of the line or of the page (Jouer sur la longueur matérielle des mots, en fonction des limites de l'espace scriptable: c'est-à-dire, dans le livre, "les contraintes de fin de ligne" et les "contraintes de fin de page"). However, it could be argued that functions two and three broadly fit under economy of time and space respectively.
[…] For example, the German Schryfftspiegel (1527) also lists abbreviations by syllables. It instructs that abbreviation should never be used for majuscule, and for minuscule only "in need" ("in der not") at the end of the line (Römer 1997, 12-13). (Römer [13] also lists a number of other German early printed books which give instructions on the use of abbreviations.)
§36 To sum up, abbreviated forms are alternative spellings for fuller forms that were used under certain conditions: saving space, time and effort. Time and effort are difficult to study with quantitative precision, but saving space is one of the easier things to quantify. It is something where digital approaches have much to add to the argument. Abbreviation can then be analysed quantitatively by tagging, and using as variables, the position in the line, the preceding and following context, the position in the word, the manuscript page and the quire, and then performing a statistical analysis. If such causes can be pointed to as the reason why a scribe used an abbreviation, we are dealing with conditioned variation. If no clear, immediate reason for using an abbreviation rather than a full form can be found, we must ask whether there are any other reasons which might have led to its use.
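One simple way of testing whether abbreviation is conditioned by position in the line can be sketched as follows. This is a minimal illustration in Python, not the method of any study cited here: the Token structure, the line_final tag and the counts are all invented for the example, and the test is an ordinary chi-square statistic on a 2x2 table without continuity correction. A real study would use a transcribed corpus and, most likely, a regression model with several variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str
    abbreviated: bool
    line_final: bool  # hypothetical tag: last word of its manuscript line


def contingency(tokens):
    """Return (a, b, c, d) for a 2x2 table:
    a = line-final and abbreviated, b = line-final and full,
    c = elsewhere and abbreviated,  d = elsewhere and full."""
    a = sum(t.line_final and t.abbreviated for t in tokens)
    b = sum(t.line_final and not t.abbreviated for t in tokens)
    c = sum(not t.line_final and t.abbreviated for t in tokens)
    d = sum(not t.line_final and not t.abbreviated for t in tokens)
    return a, b, c, d


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator if denominator else 0.0


# Invented counts, for illustration only.
tokens = (
    [Token("ꝑ", True, True)] * 6
    + [Token("per", False, True)] * 4
    + [Token("ꝑ", True, False)] * 10
    + [Token("per", False, False)] * 30
)

a, b, c, d = contingency(tokens)
rate_line_final = a / (a + b)
rate_elsewhere = c / (c + d)
statistic = chi_square_2x2(a, b, c, d)
print(rate_line_final, rate_elsewhere, round(statistic, 2))
```

In this invented sample the abbreviated form is used in 60 per cent of line-final tokens but only 25 per cent of tokens elsewhere; whether such a difference counts as conditioning would depend on the size of the real dataset and on the other variables taken into account.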

Unconditioned variation
§37 Economy of time and space does not, however, account for all mediaeval abbreviation. For example, Camps (2016, ccliv) notes that a regression using a model which takes the ends of quires, pages or lines into account uncovers a number of weak correlations but leaves much variation unexplained. To continue with the analogy to phonology, such variation could be called unconditioned.
§38 Unconditioned or spontaneous variation in phonology refers to variation that cannot be attributed to the immediate phonetic context of the word. […]
[…] Cottereau-Gabillet 2016, which contains some of the results in English). She studied hundreds of manuscripts from Paris libraries, which were selected with the criterion that the name of the copyist is known. Using sociohistorical variables in a statistical enquiry, she found that abbreviation is less frequent in higher status copies. Abbreviation density can be predicted by the type of patron or the "grade" of the manuscript, as more luxurious copies would have fewer abbreviations.
§42 While French and German studies have been much more comprehensive with respect to abbreviation, the English and Scots historical dialectological tradition can make use of corpora based on manuscripts localised on the basis of their language. These can be used to study abbreviations with respect to geographical variation. Such studies include Smith (2018), who found that the abbreviation ꝭ "-is" is more common in more densely populated boroughs of Scotland: the scribes who copied more were more likely to use it. Honkapohja (2019a and 2019b), on the other hand, discovered that there are a number of abbreviations specific to West Midlands counties in the Early Middle English period.
§43 There may also be differences within writing traditions that could be examined. Hasenohr (2002, 82-83) proposes that there is a major difference between monastic and scholastic abbreviation practices. Monastic writing was a slow and contemplative process. Scholastic abbreviation would include using more logographic abbreviation. Camps (2016, cclv) notes a number of interesting possibilities for this, proposing that abbreviations would peak with thirteenth-century scholasticism.
§44 It has also been argued that abbreviations can give instructions on how to pronounce words. For example, Hasenohr (2002, 92) discusses superscript abbreviations in French in connection with pronunciation and with whether the words were monosyllabic or polysyllabic in Latin and French. N.R. Ker cites a manuscript in which the Latin word neque "neither," originally written in full, has been consistently corrected to neq. He argues that the reason was stress in reading aloud, as writing the word in full would mislead one into stressing the second syllable (cf. Clanchy 1979, 217; Ker 1960, 51).
5 Scribal profiles
§45 One of the most important uses of abbreviations in palaeography is for dating and localising scribal hands; for example, Ludwig Traube (1902) noted that when looking into the date of a manuscript, he would immediately turn to the abbreviations. To give a practical example, Ker (1960, 54) identifies one scribe as Norman, because he expands an Old English abbreviation (G) as "hoc" instead of "autem."
This leads to the question of whether abbreviations could be used digitally for the identification of scribal stints, using methods developed for stylometry to compile a scribal profile.
§46 There are pre-digital attempts to theorise and systematise the study of scribal accidentals. In the English tradition, this was proposed by McIntosh (1975), […]
[…] of abbreviations, as "tilde abbreviations seem more common in the first part of the copy in ms. A," while "B adopts a more abbreviation-rich orthography than A for this part of the letters" (Kestemont 2015, 172-173). The result was thus that even though abbreviations were not the predominant focus of the study, they emerged as the distinguishing factor. This result, along with the discoveries based on graphetic variation, shows that there is definite potential in the use of abbreviation as a means of digital scribal profiling.
§48 A potential area of enquiry might be scribal profiles and high-frequency abbreviations. Kestemont (2015) noticed that one scribe used the all-purpose tilde abbreviation much more than the other. Honkapohja (2019a, Figures 5-7) found that abbreviation in LAEME consists predominantly of certain types, especially the macron and hook. Camps (2016, cclvi) proposed that the best approach for studying similarities and differences between manuscript witnesses of the same text would be to focus on certain abbreviation types. A quantitative profiling of these high-frequency types is something which definitely deserves more investigation.
§49 An alternative would be to focus on function words and content words.
Abbreviations seem to have been particularly frequent for function words. For example, Hasenohr (2002, 80) notes that most of these abbreviations were created in the twelfth and thirteenth centuries for use in cursive writing; they mainly affect the endings, the adverbs, particles and pronouns, as well as the forms of the verb esse, words "which come often under the pen" (qui reviennent souvent sous la plume). Rogos-Hebda (2018, 55) notes differences in the use of abbreviations by two Chaucer scribes. Honkapohja (2018) notes that scribes are more likely to copy lexical words directly, but have individual profiles for function words. A focus on function words has parallels with developments in stylometry, in which function words were considered to be the best way of establishing authorship profiles before n-grams proved to be more efficient (Kestemont 2014, 62). Nevertheless, with abbreviations the use of function words has not been compared to n-grams, whose popularity is partly due to the fact that they can be used with plain text (De Bruijn and Kestemont 2013, 182). Perhaps a rich transcription in which abbreviations are encoded could complement the study of scribal stints by n-grams. Or perhaps a transcription in which abbreviations are encoded could itself be subjected to n-gram based analysis.
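What an n-gram analysis of an abbreviation-preserving transcription might look like can be sketched briefly. The Python example below is an illustration under stated assumptions: the two short "stints" are invented strings, the EXPANSIONS table is a deliberately simplistic hypothetical (each sign is given a single expansion, which, as discussed above, is exactly what cannot be taken for granted), and cosine similarity over character trigram counts is just one of many measures used in stylometry.

```python
from collections import Counter
from math import sqrt

# Hypothetical one-to-one expansions, for illustration only.
EXPANSIONS = {"ꝰ": "er", "ꝭ": "is", "ꝑ": "per"}


def normalise(text):
    """Replace each abbreviation sign with its (assumed) expansion."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in text)


def char_ngrams(text, n=3):
    """Count overlapping character n-grams; abbreviation signs count as characters."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(p, q):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Invented sample: the same words as written by two imaginary scribes.
stint_a = "aftꝰ þe partꝭ ꝑ man"
stint_b = "after þe partis per man"

diplomatic = cosine(char_ngrams(stint_a), char_ngrams(stint_b))
normalised = cosine(char_ngrams(normalise(stint_a)), char_ngrams(normalise(stint_b)))
print(round(diplomatic, 3), round(normalised, 3))
```

Once the abbreviations are expanded, the two invented stints become identical and the difference between the scribes disappears from the profile; with the signs preserved, the trigram profiles diverge. This is the sense in which a rich transcription could complement n-gram based study of scribal stints.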
6 Encoding abbreviations
§50 In order to study abbreviations quantitatively, we need to count them somehow. While earlier studies collected their datasets manually, these days the standard is TEI P5 XML. TEI P5 is a flexible framework divided into modules, which provide a number of alternative ways to encode abbreviations. As we have now had a few decades' worth of projects using TEI XML, the problems, solutions and theoretical implications of dealing with different types of abbreviations have been discussed in many sources for different languages and document types (cf. Heiden et al. 2002; Driscoll 2006, 2009; Mazziotta 2008; Stutzmann 2010, 2014; Speed Kjeldsen 2013, 34-38; Honkapohja 2013a; Ter Horst and Stam 2018). Some projects decide to encode abbreviations in their resources and, as a by-product, publish an article or guidelines detailing their editorial choices. Moreover, individual projects, and even individuals working within the same project, may encode the same abbreviation differently. The aim of this paper is not to argue in favour of any particular encoding choices, but to function as a review article that will help readers to navigate these waters, and also to outline how the theoretical contribution made in the previous section fits with two prominent encoding systems. However, one particular problem deserves highlighting, as it is connected with the writing system cline discussed in section 3, which offers greater theoretical clarity on it.
§51 The theoretical distinction outlined in section 3 fits in well with the discussion by Driscoll (2006, 2009), which is influential not only in Scandinavian but also in Anglophone scholarship. Some abbreviations replace a string of characters, while others are more logographic. The type of the abbreviation, and whether it has a clear referent in a string of Latin alphabetical characters, will affect the "editorial policy" on how it is best encoded.
It is necessary to distinguish between abbreviations that refer to the entire word and ones that correspond to a sequence of characters. Driscoll (2006, 259) calls these abbreviations "with a graphemic reference (superscript letters and signs and the remainder of the brevigraphs)" and those "with a lexical reference (suspensions, contractions, and a number of brevigraphs)." He goes on to say that "[i]t strikes one as counterintuitive to treat the former on anything other than the whole-word level, while treating the latter in the same way seems equally misconceived." This distinction corresponds exactly to (attempted) phonographic and logographic writing systems (see Examples 1 and 2).

§52 A different solution is proposed by Mazziotta (2008) and discussed by Stutzmann (2010 and 2013). This approach advocates a different way of seeing abbreviations from the point of view of the writing system, focusing on the work of the scribe. The system seeks to model the scribe's strokes in a "descriptive" way based on their position in the "graphic space" without reference to the way they are vocalised in a phonological word (Mazziotta 2008, §13, §18). Mazziotta divides them into a few axes (§19) and proposes terminology for describing various relationships (cf. Mazziotta 2008, §18-§49). For example, the very common "crossed-p" abbreviation ꝑ is not seen as a single sign, but rather as the letter p modified by a cénégram (Stutzmann 2008, 265-66). [...] problem of trying to fit into exact encoding boxes something which was a single stroke for the scribe. That is, in ambiguous cases, whether a horizontal bar which crosses through, e.g., a tall ascender for l but also simply sits on top of other letters, is a "macron" or a "crossed-l." Encoding based on these theoretical conceptions is discussed by Mazziotta (2008, [...]).

<ENTITY us> <w><choice><abbr>magn&#42863;</abbr><expan>magn<ex>us</ex></expan></choice></w>

[...] psychology of reading (2008, §46). ORIFLAMMS (2020) treats them in addition to various graphetic features. (Moreover, as Mazziotta admits, his theoretical approach accounts for abbreviations in terms of the strokes made by the scribe, described in relation to other strokes. However, it is not fully compatible with the psychology of reading, as abbreviated forms become a kind of logogram: experienced readers are likely to read on the word level rather than on the level of individual graphs. This is something which Hasenohr described as a transformation from the contemplative reading of monasteries to the logographic reading in universities.)
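To show how an encoding of the kind seen in the magnus example earlier in this section can feed quantitative work, the following Python sketch extracts abbreviated forms, their expansions and the supplied letters from a small TEI-like fragment. It is a minimal sketch under stated assumptions: the fragment is invented apart from the magnus word taken from the example, the TEI namespace that real files declare is omitted for brevity, and the function name abbreviation_pairs is hypothetical rather than part of any project's tooling.

```python
import xml.etree.ElementTree as ET

# Invented TEI-like fragment (no namespace, unlike real TEI P5 files).
# &#42863; is the "us/con" sign and &#42833; the crossed p.
SAMPLE = """<text>
  <w><choice><abbr>magn&#42863;</abbr><expan>magn<ex>us</ex></expan></choice></w>
  <w>rex</w>
  <w><choice><abbr>&#42833;</abbr><expan>p<ex>er</ex></expan></choice></w>
</text>"""


def abbreviation_pairs(root):
    """Return (abbreviated form, expansion, supplied letters) for each <choice>."""
    pairs = []
    for choice in root.iter("choice"):
        abbr = choice.find("abbr")
        expan = choice.find("expan")
        if abbr is None or expan is None:
            continue  # skip <choice> elements used for other purposes
        abbr_text = "".join(abbr.itertext()).strip()
        expan_text = "".join(expan.itertext()).strip()
        # Letters inside <ex> are those supplied by the editor.
        supplied = "".join("".join(ex.itertext()) for ex in expan.iter("ex"))
        pairs.append((abbr_text, expan_text, supplied))
    return pairs


root = ET.fromstring(SAMPLE)
pairs = abbreviation_pairs(root)

# Share of word tokens that contain an encoded abbreviation.
words = list(root.iter("w"))
abbreviated = sum(1 for w in words if w.find("choice") is not None)

for abbr_text, expan_text, supplied in pairs:
    print(abbr_text, expan_text, supplied)
print(f"{abbreviated} of {len(words)} words abbreviated")
```

Keeping both the abbreviated and the expanded form in each record means the same data can be counted either on the level of signs or on the level of words, which is the choice the two encoding approaches discussed above handle differently.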
While there are advantages to graphetic transcription, such as being able to carry out statistical investigations without presupposing a writing system, the question of the referent of abbreviations is more complex than for some other features. It is also of linguistic interest from a graphemic point of view, namely where these features fit in the typology of writing systems.

§54 One further problem related to encoding abbreviations derives from the language-independence of especially some logograms (see section 3.2 above). The mark-up for visual diamorphs in XML is addressed by Ter Horst and Stam (2018), who add a custom attribute value for words that are part of both languages. Among other things, the @lang attribute can be used to specify which language a certain abbreviated word belongs to: "The preferred method for signalling code-switches in XML is the language-attribute (@lang=""). Apart from the standard value for Latin ("la") and Irish ("ga," for Gaelic) we added the custom value 'ga-la' for visual dimorphs" (Ter Horst and Stam 2018, 224). TEI Guidelines are thus well-suited for handling not only the logography/phonography variations, but also visual diamorphs.

7 Suitable resources for the study of abbreviations

§55 Studying abbreviations quantitatively requires digital transcriptions that encode the forms of abbreviation as well as their expansions. While many earlier corpora were based on printed books, the situation is improving. The numbers are not nearly as high as those of corpora that do not encode abbreviations or of manuscript images online (Robinson 2016, 182-7), but there are now several resources available that