Manuscripts, Qualitative Analysis and Features on Vectors: An Attempt for a Synthesis of Conventional and Computational Methods in the Attribution of Late Medieval Anti-Heretical Treatises

Authorship and originality were tricky things in medieval literature and documents, which were written in a culture of imitation rather than originality. The Latin word auctoritas could mean both an author and their authority, usually both combined. Auctoritas had initially meant the quality by which a person can be trusted. Consequently, it came to mean the authoritative status of a person and, further, that of their writings.1 The 'author' was thus not just any writer whose texts were read; the modern equivalent of that status would be something like that of Judith Butler in gender studies or Max Weber in sociology. Often, only writers meriting the status of auctoritas were explicitly cited, while others were silently borrowed from and, in modern terms, plagiarised. Thus, it is not uncommon to find late medieval theological treatises in which long passages are copied from other high and late medieval works, but only patristic sources and the most important medieval theologians, such as Bernard of Clairvaux, are named. As 'authorship' in medieval discourse was related more to responsibility for the content than to style or form, we can find highly original literary works written under the label of 'compilation'.2 Some of these compilations circulated under the name of an authoritative figure, such as Augustine, and were generally considered to convey his thoughts, even if they actually contained very little of his original works. Furthermore, scribes and secretaries were often involved in the actual composition of the final work, adding a further layer of stylistic authorship to a text.3 Because of these characteristics, scholars of medieval literature have long recognised that the role of a compiler, and even of a copyist, was often comparable to, and at times surpassed, that of an auctor.4 Finally, many texts circulated anonymously or under an early modern misattribution.
In this chapter, we discuss one such complicated case of medieval authorship, an anti-heretical treatise known as the Refutatio errorum, written in the 1390s in German-speaking Europe. It has many of the characteristics described above. It is compilatory in nature, containing passages from different sources, very few of which are named in the text itself. It has no prologue or comparable section in which someone would claim authorship of the text. Instead, the whole treatise is very practical, intended to provide information rather than to flaunt the rhetorical abilities of its composer. For a long time, the treatise was considered anonymous, until R. Välimäki provided content-based, structural and codicological evidence linking it to the inquisitor Petrus Zwicker, who also authored a more famous anti-heretical work entitled Cum dormirent homines.5 As is usually the case with the reattribution of a medieval work, the conclusions rest not on a single piece of evidence but on a combination of mutually reinforcing pieces of evidence (described below). The purpose of this chapter is to add a new element to the analysis: computational authorship attribution using a Support Vector Machine (SVM). We discuss the results of the computer classification in relation to a qualitative analysis of the text. The aim is to find out whether computational methods provide added value to the conventional authorship attribution of a medieval text. Or could one even claim that computational methods are to be regarded as superior to qualitative interpretation by an expert human reader?
Computational authorship attribution can be considered a sub-category of style-based document authentication (Echtheitskritik),6 and the first attempts to apply computational methods to attribution tasks in Latin literature were made in the 1970s.7 After that, there was a lull in the computational study of classical and medieval literature, with a few exceptions,8 but since the late 1990s, and especially in the past five years, several studies have demonstrated that computational authorship attribution can be a powerful tool for identifying classical and medieval authors.9 Perhaps symptomatically for the whole field of digital humanities, the first publications of this new wave of computational authorship studies have concentrated on developing the methodology itself, and their results have been published mainly in digital humanities journals. At the same time, new texts continue to be attributed to classical and medieval authors with little regard for the results of computational stylistics,10 and some recent publications even claim that statistical stylometry has fallen out of favour.11 Although such a claim betrays a lack of familiarity with one's own research field, digital humanities scholars are not entirely blameless. Very few publications have tried to bridge the gap between discussions of authorship in literary studies and history on the one hand and in computational linguistics on the other. Notable exceptions are Jeroen de Gussem's recent article on traces of Nicholas of Montiéramey's secretarial style in Bernard of Clairvaux's writings and Mike Kestemont and colleagues' study of the collaborative authorship of Hildegard of Bingen and Guibert of Gembloux.12 Consequently, computational analysis can raise suspicion among humanities scholars trained in qualitative methods.
Machine learning and other branches of computational text classification may appear to be radically new ways of analysing sources that bypass human expertise (and are therefore terrifying). They are not. Although it harnesses computational capacity and handles amounts of data that far surpass the abilities of any individual, computational authorship attribution uses stylistic features that have long been recognised as marks of authorship. A. Mutzenbecher, who prepared a new edition of Maximus of Turin's sermons at the beginning of the 1960s, defined 16 criteria (some with several sub-categories), divided into four slightly overlapping groups: (1) external evidence, (2) biblical quotations and their exposition, (3) style and (4) sources. Some of the criteria were primary, some secondary. An authentic sermon had to fulfil two primary criteria and several secondary criteria.13 For the purposes of this chapter, it is not necessary to explain all of them. It is sufficient to note that many of Mutzenbecher's criteria were purely qualitative, such as the theological topics Maximus typically discussed, but the criteria for the introduction and exposition of biblical citations (numbers 6 to 8) and criterion 13, linguistic-stylistic characteristics, in particular include features similar to those used in computational authorship attribution: word unigrams and bigrams formed of function words and other very common expressions (for example, enim, ex quo, hoc est, quanto magis, sed dicit, ego dico, mirum est).14 Mutzenbecher was well aware that these stylistic features appeared in almost all other authors besides Maximus, and that none of them could individually establish authorship, 'but if several of them support each other reciprocally, their relationship might express something typical'.15 Computational authorship attribution does precisely that: it uses features that appear in almost all authors, but with different emphasis. To put it simply, it is the combination of all the significant stylistic features, compared with their combination in other authors, that determines authorship. A computer, however, is not limited to a few obvious stylistic features of an author, but can handle thousands or even millions of them in a systematic and repeatable way.
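The principle can be illustrated with a short sketch in Python, our choice for the illustrative code in this chapter. It represents a text as the relative frequencies of a few of the expressions from Mutzenbecher's list and compares two such profiles with a simple sum of absolute differences. The two sample sentences are invented, the feature list is deliberately tiny, and the distance measure is only a stand-in chosen for illustration; the SVM used in this study weighs its features in a different way.

```python
from collections import Counter

# A handful of the expressions cited from Mutzenbecher's criteria.
FEATURES = ["enim", "ex quo", "hoc est", "quanto magis", "sed dicit", "ego dico", "mirum est"]

def profile(text):
    """Relative frequency (per token) of each unigram or bigram in FEATURES."""
    tokens = text.lower().split()
    grams = Counter(tokens)
    grams.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    n = len(tokens) or 1
    return [grams[f] / n for f in FEATURES]

def distance(p, q):
    """Sum of absolute differences between two profiles."""
    return sum(abs(x - y) for x, y in zip(p, q))

# Invented sample sentences, not quotations from any edition.
a = "hoc est enim quod dico quanto magis hoc est verum"
b = "sed dicit apostolus ego dico vobis mirum est enim"
print(profile(a))  # [0.1, 0.0, 0.2, 0.1, 0.0, 0.0, 0.0]
print(round(distance(profile(a), profile(b)), 3))  # 0.644
```

Both sentences use enim, but each leans on different expressions; it is the whole profile, not any single feature, that separates them.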

The Refutatio Errorum and Its Redactions
The test case in this study is a text known as the Refutatio errorum. It is a polemical description of the Waldensians, a religious group persecuted as heretics by the Catholic Church in the Middle Ages and the early modern period. In the 1390s, a series of inquisitions and other trials were directed against the group in German-speaking Europe,16 and the Refutatio was written as part of the literary polemics accompanying the persecution. The treatise gives a view of Waldensianism very similar to that of the better-known polemical treatise against the Waldensians, Cum dormirent homines (henceforth CDH), written by one of the most important inquisitors of the late 14th century, the Celestine provincial Petrus Zwicker. The Refutatio clearly belongs to the same era and state of knowledge about the Waldensians as the CDH. Scholars have commented on it much less than on the CDH, quite likely because the only available printed version, edited by Jacob Gretser together with the CDH (1613/1677), is obviously incomplete: it has 10 chapters, but the text stops abruptly in the middle of the tenth.17 Among scholars, there has been confusion rather than actual disagreement about the Refutatio's authorship. For a long time, everyone was reluctant to make definite claims about its author. In his groundbreaking studies on the CDH, P. Biller did not suggest any author or date for the Refutatio, but seems to have held the view that the two treatises were not written by the same author, that is, Zwicker. In fact, Biller uses the common manuscript tradition of the Refutatio and the CDH as an argument against the attribution of the CDH to Peter von Pillichsdorf, the author suggested by Gretser in his 17th-century edition. The argument runs as follows: Gretser's misattribution was based on the now-lost Tegernsee manuscript, which included the CDH and a short anti-Waldensian treatise by Pillichsdorf, who is the only author mentioned in the manuscript.
This led Gretser to propose Pillichsdorf as the author of both treatises, which deal with the same topic. According to Biller, this parallels the case of the several manuscripts containing both the CDH and the Refutatio. These too were two different treatises on the same subject, but were treated as one by both medieval scribes and modern compilers of manuscript catalogues. Biller did not state anything explicit concerning the authorship of the Refutatio, calling it and Zwicker's CDH only 'two tracts on similar material'.18 They do indeed cover very much the same material, and because of this, P. Segl has tentatively proposed that the two treatises originated from the same hand.19 E. Cameron describes the treatise very vaguely, but evidently treats it as a product of the 1390s, at one point calling it 'a third treatise from Zwicker's circle'.20 A. Patschovsky has also associated the Refutatio loosely with Zwicker, without making any definite claims about its authorship.21 In other words, there has been a vague suspicion that Petrus Zwicker, or someone close to him, wrote the treatise.
To complicate the study of this text further, the only available printed editions are based on a text that is anything but representative of the manuscript tradition of the Refutatio. As noted, Jacob Gretser printed the tract in the 17th century from a manuscript that ends abruptly in the middle of Chapter 10. I. von Döllinger's 19th-century edition from the same manuscripts does not help, but adds further confusion, as the order of the chapters is mixed up in the edition and material not belonging to the Refutatio is inserted into the text.22 Välimäki's analysis of all 19 preserved manuscripts of the work has demonstrated that the edited version of the text does not correspond to the main manuscript tradition, that is, the most common and widely circulated medieval text. All in all, Välimäki found four different redactions of the Refutatio errorum. Of these, Redaction 1 is by far the most common, with 13 manuscripts. It is also the only redaction accompanying Zwicker's better-known and more popular treatise, the Cum dormirent homines; the two texts appear together in eight manuscripts. In comparison, the text printed by Gretser in the 17th century is a late and incomplete redaction (Välimäki's Redaction 4), represented by only two medieval manuscripts.23 In addition to collating the Refutatio's manuscript tradition, Välimäki has also proposed that the treatise can be attributed to Petrus Zwicker. The two works present a very similar view of the Waldensians; both follow a similar structure of polemical refutation, presenting heretical propositions and Catholic counter-arguments based mainly on biblical quotations. The most important pieces of evidence for common authorship are the sources cited in the two works. In the CDH, Zwicker quotes almost exclusively the Bible in support of his arguments. The single exception to the rule is a reference to Boethius' Consolation of Philosophy.
The same quotation can be found in almost exactly the same form in the Refutatio errorum. In addition, the author of the Refutatio had direct access to Moneta of Cremona's 13th-century anti-heretical treatise Adversus Catharos et Valdenses. The treatise was very rare north of the Alps, but Petrus Zwicker used it when composing the CDH. The final rare source implying Zwicker's authorship is a misquotation of Ezekiel 33.12 in the Refutatio. The exact form of this quotation comes from the legal consultations on the case against the goldsmith Heynuš Lugner in the late 1330s or early 1340s, which are transmitted in two manuscripts: a Bohemian inquisitor's manual, Linz MS 177, and St. Florian, MS XI 234, which was copied from the Linz manuscript. The Linz manual was once owned by Petrus Zwicker, and the St. Florian manual was thus copied from his own inquisitor's manual. Ergo, the author of the Refutatio had access to a rare text whose manuscript circulation can be securely traced only in connection with Petrus Zwicker.24

Texts for the Analysis and Pre-processing
Next, we analyse the two most important redactions of the Refutatio with computational classification in order to verify Zwicker's authorship. The redactions selected for the classification are Redaction 1, the most common and longest, and Redaction 4, which represents the version in Gretser's edition. The text of Redaction 4 is taken from the manuscript Augsburg, Universitätsbibliothek, MS 338 (TEST1) as well as from Gretser's edition (TEST2). Redaction 1 is transcribed from the manuscript Vienna, Österreichische Nationalbibliothek (ÖNB), MS 1588 (TEST3). All texts are long enough for a reliable authorship attribution, ranging from around 5,500 words in TEST2 to over 9,000 words in TEST3. We excluded Redactions 2 and 3, each extant in a single manuscript and neither close to the original text. Neither of these redactions is representative of the medieval or modern reception of the work.
We trained the classifier with Petrus Zwicker's CDH (around 23,000 words). The text we used comes from Gretser's edition, the same edition as one of the tested versions of the Refutatio. The reference corpus for training our classifier consists of late ancient and medieval anti-heretical polemical treatises, the genre of both Zwicker's CDH and the Refutatio. In total, this training data comprises around 600,000 words. The emphasis is on medieval texts, and the corpus includes three works that are almost contemporary with Zwicker's texts: Wasmud von Homburg's Tractatus contra hereticos, the anonymous Attendite a falsis prophetis and the already mentioned Contra Pauperes de Lugduno by Peter von Pillichsdorf. In addition, the most important source and stylistic model for Zwicker's CDH, Moneta of Cremona's Adversus Catharos et Valdenses, is included. From Moneta's very long treatise, we selected only Book 5, where many of the anti-Waldensian arguments are presented. Book 5 alone has over 120,000 words, and including all 400,000 words of the whole treatise would have created an imbalanced reference corpus. The complete corpus with bibliographical information is in Appendix 16.1. The data is available on our GitHub page, but only in masked form, to protect the copyright of the recent editions used in the corpus.25 Our dataset is neither easy nor typical for authorship attribution tasks. It is a mixed corpus of different editorial and transcription standards, which is a problem for feature selection. Even though character n-grams are widely used as features in text classification, recent computational studies on the authorship of classical and medieval texts have preferred a lemma-level approach and function-word analysis over character n-grams or plain text.26 This is partly due to the orthographical variation in medieval Latin. The effects of orthographical variation are more marked when the features used are only a few dozen function words.
However, as our classifications are based on a much more complex set of features, the effect of single 'bad' features on the end result is minimal. Using word unigrams and bigrams from plain text, as well as character n-grams, also has significant benefits in Latin: it gives access to stylistic choices below the word level, such as the author's decision to use the subjunctive instead of the indicative.27 We solved the most common issues arising from different editorial principles and orthographical variation with simple normalisation rules:
oe → e
char → car (to resolve the variation charitas vs. caritas)
wa → va (to resolve the variations ewangelium / evangelium and waldenses / valdenses)
These rules resolve the majority of the orthographical variation caused by editorial and scribal conventions and by the differences between medieval and classical Latin, without masking potentially significant stylistic features. In addition to orthographical normalisation, in the pre-processing phase we removed editorial additions such as page numbers and chapter titles (unless part of the original). Punctuation, numerals and single characters were removed. From early medieval texts, we also removed the references to biblical books and verses, which were added by later editors, but in late medieval texts, most notably Zwicker's own treatise, these are part of the original and were thus preserved. The pre-processing was done automatically, but confirmed with sanity checks.
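The normalisation rules and the removal of punctuation, numerals and single characters can be sketched as follows. This is a minimal illustration rather than our actual pipeline: lower-casing is an added assumption, the removal of editorial additions and biblical references is left out, and the substitutions are plain string replacements, so they also change other words containing the same letter sequences (for example, eucharistia becomes eucaristia), which is harmless as long as the rules are applied uniformly to every text.

```python
import re

# The orthographical normalisation rules described above, applied in this order.
RULES = [("oe", "e"), ("char", "car"), ("wa", "va")]

def normalise(text):
    """Lower-case a text (an assumption of this sketch), apply the substitution
    rules and return word tokens without punctuation, numerals or single characters."""
    text = text.lower()
    for old, new in RULES:
        text = text.replace(old, new)
    # [^\W\d_]+ matches runs of letters only, so digits and punctuation are dropped
    tokens = re.findall(r"[^\W\d_]+", text)
    return [tok for tok in tokens if len(tok) > 1]

print(normalise("Charitas et ewangelium Waldensium, poena 12 a."))
# → ['caritas', 'et', 'evangelium', 'valdensium', 'pena']
```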
However, transcripts from medieval manuscripts show much more variation than edited texts. While the orthographical habits and grammatical mistakes of individuals are excellent stylistic features when one is dealing with autographs, in medieval manuscript culture such variation is noise in the data. We are usually interested not in the writing conventions of an individual scribe, but in those of the author or compiler of the work. Even the usual orthographical variation of late medieval manuscripts is challenging to normalise without also masking potentially significant stylistic features.28 Thus, in addition to resolving the question of Zwicker's authorship, we experimented with the data in order to find a relatively effortless way to pre-process and analyse such a corpus with a computer. The expected results from our dataset are as follows:
1. If the pre-processing and feature selection are able to overcome the orthographical challenges, all test cases of the Refutatio should be classified in a similar way. We expect them to be classified as Zwicker's works together with the CDH (values above 0).
2. All other works should get values below 0 in the classification.
3. If Peter von Pillichsdorf's treatise from Gretser's edition is classified together with Zwicker's works, the early modern editorial solutions carry more weight than medieval authorship.

Computational Authorship Attribution: Methods
The puzzle we set out to solve is: Did Petrus Zwicker write the Refutatio errorum? In authorship attribution, this is called a verification problem: we do not have a closed set of candidates, but a single suspected author.29 We constructed the verification problem as a simple binary classification, in which Zwicker's treatise forms one class and all other authors in the training material form a second class. The two corpora combined serve as training data for the classifier, while the versions of the Refutatio form the test data, each redaction being treated as a separate test case.
Here, we present an overview of the methods. For technical details and code, please consult our project repository.30 For the classification, we use a linear SVM, a simple yet effective classifier that has traditionally been applied to text classification tasks.31 The SVM learns a weight for every feature from the training data so as to maximise the decision margin between the two classes. Whether a weight is positive or negative indicates which class the feature is potentially associated with, although one needs to exercise caution when comparing features in isolation on the basis of their weights. The features we use with the SVM are word unigrams and bigrams. In other words, we train the classifier on the training data to recognise the features typical and atypical of Petrus Zwicker's style. The test cases are then classified, and the output for each is a value derived from the sum of its weighted features: a positive value indicates resemblance to the class Zwicker, a negative value resemblance to the other class, and the magnitude indicates how strong the resemblance is. The values are represented on a scale between -1 and 1.
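A minimal, dependency-free sketch of such a set-up is given below; it is a simplified illustration, not the code in our repository. In place of an optimised SVM library, it uses a plain stochastic subgradient solver for the hinge-loss objective in the style of the Pegasos algorithm, fits no intercept, and uses arbitrary illustrative values for the regularisation constant `lam` and the number of epochs. Texts are token lists, features are relative frequencies of word unigrams and bigrams, Zwicker's texts are labelled +1 and all others -1, and the sign of the weighted feature sum gives the predicted class. The rescaling of values to the range from -1 to 1 is not reproduced, and the toy training texts are invented.

```python
from collections import Counter

def ngram_features(tokens):
    """Relative frequencies of word unigrams and bigrams (bigrams joined by a space)."""
    grams = Counter(tokens)
    grams.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    total = sum(grams.values()) or 1
    return {gram: count / total for gram, count in grams.items()}

def decision_value(weights, features):
    """Weighted feature sum: > 0 points to class +1 (Zwicker), < 0 to class -1."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def train_linear_svm(samples, labels, lam=0.01, epochs=20):
    """Linear SVM without intercept, trained by Pegasos-style subgradient descent
    on the L2-regularised hinge loss. `samples` are sparse feature dicts and
    `labels` are +1 or -1."""
    weights = {}
    t = 0
    for _ in range(epochs):
        for x, label in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)  # step size shrinks over time
            margin = label * decision_value(weights, x)
            for f in weights:  # regularisation: shrink all weights towards zero
                weights[f] *= 1.0 - eta * lam
            if margin < 1.0:  # sample inside the margin: move weights towards its label
                for f, v in x.items():
                    weights[f] = weights.get(f, 0.0) + eta * label * v
    return weights

# Invented toy data: +1 stands for Zwicker, -1 for the reference corpus.
train = [
    ("quia dicunt isti quod non".split(), +1),
    ("quia dicunt quod non est".split(), +1),
    ("sed tamen ergo patet ut".split(), -1),
    ("ergo patet sed tamen ut".split(), -1),
]
X = [ngram_features(tokens) for tokens, _ in train]
y = [label for _, label in train]
w = train_linear_svm(X, y)
print(decision_value(w, ngram_features("quia dicunt quod".split())) > 0)  # True
```

Because the features are sparse dictionaries, only n-grams that actually occur in a text contribute to its decision value; n-grams unseen in training simply have weight zero.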
The value and the decision are largely useless in isolation if we cannot be certain that the classifications are valid overall. Here, we apply the standard technique of cross-validation on the training data, which provides an estimate of the classification accuracy and therefore of the reliability of our results on the actual test documents.32 The classifier we use is by nature indiscriminate when it comes to features: it does not care which features are used, as long as they increase the training accuracy. In authorship attribution tasks, these would ideally be features that describe the author's way of writing, such as the usage of function words. Even within a single genre, as in our training data, however, the particular topic of each text affects the results. When we ran the classification on unmasked data, five of the 10 strongest positive features included 'Waldensians' in some form.33 A classification based on such features reflects an author's style only partly, and the topic of the texts heavily distorts the results. Therefore, we must mask topic words so that the classifier does not focus on the topic of the texts instead of the author's style. To this end, we calculated the thousand most common words in post-classical (Christian) Latin.34 Any word not in this list is masked. This has been shown to drastically increase the accuracy of cross-genre classifications, as it forces the classifier to learn author-specific rather than topic-specific features.35 The method does not completely remove topic words, but leaves only those that appear regularly across different genres. In the following, we concentrate on the results of the classification on the masked data.
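The masking step might look like the following sketch. The frequency list is built here from a hypothetical toy corpus rather than from the post-classical Latin corpus we actually used, and replacing each masked word with asterisks of the same length is an assumption of this sketch (it keeps word length visible while hiding the word itself); other masking variants are equally possible.

```python
from collections import Counter

def frequent_words(tokenised_texts, n=1000):
    """Return the set of the n most frequent word types in a reference corpus."""
    counts = Counter(tok for text in tokenised_texts for tok in text)
    return {word for word, _ in counts.most_common(n)}

def mask(tokens, keep, symbol="*"):
    """Replace every token not in `keep` with a run of `symbol` of the same length."""
    return [tok if tok in keep else symbol * len(tok) for tok in tokens]

# Hypothetical toy reference corpus of already normalised token lists.
reference = [["et", "non", "est", "quod", "dicunt"], ["et", "est", "quod", "valdenses"]]
keep = frequent_words(reference, n=3)  # the set {'et', 'est', 'quod'}
print(mask(["valdenses", "non", "est", "quod"], keep))
# → ['*********', '***', 'est', 'quod']
```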

Comparing the Results
The classification from the SVM using masked data is presented in Table 16.1 and in Figure 16.1.
The results were both expected and unexpected. First of all, the classification confirms that, from a stylistic perspective as well, Petrus Zwicker is the author of the Refutatio errorum. All redactions, whether transcripts from manuscripts or the text from Gretser's edition, were classified as Zwicker's texts with a clear margin to other works. The exception here is the short treatise Attendite a falsis prophetis, discussed below. If we exclude it, all other works from the reference corpus got values below 0, while Zwicker's texts were neatly classified between 0.662 and 1.0. Not surprisingly, the text from Gretser's edition got the highest value (1.0), in fact higher than the CDH itself. This appears contradictory at first, but the explanation is simple: the classifier first learns the feature weights from the whole text, but in cross-validation the text is divided into slices of 1,000 words, and the final value is the average over all the slices. Some of these slices got values below 1, pulling down the average. In other words, compared against the reference corpus, the Refutatio's style is indistinguishable from Zwicker's style in the Cum dormirent homines. After pre-processing and masking, the features on which the SVM bases its decision pass the sanity check. Appendix 16.2 lists the 50 strongest positive and negative features. In both the positive and the negative class, these are function words or common content words, or bigrams combining such common words with masked words. Among them, only one positive feature ('imo', 6.344) results from orthographical variation (imo vs. immo). All in all, a classification based on these features can be deemed reliable and independent of topic.
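The averaging over 1,000-word slices, and a ranking of features by weight of the kind listed in Appendix 16.2, can be sketched as below. The `toy_score` function stands in for the trained classifier and is hypothetical, as is the handling of a leftover stretch shorter than one slice, which this sketch simply discards. The toy example shows how a single slice scoring below 1 pulls the average of an otherwise perfectly scoring text below 1.

```python
def sliced_average(tokens, score, size=1000):
    """Average score(slice) over consecutive slices of `size` tokens.

    A leftover stretch shorter than `size` is discarded (an assumption of this
    sketch); a text shorter than one slice is scored as a whole.
    """
    chunks = [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]
    if not chunks:
        chunks = [tokens]
    return sum(score(chunk) for chunk in chunks) / len(chunks)

def strongest_features(weights, k=50):
    """Return the k most positive and the k most negative (feature, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda item: item[1])
    return ranked[::-1][:k], ranked[:k]

# Hypothetical stand-in for a trained classifier's scaled output on one slice.
def toy_score(chunk):
    return 1.0 if "enim" in chunk else 0.5

text = ["enim"] * 1000 + ["et"] * 1000 + ["et"] * 500  # the last 500 tokens are discarded
print(sliced_average(text, toy_score))  # 0.75, i.e. (1.0 + 0.5) / 2
print(strongest_features({"enim": 2.0, "ergo": -1.5, "et": 0.1}, k=1))
# → ([('enim', 2.0)], [('ergo', -1.5)])
```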
The classifier was also able to detect the authorial signature in both editions and manuscripts, so that editorial solutions or orthographical variation do not completely distort an author's style. This is confirmed not only by the consistent classification of the different versions of the Refutatio, but also by the value obtained by Peter von Pillichsdorf's Contra Pauperes de Lugduno. Despite being a tract on the same topic (the Waldensians) as the CDH and the Refutatio, and from the same edition (Gretser's) as the CDH and TEST2, it got a clearly negative value of -0.574. Six other texts got values nearer to the threshold, so Pillichsdorf's tract is very far from Zwicker's texts. The edition, of course, has an effect, as we can see from the very strong value TEST2 got. The unexpected result was the Attendite a falsis prophetis. It got a very high value (0.953), and the classification on the unmasked data, not presented here in detail, was consistent with this (0.559). The result cannot be explained by a shared topic, as the extremely high value is based on masked data. How should we interpret this? Do we have a new text attributable to Petrus Zwicker? This is a possibility, but the SVM's classification must be weighed against the historical context, the manuscript tradition and the contents of the text.
First, very little in the contents of the text contradicts Zwicker's views in the Refutatio or the CDH. In fact, the Attendite presents Waldensian propositions and Catholic counter-arguments similar to those of Zwicker. For example, the CDH, the Refutatio and the Attendite all begin by refuting the Waldensian claim to a legitimate lay ministry and then proceed to treat individual points of doctrine, such as the denial of Purgatory and oath-taking. P. Biller has already pointed out a certain similarity between the Attendite and the CDH.36 There is one minor discrepancy: the Attendite states that the Waldensians do not accept the books of Maccabees as part of the biblical canon.37 In the CDH, Zwicker is silent about this and in fact uses Maccabees to prove that intercession on behalf of the dead had its foundation in the Bible.38 This small divergence, however, can be explained by the development of Zwicker's argumentation. He desperately needed Maccabees in order to maintain the principle of finding the foundation of Catholic doctrine and practice solely in the Bible, a principle that was fully developed only in his main work, the CDH. The author of the Attendite did not follow these guidelines: some of its arguments are supported by patristic quotations. Yet this does not automatically rule out Zwicker's authorship.
Although Zwicker got rid of extra-biblical quotations almost completely when writing the CDH, he refers to patristic auctoritates several times in the Refutatio.39 Judged solely on its contents, the Attendite could be an early work of Petrus Zwicker. He was, after all, a man obsessed with the Waldensians and the threat they posed to the Church, and it is not out of the question that he wrote a third treatise against them.
The main doubt concerns the dating of the work. This is remarkably difficult to establish, because the text is very general and does not refer to any specific persons or incidents. Nor does the author use any particular or rare sources. In principle, any late medieval author with access to the anti-heretical treatises commonly circulating in Central Europe could have written the text. There have been two proposals about the author, one obviously mistaken and the other probably due to confusion with another text. Based on one manuscript (Wrocław, University Library, MS I F 230), R. Cegna misdated the text to the year 1399 and misattributed it to the Silesian inquisitor Johannes of Gliwice.40 There are no grounds whatsoever for either the dating or the attribution,41 and a few manuscripts predate the one used by Cegna. Older research attributes the treatise to the Bohemian reform preacher and troublemaker Conrad of Waldhausen, which would date the text to the 1360s.42 This attribution may have resulted from confusing the short tract with Conrad's sermon on the same Bible verse, given at some point between 1363 and 1369.43 The manuscript transmission points to Austria, Southern Germany, Bohemia and Silesia. P. Biller has proposed that the earliest potentially dated manuscript of the Attendite is St. Paul im Lavanttal, MS 71/4, which has the year 1373 at folio 160va, referring to the date on which a copy was made of a polemical letter from a converted Austrian Waldensian to the Lombard Waldensian Brethren.44 Although the part containing the Attendite (folios 144ra-146vb) belongs to the same fascicule as the letter, it is uncertain whether 1373 is the production date of this particular exemplar. MS 71/4 is a compilation of fascicules produced at different times in the late 14th and early 15th centuries.45 The dating can only be confirmed through a codicological analysis of the physical object itself, which is not possible within this study.
A more secure date comes from Klosterneuburg, MS CC 826, datable to 1391 and described by P. Biller.46 In the absence of a systematic study of the manuscript circulation of the Attendite, this is the most credible terminus ante quem. It means that the geographical distribution and dating of the manuscripts overlap with the beginning of Petrus Zwicker's career as an inquisitor of heresy, which does not exclude his authorship.
The final caveat concerns the credibility of the attribution itself. The text is only around 2,500 words long, which makes the attribution unreliable, as we are dealing with noisy data. In addition, the Attendite and the CDH (the material we used to train the computer for the class Zwicker) quote the same Bible verses. Although the quotations are not word-for-word identical, the two works share material, and in the attribution of such a short text this necessarily has an impact. Finally, we used a version of the text from a single manuscript, which we had in machine-readable format. There is a critical edition of the text by R. Cegna, but it too is mainly based on a single manuscript, with variant readings in endnotes.47 A final attribution of the Attendite will only be possible when further study reveals the earliest redaction of the text and the manuscript dates are confirmed. Of the previously proposed authors, texts by Conrad of Waldhausen must be included in the classification, as he is a possible author. At this point, we must be content to say that the Attendite a falsis prophetis is possibly attributable to Petrus Zwicker, but the attribution needs corroborating evidence from the manuscript tradition.

Conclusion: Additional Value of the Computational Analysis?
In the future, computational authorship attribution should become part of the toolbox of historians and philologists who work with anonymous, pseudonymous and dubious texts. Classifiers developed for the analysis of modern literature or for forensic purposes have proved effective in the study of ancient and medieval texts as well.
In our case study, the authorship of the Refutatio errorum, the computational methods produced corroborating evidence and expected results as well as radically new insights. Computational stylistics confirmed Petrus Zwicker as the author of the Refutatio. Although there was already convincing evidence in support of this, the analysis is not without added value: a computer's decision is based on a completely different set of features from the content analysis and contextual evidence presented in previous studies. Another important result was the classification of Peter von Pillichsdorf's treatise as clearly non-Zwicker. This not only confirms the earlier qualitative attribution, but demonstrates that our classifier can bypass the stylistic conventions of an early modern editor and detect the medieval authorial signature beneath.
The greatest added value of computational authorship attribution comes, however, from unexpected results, from texts behaving in an anomalous way. In this classification, the Attendite a falsis prophetis did precisely this. Until now, nobody has seriously considered Zwicker's authorship, because the manuscript tradition points to a somewhat earlier treatise. Yet when the classification gave a strong attribution to Zwicker, it forced us to reconsider the qualitative evidence, which in turn proved to be indecisive as well. Although we are not ready to declare the case closed and attribute a new text to Zwicker, the example demonstrates the true power of computational methods: they break existing patterns of thought and demand a re-evaluation of previous presuppositions.
Our chapter demonstrates that computational history cannot progress in isolation from the more conventional study of history, particularly the very basic archival study of sources. The attribution of the Attendite a falsis prophetis will remain ambiguous until the existing manuscripts have been surveyed in detail. The study of history depends on source criticism, and in order to date, attribute and localise sources with digital methods, we have to ensure that our metadata is up to standard.