Narrator language and character language in Thucydides: A quantitative study of narrative perspective

Digital Scholarship in the Humanities, Vol. 0, No. 0, 2019.


Introduction
Narrative perspective is the fascinating phenomenon, still poorly understood in linguistic terms, whereby literary texts often present events through the eyes, or minds, of a character in the story. Example (1), from Jane Austen's Emma, is presented from within Emma's consciousness:
(1) The hair was curled, and the maid sent away, and Emma sat down to think and to be miserable.-It was a wretched business, indeed!-Such an overthrow of everything she had been wishing for.-Such a development of everything most unwelcome!-Such a blow for Harriet!-That was the worst of all.
Austen, Emma, cited in Eckardt (2015)
Thucydides, an Ancient Greek historiographer who recounts the Peloponnesian War between Athens and Sparta that took place in the fifth century BC, was already famous in Antiquity for his ability to evoke in his readers the emotions of the characters.
Plutarch tells us about him:
Assuredly Thucydides is always striving for this vividness in his writing, since it is his desire to make the reader a spectator, as it were, and to produce vividly in the minds of those who peruse his narrative the emotions of amazement and consternation which were experienced by those who beheld them.
Plutarch De Gloria 3, translation by Babbitt (Plutarch, 1936)
Today, Thucydides is still highly esteemed for his dramatic uses of perspective shifts (e.g. Allan, 2018). His ability to create such shifts is all the more fascinating if one realizes that a more or less fixed device for shifting to a character's perspective, such as Free Indirect Discourse (FID) in modern literature, was not yet available (Bary and Maier, 2014). (FID is also used in (1); Jane Austen was indeed one of the pioneers of this technique.) His manipulation of perspective is perceived as very subtle and nuanced, and classicists have tried up to the present day to get a grip on it.
(Recent studies include Bakker (1997), Grethlein (2013a,b), and Allan (2018).) We provide a new quantitative approach to this question, exploiting corpus techniques such as memory-based learning, simple automated word counts, and (sub)corpus frequency comparisons.
Not only classical scholars but also theoretical linguists and narratologists try to understand how narrative perspective is grounded in the language used (see Banfield, 1982; Ginsburg, 1982; Fludernik, 1993; Sanders, 1994; Dancygier, 2012; Eckardt, 2015, to mention but a few important studies). Semantic research on this phenomenon has so far mainly focused on grammatical features (such as tense) and function words (pronouns and particles) (see e.g. Schlenker, 2004; Sharvit, 2008; Eckardt, 2012; Maier, 2015). Content words, such as wretched or overthrow in (1), have received much less attention in semantic research on this topic (but see the discussions of epithets and expressives in FID in Harris and Potts (2009) and Kaiser (2015)). The situation is different in narratological research, where the role of content words has received considerable attention, for example in de Jong's (2001 and 2004) seminal narratological studies on narrator versus character language in Homer. De Jong's 'character language' comprises expressions that appear predominantly in speeches and rarely in narrator text (outside speeches). They are typically charged, evaluative, or emotive words. Their scarcity outside speeches contributes to the impression of Homer's narrator as impersonal and objective. When words from character language do (infrequently) appear in narrator text, they are still, de Jong argues, to be interpreted as conveying a character's perspective, and they are essential to the creation of certain narrative effects. This may happen in indirect discourse, as with ἀλείτης 'the sinner' in (2), but also outside speech representation, as with ἐνηής 'gentle' in (3):
(2) φάτο γὰρ τίσεσθαι ἀλείτην
'he (Menelaos) thought to himself that he would take revenge upon the sinner'
Homer, Iliad 3.28
(3) κλαίοντες δ' ἑτάροιο ἐνηέος ὀστέα λευκὰ ἄλλεγον
'weeping, they (the Greeks) collected the white bones of their gentle companion Patroclus'
Homer, Iliad 23.252-3
Both ἀλείτης 'the sinner' and ἐνηής 'gentle' are to be evaluated from the character's rather than the narrator's perspective, de Jong (2004) argues on the basis of their distribution in the text. We assume, following the narratologists' lead, that the choice of content words plays an important role in the manipulation of narrative perspective, and we try to see how this works. Inspired by de Jong, we aim to contribute to the understanding of the role of content words in narrative perspective by investigating character language in Thucydides.
Our hypothesis is that this specific distribution (frequent in character speech, rare in narrator text) will indicate content words that are somehow important to narrative perspective. The availability of Ancient Greek digital texts now makes it possible to proceed in a highly automated way. It allows us to let the text speak for itself to a larger extent: rather than deciding first which words are evaluative and emotive and then counting their occurrences in narrator and character text (as de Jong did), we identify character language by looking at the relative frequencies of all words within and outside character speeches and identifying those whose distribution is the most skewed. They need not correspond only to the highly charged vocabulary that a narratologist conducting a manual analysis would identify beforehand. This procedure makes it possible to uncover even subtler perspectival effects that the narrator achieves with expressions that could intuitively seem entirely 'descriptive' or non-perspective-sensitive. Such an approach is especially important for Ancient Greek texts, where the impossibility of sampling native speaker judgments makes it difficult to capture the finest nuances of meaning and use. Once the salient words are known, we can then ask why those and not others are associated with character language in Thucydides, a question which, as we will suggest, is also interesting from a linguistic perspective. This study thus addresses a narratological question using quantitative corpus-based methods, and hints at its linguistic consequences.
In this article we first present our pipeline (Section 2): what did we do to get from the plain text to two subcorpora of narrator and character text that are lemmatized (annotated with their dictionary word forms)? Then we explain in Section 3 how we ranked the lemmas, identifying those whose frequency in character text most strongly exceeds their frequency in narrator text. In Section 4, we look into the passages we are interested in: passages where the narrator seems to speak in the default impersonal mode, but where we find words that very strongly exhibit the pattern of character language expressions (the top ten) as analyzed in Section 3. In Section 5, we go a step further and identify chapters in narrator text where character language expressions (based on a larger set of 125 characteristic lemmas) cluster, as these are expected to be highly interesting from a narratological perspective and worth investigating further. As an illustration, we discuss in some detail four narrator text passages with an especially high density of character language. In conclusion, we discuss how our study not only contributes to the understanding of Thucydides' literary craft, but also suggests new avenues of research in narratology and linguistics.

Data preprocessing pipeline
In this section we explain how we preprocessed the text to be able to identify character language in Thucydides.
Text: The text of Thucydides' History of the Peloponnesian War that we used is the Oxford Thucydides edition by H. S. Jones, provided in digital form by the Perseus Digital Library (Crane, 2016). It counts about 150k words in total.
Encoding: As the text from Perseus was in betacode (an ASCII-only method of representing Ancient Greek characters, which was the standard for a long time), we converted it into Unicode (UTF-8) using a converter created by the Classical Language Toolkit (Johnson et al., 2014-2016), since Unicode, which contains many more characters, is currently the standard. (In the meantime, even Perseus has switched to Unicode.)
'Sub-corpora' of narrator and character text: We worked on a text divided into separate files corresponding to single chapters in the original edition. We divided these into three sets: character text (CT), quoted text (QT), and narrator text (NT). CT comprises all the speeches presented by Thucydides in direct discourse. Most of the speeches in modern editions begin and end on chapter boundaries, so we could simply move the appropriate files to the CT set. Some speeches, however, are quite short and comprise only part of a chapter. In such cases we split the file in two accordingly, putting the direct discourse fragments in CT and the main narrator's text in NT. We did the same with QT, which comprises all the direct discourse passages in the text that are (or purport to be) actual quotations from existing documents: poetry, inscriptions, peace treaties, etc.¹ We decided to remove those passages (henceforth we ignore QT entirely) because, insofar as they are quotations from actually existing sources, Thucydides as author cannot be held responsible for the lexical and stylistic choices made in them. NT comprises all the rest of the text (importantly, this includes speeches presented in indirect discourse, which could not be automatically separated as no appropriate tagging or annotation exists for the whole text²).
The direct discourse passages were already annotated with <q> </q> (for 'quotation') tags in the Perseus XML files, so they could be largely extracted automatically (with manual checks and corrections). However, the division into CT and QT had to be done manually based on our own judgments. QT contains ca. 4k words, and was not taken into account in any calculations reported here. NT has ca. 115k words, while CT has ca. 31k words.
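The automatic part of this separation can be sketched as follows. This is a minimal illustration, not the script used in the study: it assumes each chapter is a well-formed XML fragment in which direct discourse is wrapped in q elements, and the function name split_direct_discourse and the English sample chapter are hypothetical. The subsequent manual division of the quoted material into CT and QT is not modelled.

```python
import xml.etree.ElementTree as ET


def split_direct_discourse(xml_chapter: str):
    """Return (narrator_text, quoted_text) for one chapter of XML.

    Text inside any <q> element (at any depth) counts as quoted;
    everything else counts as narrator text.
    """
    root = ET.fromstring(xml_chapter)
    narrator, quoted = [], []

    def local_name(tag: str) -> str:
        # Drop a namespace prefix such as "{http://...}q" if one is present.
        return tag.rsplit("}", 1)[-1]

    def walk(node, inside_q: bool) -> None:
        inside = inside_q or local_name(node.tag) == "q"
        target = quoted if inside else narrator
        if node.text:
            target.append(node.text)
        for child in node:
            walk(child, inside)
            # The tail follows the child's closing tag, so it belongs
            # to the current node, not to the child.
            if child.tail:
                target.append(child.tail)

    walk(root, False)

    def normalise(parts):
        # Collapse runs of whitespace left behind by the markup.
        return " ".join("".join(parts).split())

    return normalise(narrator), normalise(quoted)


if __name__ == "__main__":
    sample = "<div><p>He said: <q>we must fight now</q> and then he left.</p></div>"
    nt, dd = split_direct_discourse(sample)
    print("narrator:", nt)
    print("direct discourse:", dd)
```

Handling the tail text separately matters here: in ElementTree, the words that follow a closing q tag are stored on the q element itself, so a naive traversal would wrongly assign them to the quotation.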
By splitting the text into single chapters and then separating them into two subsets, we have effectively transformed our one text into a structured corpus, treating CT and NT as its two subcorpora, and the chapter files as individual documents. This is an important step, because it allows us to apply corpus methods to it, comparing word frequencies between (sub)corpora and between documents.
Lemmatization: Since we wanted to count occurrences of words at the level of the lemma (dictionary word form) rather than at the level of the actual (inflected) words themselves, we developed a lemmatizer for Ancient Greek: a program that assigns a lemma to each word in context. As far as we know, the lemmatizer GLEM, which we created in the Perspective project in collaboration with Iris Hendrickx and Peter Berck, is the first publicly available lemmatizer for Ancient Greek that uses part-of-speech (POS) information to disambiguate between multiple lemmas and that also assigns output to unseen words (words that are not yet in the lexicon), using a trainable memory-based learning component.³ GLEM is a combination of lexicon look-up and memory-based learning. Memory-based learning is a technique for learning Natural Language Processing (NLP) classification tasks. Its defining characteristic is that it stores in memory all available instances of a task, and that it extrapolates from the most similar instances in memory to deal with unseen cases (Daelemans and van den Bosch, 2010). We found that for Ancient Greek this combination outperforms lemmatizers that use only one of these components. Its memory-based learning component makes use of Frog (van den Bosch et al., 2007; Hendrickx et al., 2016), an integration of open source memory-based NLP modules originally developed for Dutch, but trainable for any language if there is a corpus available where each word is labeled with its appropriate POS tag and lemma.
PROIEL's annotated Herodotus text provided us with such a corpus for Ancient Greek (Haug and Jøhndal, 2008; Haug et al., 2009), on which we trained GLEM. For the look-up component, we used as our lexicon a merge of the Perseus and PROIEL lexicons. For details about this merge and about the division of labor between the look-up component and the memory-based learning component, the reader is referred to Bary et al. (2017). GLEM was trained on Herodotus and achieved an accuracy of 95.7% on that text in a ten-fold cross-validation experiment. To compute the accuracy of the same lemmatizer on Thucydides, a text completely independent of the training material, we had to manually annotate a short sample text ourselves. We took the first 15 chapters from book 1 of Thucydides and manually annotated this sample with POS tags and lemmas. This sample contains 2,652 tokens, 56 sentences, and 314 different POS tags. A simple comparison between the lemmas predicted by GLEM and our manual annotations revealed an accuracy of 93.0% for Thucydides. We then applied GLEM to the two text parts, CT and NT, creating two lemmatized subcorpora.
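The general idea of combining lexicon look-up with a similarity-based fallback, and of scoring the output against manual annotation, can be sketched as follows. This is a toy illustration under strong assumptions and not GLEM's implementation: the fallback is a crude nearest-neighbour search on shared word endings standing in for the memory-based learner, and all function names, the POS labels, and the unaccented transliterated entries are hypothetical.

```python
from typing import Dict, List, Tuple

Lexicon = Dict[str, List[Tuple[str, str]]]  # form -> [(lemma, POS), ...]
Memory = List[Tuple[str, str]]              # stored (form, lemma) training instances


def shared_suffix_len(a: str, b: str) -> int:
    """Length of the longest common suffix of two strings."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def ending_rule(form: str, lemma: str) -> Tuple[str, str]:
    """Strip the common prefix; the remainder is an (old ending, new ending) rule."""
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return form[i:], lemma[i:]


def lemmatize(form: str, pos: str, lexicon: Lexicon, memory: Memory) -> str:
    candidates = lexicon.get(form)
    if candidates:
        # Known form: use the POS tag to choose between competing lemmas.
        for lemma, cand_pos in candidates:
            if cand_pos == pos:
                return lemma
        return candidates[0][0]
    # Unseen form: extrapolate from the most similar stored instance.
    near_form, near_lemma = max(memory, key=lambda inst: shared_suffix_len(form, inst[0]))
    old_end, new_end = ending_rule(near_form, near_lemma)
    if form.endswith(old_end):
        return form[: len(form) - len(old_end)] + new_end
    return form  # no applicable rule: fall back to the form itself


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Share of tokens whose predicted lemma matches the manual annotation."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    # Toy transliterated entries with accents ignored, for illustration only.
    lexicon = {"agon": [("ago", "VERB"), ("agon", "NOUN")]}
    memory = [("logou", "logos"), ("legei", "lego")]
    print(lemmatize("agon", "VERB", lexicon, memory))   # resolved by look-up and POS
    print(lemmatize("nomou", "NOUN", lexicon, memory))  # unseen form, resolved by analogy
```

The accuracy function corresponds to the simple comparison between predicted and manually annotated lemmas described above; it says nothing about how the 95.7% and 93.0% figures were obtained beyond that comparison.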

Ranking Lemmas
In this section, we report on how we identified character language on the basis of the lemmatized subcorpora of narrator and character text presented in the previous section.
We counted the occurrences of each lemma in each of the subcorpora and calculated their relative frequencies (per 10k words).⁴ Next, for each lemma we calculated the log-likelihood (LL) ratio between its frequency in CT and in NT. The log-likelihood ratio is a relatively standard measure for comparing relative frequencies in corpus linguistics (it was introduced by Dunning (1993), and we used the form of the calculation from Rayson and Garside (2000)). However, we do not treat it strictly as a measure of statistical significance, as we are dealing here with rather low numbers which do not warrant such use. Instead, we treat the LL ratio as a heuristic device that indicates which lemmas have the most skewed distribution. More specifically, we used the ratio to rank lemmas according to the skewness of their distribution, i.e. from highest to lowest LL ratio (we ranked only lemmas that are more frequent in CT than in NT; lemmas more frequent in NT were filtered out and subsequently ignored). Lemmas that exhibit the most skewed distribution can be called 'character language' words; they are the words that appear predominantly in character speech, and as such they are the precise object of our interest here. To the extent that they do infrequently appear in narrator text, according to our hypothesis, they contribute to some specific narratological effects. Therefore, they are usually not to be interpreted as belonging to the perspective of the default narrator (although, as we will see, they may express a more involved authorial comment).
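The ranking step can be sketched as follows. This is a minimal illustration assuming the usual two-corpus form of the log-likelihood calculation associated with Rayson and Garside (2000), with expected counts derived from the pooled frequency; the function names and the toy counts in the usage example are hypothetical and are not figures from the study.

```python
import math
from typing import Dict, List, Tuple


def rel_freq(count: int, corpus_size: int, per: int = 10_000) -> float:
    """Relative frequency per `per` words."""
    return count / corpus_size * per


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """LL for a lemma occurring a times in a corpus of c words
    and b times in a corpus of d words."""
    e1 = c * (a + b) / (c + d)  # expected count in corpus 1
    e2 = d * (a + b) / (c + d)  # expected count in corpus 2
    ll = 0.0
    if a > 0:                   # a zero count contributes nothing to the sum
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def rank_character_language(
    counts_ct: Dict[str, int],
    counts_nt: Dict[str, int],
    size_ct: int,
    size_nt: int,
) -> List[Tuple[str, float]]:
    """Rank lemmas that are relatively more frequent in CT by descending LL."""
    ranked = []
    for lemma, a in counts_ct.items():
        b = counts_nt.get(lemma, 0)
        # Keep only lemmas over-represented in character text.
        if rel_freq(a, size_ct) > rel_freq(b, size_nt):
            ranked.append((lemma, log_likelihood(a, b, size_ct, size_nt)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented counts and corpus sizes, for illustration only.
    ct = {"x": 50, "y": 5, "z": 1}
    nt = {"x": 5, "y": 30, "z": 100}
    for lemma, score in rank_character_language(ct, nt, 1_000, 3_000):
        print(lemma, round(score, 2))
```

Because LL is symmetric (it is equally high for lemmas over-represented in NT), the explicit comparison of relative frequencies is what restricts the ranking to character language candidates.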
Table 1 shows the top ten lemmas (in Greek characters, transliteration, and translation) according to the LL ratio ranking. These are the words most characteristic of the corpus of CT, as compared to the corpus of NT.⁵ The columns from left to right indicate the number of occurrences in NT and CT (out of ca. 115k and 31k words, respectively), relative frequencies in NT and CT (per 10k words), direct ratio of relative frequencies, and LL ratio. As can be seen, the LL ratios for the top lemmas are extremely high.
The top five positions in this ranking are occupied by words that quite naturally occur in direct speech much more often than in an impersonal narrator's text, as they are intimately related to the concreteness of a speaker and their immediate context: 'we', 'now', 'this here', 'I', and the particle γε, which has a complex meaning but refers to speaker commitment. However, in this study, as explained in the introduction, we are not interested in function words of this kind, but rather in content words. For this reason we manually removed all function words (particles, pronouns, prepositions, etc.), as well as proper names, from the lemma ranking. Table 2 shows the top ten lemmas among the content words only.
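The removal described above was done by hand. Purely as an illustration of what such a filter amounts to, the following sketch drops lemmas by POS tag and by a manually curated exclusion list; the tag names, the transliterated lemmas, and the scores are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical tag names for function-word classes and proper names.
FUNCTION_POS = frozenset({"PRON", "PART", "PREP", "CONJ", "ART", "PROPN"})


def content_words_only(
    ranking: List[Tuple[str, float]],
    pos_of: Dict[str, str],
    manual_exclusions: FrozenSet[str] = frozenset(),
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Keep the top_n ranked lemmas that are neither function words nor excluded by hand."""
    kept = [
        (lemma, score)
        for lemma, score in ranking
        if pos_of.get(lemma) not in FUNCTION_POS and lemma not in manual_exclusions
    ]
    return kept[:top_n]


if __name__ == "__main__":
    # Invented scores; lemmas are transliterated for readability.
    ranking = [("hemeis", 300.0), ("nun", 250.0), ("dikaios", 120.0), ("chre", 90.0)]
    pos_of = {"hemeis": "PRON", "nun": "ADV", "dikaios": "ADJ", "chre": "VERB"}
    # A deictic adverb such as 'now' is not caught by the tag filter,
    # so it has to be listed explicitly.
    print(content_words_only(ranking, pos_of, manual_exclusions=frozenset({"nun"})))
```

The explicit exclusion list reflects the fact that a POS tag alone does not decide what counts as a function word for the purposes of the study, which is one reason a manual pass remains necessary.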
In the next section we will analyze the occurrences of these lemmas in NT to see if they are indeed relevant to perspectival effects. First, however, let us have a look at what kinds of words have made our top ten. The premise of our study is that Thucydides' narrator is an impersonal, distanced, and objective one, and as such reluctant to use strongly evaluative or emotive words in the narrator's voice.⁶ Thus, we expect such words to appear among character language lemmas, if anywhere. This is largely confirmed by our list. The reader should note, however, that our claim is not that the automatically generated set of words can approximate a list of 'charged' vocabulary that could be composed based on human judgments. On the contrary, our method consists in divorcing the notion of 'character language' from intuitive judgments and grounding it in quantitative evidence. Therefore, we are equally interested in those words that would be independently judged as emotive or evaluative, and those that would not.
At least five of the ten words have an obvious evaluative (mostly moral) aspect to their meaning: 'just', 'do wrong', 'good', 'virtue', 'shameful'. Another two can also be counted as such. While ἴσος may simply mean 'equal', as in 'of equal size or number', it can have political and moral overtones and is also used with connotations of fairness, rightness, or equal standing. Similarly, πάσχω may just mean 'I suffer' in the sense 'it happens to me', but can also express the sense of 'suffer' that is associated with pain or misfortune and a negative evaluation of that which happens. Next, χρή, an auxiliary verb with the meaning 'it should be done that ...', is not an evaluative word, but it does have a perspectival or subjective aspect, as it expresses the duty or need perceived by a subject. We will see that when used in narrator text each of these words has a clear tendency to be used in perspectivally rich ways, and that their non-perspectivized uses are typically those that express the less loaded, more objective, or descriptive meanings.
The presence of two words so high on the list of character language lemmas may seem more surprising: ἀμύνω means simply 'defend', 'help', 'come to aid'; κίνδυνος means 'danger'. Neither of the two seems to be evaluative or perspectival at first sight. Nevertheless, as we will see in the next section, the way these two words are actually used by Thucydides in narrator text is such that they are often associated with a subjective perspective. This is especially true of κίνδυνος, which is never used without a perspectival flavor.

Character Language in NT
Our main interest, as we have stated, is in words that conform to the 'character language' pattern, that is, words that are much more frequently used in speeches than in NT, but nonetheless occur in NT. In this section we present our manual study of such occurrences. For practical reasons we have limited ourselves to a small subset consisting of the ten most prominent character language words. This allowed us to manually annotate each of their NT occurrences.
We annotated these words by placing them in one of several categories relating to the perspectival properties of the context in which they appear. In Section 4.1, we will briefly describe and justify the categories, after which we will illustrate them with examples, before proceeding to discuss the results of our categorization in Section 4.2.

The categories
We used the following five categories for the contexts in which character language occurs in narrator text:
† Report. First of all, many of the NT occurrences of character language lemmas appear in a complement of an indirect speech or thought report. In such cases it can be assumed that the word in question corresponds to the putative original utterance or thought, and therefore to the perspective of the reported speaker or thinker.
† Reportative. In this category we include occurrences which are not syntactically within a report complement, but belong to a clause that unambiguously refers semantically to the content of an utterance, thought, or text, and should therefore also be taken as corresponding to the perspective of its speaker or thinker.
† Non-perspectivized. This category comprises occurrences of character language words that in no way express a character's perspective: they are not in a clause that is reportative, and they are part of the main narrator's default, impersonal, 'zero-perspective' description of events.
† Perspectivized. In this category we count uses of character language words that appear to express the perspective of a character (or a group of characters) in the narrative, rather than the narrator's, even though they are in no way overtly reportative. Expressions contributing to what narratologists call (implicit) character 'focalization' (cf. Genette (1980); more specifically, implicit embedded focalization in de Jong's (2004) terminology) belong to this category (see example (3) in the Introduction).
† Comment. Finally, among those uses that are neither reportative nor perspectivized, we also distinguished those appearing in contexts in which the narrator appears to take a point of view different from his default neutral one and, rather than describing the chain of events, reflects or comments on them. This may be a matter of Thucydides entirely shedding his narrative persona and speaking in his own voice, as the concrete individual who has lived through the events described and met many of the protagonists of his story; this is typically accompanied by the use of first-person forms ('in my time', 'among those I've met').⁷ However, we also count in this category instances where the narrator simply makes a general comment on a type of event or situation; we treat such comments as exhibiting a different mode than the main narrative. In this we follow Smith's (2003) work on discourse modes (Smith calls this mode 'report', not to be confused with our Report category), which was later applied to classical texts by Adema (2007), among others.
In the annotation process, we applied the categories by leveling down in two senses. First, the Non-perspectivized category (arguably the least interesting one) was treated as the default (unless the syntax dictated choosing the Report category), and occurrences of 'character language' words were assigned to the other categories, Reportative, Perspectivized, or Comment, only if we saw positive reasons to do so. Second, the deciding factor was always the interpretation of the minimal context of the word, i.e. the clause to which it belongs. Thus, even if a whole sentence or a larger passage could be treated as a comment by Thucydides, a word that actually appeared within a reportative construction was annotated as Reportative. All annotations were first made independently by the two of us; differences were then discussed until we reached agreement. We measured inter-annotator agreement using Cohen's kappa, with the result 0.74.⁸
The following examples will help elucidate the meaning and application of our categories. They all concern the lemma ἀγαθός 'good', 'fit', 'virtuous', which is conveniently represented in every category.
(9) 'The wide difference between the two characters, the slowness and want of energy of the Spartans as contrasted with the dash and enterprise of their opponents, proved of the greatest service, especially to a maritime empire like Athens. Indeed it was shown by the Syracusans, who were most like the Athenians in character, and also most successful in combating them.'
8.96.5
In example (4), illustrating the category Report, ἄμεινον (which is the irregular comparative of ἀγαθός) belongs to the complement of the reporting verb ἐδόκει 'they thought'. Therefore, the evaluation expressed by it is overtly attributed to the subject of the sentence. In example (5), from the category Reportative, the syntax is different, but the sense is similar: κέκληνται 'call' is not a complement-taking report verb, but it does semantically involve what someone says, and therefore the evaluation expressed by ἀγαθοί is again overtly attributed to someone other than the narrator. Example (6), by contrast, involves an evaluation made by the narrator and therefore belongs to the Non-perspectivized category. That the ships deployed by the Corinthians on their left wing were the ones crewed by the best sailors (ἄριστα is the irregular superlative of ἀγαθός) is presented as a simple fact, and not as something said or thought by a character in the narrative, nor as a truly evaluative remark from the narrator.
Example (7) illustrates one of our target categories, Perspectivized. The first sentence of this passage presents the reason why the city of Argos accepted the plan (of a new alliance between neutral states) through the use of an attitude report with the verb ἐλπίσαντες 'hoping'. The next sentence elaborates on this by describing the situation that gave the Argives hope that they could gain supremacy over the Spartans. There is no report syntax or any reportative construction in this sentence. Nevertheless, it is clear in this context that it describes the situation as it was perceived by the Argives, and insofar as it was relevant to their decision-making. The evaluation expressed by ἄριστα (here 'most successful') is made from the perspective of the Argives. This is underscored by the use of strongly evaluative vocabulary which Thucydides is otherwise (as our numerical analyses show) reluctant to use outside of speeches.¹¹
The final two examples illustrate two slightly different ways in which we applied our category Comment. In example (8), Thucydides reflects on the life and deeds of Antiphon, an eminent Athenian statesman, and states that the speech Antiphon gave on the occasion of his trial for participation in an oligarchic coup was possibly the best ever given by anyone 'up to my time'. In example (9), Thucydides remarks on the 'national' character of Spartans, Athenians, and Syracusans, and states that it was the Syracusans who were the best (ἄριστα) at fighting against the Athenians. The two examples have in common that the remarks are made from a perspective distinct from that from which the regular course of the narrative is presented: to make these assessments, the narrator takes a point of view external to the course of events, from which only abstract assessments like these can be made. They are not simply assessments of particular events or objects in the story (as in example (6) from the Non-perspectivized category, which concerns the best sailors in a concrete sea battle). Crucially, in both cases, this is marked by a temporal viewpoint that does not belong to the main timeline of events in the narrative. The temporal point of reference is identified with the time of writing, rather than with a moment in the narrative. In (9) this is also signaled by the (autonomous) use of aoristic aspect (Bary, 2009, especially Section 6.3.6). Following Smith (2003) and Adema (2007), we treated such temporal extraneity as an important criterion in our application of the category Comment. The two examples differ in that (8) is more personal than (9), as is also signaled by the first-person pronoun. First-person passages are rare in The Peloponnesian War, but there are a few of them, typically when Thucydides wants to comment on the life and character of an important protagonist (often at the point when the narrative reaches the moment of that person's death, as in the case of Pericles in 2.65 or Nikias in 7.86). Even when overt first-person forms do not appear in such passages, their tone always makes it clear that this is Thucydides' personal assessment of a person, event, or situation. One should remember that Thucydides, being a politician and general himself, knew all those figures personally and participated in or directly experienced many of the events of the war. Nevertheless, it is only rarely that he makes overt comments in his own voice as he does here, but when he does, it is typically with the use of evaluative vocabulary that is largely absent from the main narrative passages. Although example (9) is not personal to the same extent as (8), the temporal interpretation with respect to the writing time made us classify both kinds of passages as belonging to the category Comment.
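The inter-annotator agreement figure reported earlier in this section is a Cohen's kappa. For readers unfamiliar with the measure, the following sketch shows the standard calculation for two annotators over the same items; the function name and the four toy labels in the usage example are hypothetical and do not reproduce the study's annotations.

```python
from collections import Counter
from typing import List


def cohens_kappa(labels_a: List[str], labels_b: List[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions,
    # summed over all labels used by either annotator.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented labels, for illustration only.
    first = ["Report", "Report", "Perspectivized", "Non-perspectivized"]
    second = ["Report", "Report", "Perspectivized", "Perspectivized"]
    print(round(cohens_kappa(first, second), 2))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because one or two categories (Report and Reportative) account for most occurrences.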

Categorization results
Having presented these examples, we move now to a discussion of the results of our annotation. Appendix A contains tables with all the NT occurrences of each lemma in our subset of character language lemmas, divided by category. Table 3 offers a summary overview.
As can be immediately seen, a definite majority of all NT occurrences of these words are in reports or reportative constructions, that is, in contexts in which they are overtly attributed to something that a character in the narrative says or thinks.¹² This is what one would expect. In reports (indirect discourse), the speaker reports the content of what was said (thought, perceived). While deictic person features are always adapted to the reporting context, with content words the speaker may follow the original utterance to a greater or lesser extent. In other words, content words give the speaker (or, in narratives, the narrator) a choice: to present indirectly reported utterances from their own perspective or to (pretend to) stay more faithful to the (pretended) original utterance. Without doubt this has a strong effect on the interpretation of a given passage in terms of narrative perspective, and our methodology gives the narratologist a tool to study indirect discourse reports in this respect. In the present study, however, we ignore these cases, as we are primarily interested in passages of implicit character focalization (our category Perspectivized).
Four of our ten words have no Non-perspectivized occurrences, as all of their non-reportative NT uses fall into the Perspectivized or Comment category: αἰσχρός, δίκαιος, χρή, and κίνδυνος. Accordingly, they offer the most clear-cut examples of what we call character language and its role in perspectival effects in Thucydides' narrative. We will now briefly comment on each of these four lemmas, and offer some examples from both our target categories.
The first two words, αἰσχρός and δίκαιος, meaning respectively 'shameful' and 'just', are among the strongest and most highly charged moral expressions in Classical Greek. They are both quite rare in Thucydides, and even rarer in NT, which is clearly in line with, and certainly contributes to, the image of the Thucydidean narrator as detached and reluctant to pass moral judgment. The few times that they are used outside of speeches or reportative constructions, it is either to express a character's perspective or to make an authorial comment. Let us illustrate this with two examples, both from the Perspectivized category.
(10) κατεστήσαντο γὰρ τοὐναντίον τῆς Περσῶν βασιλείας τὸν νόμον, ὄντα μὲν καὶ τοῖς ἄλλοις Θρᾳξί, λαμβάνειν μᾶλλον ἢ διδόναι, καὶ αἴσχιον ἦν αἰτηθέντα μὴ δοῦναι ἢ αἰτήσαντα μὴ τυχεῖν
'For there was here established a custom opposite to that prevailing in the Persian kingdom, namely, of taking rather than giving; and it was more disgraceful not to give when asked than to ask and be refused.'¹³
2.97.4
(11) Κορίνθιοι δὲ κατά τε τὸ δίκαιον ὑπεδέξαντο τὴν τιμωρίαν, νομίζοντες οὐχ ἧσσον ἑαυτῶν εἶναι τὴν ἀποικίαν ἢ Κερκυραίων
'The Corinthians undertook their protection as a just duty, believing the colony to belong as much to themselves as to the Corcyraeans.'
1.25.3
Example (10) comes from a passage describing the customs of the Thracian tribe of the Odrysae. Clearly, the moral judgment expressed by αἰσχρός here is to be interpreted as one attributed to the Odrysae, and not made by the narrator. Similarly, in (11) the Corinthians accept a plea for protection from citizens of Epidamnos because they believe it to be their colony; the assessment of the plea as just and proper is theirs, not the narrator's (the issue was in fact contentious, and is presented by Thucydides as contributing importantly to the buildup of tension that led to the outbreak of the war).
The auxiliary χρή is a different case. It can mean that something should happen, is necessary or required, with a variety of modal flavors, which need not involve a perspectival or subjective aspect. However, in the four non-reportative NT occurrences it always refers to someone's reasoning about what they should do. In the two Perspectivized instances, the question is that of action in battle; in the two Comment ones, of what Thucydides himself should conclude about certain historical events. There is, therefore, a strictly subjective component to Thucydides' use of χρή, which would not be as easy to notice if the word had not been identified as 'character language' in our statistical analysis. In future research, it would be worthwhile to examine the uses of χρή with respect to their perspectival force in more detail, as well as to investigate Thucydides' uses of other modal and related auxiliary expressions, to see if they exhibit a similar, character language-like pattern.
The fourth word, κίνδυνος 'danger', is possibly the most interesting of the ten lemmas and gives important support to our methodological approach. It has a very clear character language distribution, and its NT occurrences are divided between twenty-five reportative ones and fifteen Perspectivized or Comment ones, with no Non-perspectivized uses. And yet, it is a word that could easily have been ignored if one were to try identifying 'charged' expressions in isolation from their actual use in Thucydides' text, as it does not appear to semantically encode any evaluative or subjective elements (unlike, e.g., δίκαιος). Nevertheless, narratological analyses have already indicated that κίνδυνος appears to play a role in perspective shifts in Thucydidean narrative: Allan (2013, 2018) identifies it as one of the elements contributing to focalization in chapter 4.34 (a passage we discuss below, in section 5). He also adduces a count of the occurrences of κίνδυνος in book 1 of Thucydides (Allan, 2013, p. 381 and n. 28) indicating a character language distribution.14 We take the fact that Allan's close reading of a perspectivally rich passage led him to identify a word that is very high on our automatically generated list as validation of our method. Conversely, our computation confirms that κίνδυνος has a character language distribution across the whole text and that it is one of the words exhibiting such a distribution most clearly, which corroborates Allan's interpretive judgment.
In fact, uses of κίνδυνος in Thucydides' NT appear to always express a subjective aspect, and occur only in contexts where the perspective of the default chain-of-events narrator gives place to the point of view of characters, or to an authorial comment or general remark. We illustrate this with just one example here, but the use of this word in Thucydides invites further investigation (which could benefit from both our quantitative analysis and Allan's narratological interpretation).
(12)
In example (12), κίνδυνος appears in a clause that is not reportative in any overt way, but it nevertheless conveys the perspective of the character and not that of the narrator. It offers a glimpse into Nikias' reasoning without explicitly presenting his thoughts. In other cases, one can also see that the 'danger' expressed by uses of κίνδυνος in Thucydides' NT is one that is perceived or experienced by characters, and not simply a descriptive feature of a situation. This makes it closer to emotive or evaluative words, which one would expect to constitute the bulk of character language.
The remaining six of our ten lemmas mostly conform to a similar pattern, although with some additional nuance. Each of them has a majority or large proportion of occurrences in reports and reportative constructions, and many of the non-reportative ones fall into the Perspectivized and Comment categories. Each of them also has a number of Non-perspectivized occurrences. However, these do not contradict the general tendency for charged vocabulary in Thucydides' NT to be used predominantly in perspectivized contexts or authorial comments, as we discuss below.
The lemma with the most total occurrences in the top ten list, ἀγαθός (with the core meaning 'good'), is similar to the first two lemmas discussed above, αἰσχρός and δίκαιος: it has a strictly evaluative meaning, and may be used to convey strong moral or political judgment. However, like English 'good' it also has less loaded meanings, referring to anything that is properly functioning or of good quality. Such meanings are also, importantly, less subjective or perspectival, as they do not depend as strongly on moral and political attitudes. What makes a good knife, say, is clearly a more objective and less perspective-dependent matter than what makes a good policy or a good person. Interestingly, the range of more and less perspectivally charged meanings of ἀγαθός is reflected in the distribution of its non-reportative NT occurrences between the Perspectivized and Comment categories on the one hand, and the Non-perspectivized category on the other. Of the eleven occurrences in the latter (all, incidentally, in comparative or superlative degree), seven concern ships, sailors, or horsemen. In each case, the context is a battle episode, in which a commander deploys or sends out the best ships or best horses to a particular position. Two more concern armies improving their fortifications and sentries, and one the high quality of soil in a particular area. All of those things (ships, horses, fortifications, soil) are such that their quality is a rather objective, judgment-independent matter. None of these uses involves any sort of moral or political evaluation. Only one Non-perspectivized use of ἀγαθός expresses the sense of 'prosperity', which may be interpreted as a somewhat less objective assessment. By contrast, the uses of ἀγαθός in the Perspectivized and Comment categories involve more politically and morally loaded meanings, as illustrated respectively by the two examples below:
(13)
(14)
In both these passages, forms of ἀγαθός refer to the excellent qualities of a famous statesman. In (14) the evaluation comes from Thucydides himself, while in (13) it is attributed to the perspective of the Greeks who considered Brasidas good and virtuous and therefore were more sympathetic to the Spartan cause (this passage comes from a chapter that is further discussed in the next section). In either case, the judgment is more loaded than it is with respect to the quality of horses or fortifications.
Two more words exhibit the same sort of pattern as ἀγαθός, with Non-perspectivized uses limited to the more objective meanings, and more morally or politically charged meanings reserved for uses falling into the Perspectivized and Comment categories. These are ἴσος 'equal', which in Non-perspectivized occurrences is almost exclusively used in a simple numerical sense, 'of equal size or number', and never in the more politically charged senses of 'equal in rights' or 'fair' which the word can also express; and ἀρετή, with the central meaning 'virtue', which has only one Non-perspectivized use, where it refers to the quality of soil and not to any moral feature.
A similar tendency may be discerned in the (overall infrequent) NT occurrences of πάσχω and ἀδικέω, although somewhat less clearly, as there is no equally clear-cut distinction between a more objective and a more perspectival meaning in these two words. The uses of the former in Non-perspectivized occurrences appear to express more the sense of 'it happens to me' than 'I suffer', and similarly for the latter, which gravitates toward 'inflict damage' more than 'treat unjustly'. And indeed, one should expect that in general Thucydidean characters will more often speak of 'suffering' and 'injustice', while the narrator speaks of 'incidents' and 'damage', even if it is more a matter of the broader context and the respective themes and topics of character speeches and narrative passages.
A somewhat problematic case is presented by ἀμύνω which, besides one Comment, has twelve Non-perspectivized occurrences and nine in the Perspectivized category. However, there is no clear difference in its uses in those categories, as the two main senses of 'come to aid' or 'defend oneself' are used equally in both, and neither of them appears to be evaluative or subjective in any way. A detailed examination of the use of this word in character speeches in comparison to NT could shed some light on this issue and possibly reveal further nuances, but it falls outside the scope of this study.
Having discussed all top ten character language lemmas, let us now reflect on the results so far from a linguistic perspective. We have seen that many of the top ten lemmas are indeed strictly evaluative words, like αἰσχρός 'shameful' and δίκαιος 'just'. In semantic theories, evaluative expressions often receive a special semantics: it is encoded in their meaning that they are evaluated with respect to someone, from the point of view of a subject (judge, experiencer etc.; see for instance Lasersohn, 2005). Within the class of content words these are the prototypical perspectival elements. But in our top ten we also find lemmas like ἴσος 'equal', which arguably conveys an evaluation only indirectly and in some uses ('equal, and therefore fair'), as well as κίνδυνος 'danger', which has no such evaluative component. One question that this study raises is whether, in light of its results, it is still useful to make a sharp distinction between words that are semantically perspectival (in the sense that they have a judge parameter, for example) and words that are not, or whether perspective-dependence is more a matter of patterns of use than of intrinsic meaning. Indeed, for our results a 'distributional' approach, which construes meanings as patterns of use and is popular in computational linguistics, seems well suited, at least to complement the more traditional picture of meaning in terms of reference, as used in formal semantics. In this article we use patterns of use mainly as a heuristic device (to track interesting passages and shed new light on them), but an interesting direction to go from here is to see the patterns as more than this and assign meanings to them. Here lies a great challenge, since it requires that we unite, both at the technical and at the conceptual level, the insights from both semantic approaches (Pullum, 2013; Beltagy et al., 2013). We leave this for future research and return to our narratological perspective to conclude this section.
To recapitulate, we can say that the analysis of NT occurrences of the subset of most prominent character language lemmas corroborates our hypothesis that uses of character language in Thucydides' NT are typically related to definite perspectival effects and occur in contexts in which the default impersonal narrator's point of view gives way to another perspective: in a vast majority of cases it is that of characters in the narrative (either in an overt report or in a perspectivized context), but sometimes also that of the narrator as discursive commentator. While there are exceptions (the words annotated as Non-perspectivized), the majority of them can be attributed to the words in question conveying a spectrum of meanings, some of which are importantly less subjective. There are two important ways in which our study of the use of character language in narrator text goes beyond traditional approaches such as that exemplified by de Jong's work on Homer (see Introduction). First, we identify character language through a quantitative analysis that circumvents intuitive judgments concerning which words are evaluative, emotive, or subjective in some way. Second, beyond 'perspectivized' uses of character language, we also identify the category Comment, which concerns a different kind of discourse shift: not from narrator to character, but rather between two different modes of the narrator.15 That the 'commenting' voice of the narrator can be distinguished from the regular course of the narrative by the use of 'character' vocabulary is an important observation in the context of studies of narrative modes in Thucydides and other Greek authors.16

5 Ranking chapters: clustering of character language
So far we have looked at a small set of individual lemmas in their occurrences in narrator text. The next stage of our study is to investigate whether character language words cluster together in the text, and if they do, whether the passages where more such words occur together are ones that can be interpreted as highly perspectivized. To do that we looked at individual chunks of the text and calculated a 'score' for each of them, based on the presence of character language lemmas, and then ranked the chapters according to this score, to see if the ones ranking highest are intuitively also ones where a change in perspective or narrative mode occurs (i.e. where the narrator either conveys the perspective of characters or makes a general comment, possibly giving place to Thucydides speaking in his own authorial voice). We proceeded as follows. We treated as units the individual documents of the NT subcorpus, i.e. the chapters of the text as established in modern editions. One reason for this was convenience: the text is already so divided, and it is the traditional and natural division for classical scholarship. More importantly, the division into chapters is independent of our hypotheses and assumptions, but not arbitrary. The editors who created this division did not have questions of word distribution or switches of perspectival modes in mind, but they perceived certain passages of the text to have thematic unity. Chapters therefore approximate conceptual units of the narrative. Seeing whether character language lemmas cluster in such units can be more illuminating than looking at passages of arbitrary beginning and end.
The score for each chapter was calculated by counting the occurrences of each character language lemma within it, with different weights assigned to different lemmas and the score relativized to chapter length. As previously, we excluded function words and proper names from 'character language', to focus exclusively on content words. Moreover, we applied two filters to avoid noise resulting from lemmas that are either infrequent in total or very frequent but without a strongly skewed distribution. Regarding the former, we did not want to attach any importance to a word that occurs only a few times in Thucydides, as then the fact that, say, one of the five occurrences is in NT while the others are in CT would seem too random to be salient. With respect to the latter, the downside of using the LL ratio is that it grows rapidly as the number of total occurrences grows, even if the actual ratios are not as distinct. That is, a word that is only slightly more frequent in CT than NT, but has a very large total number of occurrences, could still exhibit a high LL ratio. Accordingly, first, we excluded all lemmas that have fewer than ten total occurrences in CT and NT combined. Second, we excluded lemmas that have a direct (not LL) ratio of frequencies less than 4. This allowed us to leave out such words as ὀξέως 'quickly' or 'sharply' (with nine occurrences but eight of them in CT and one in NT) and λόγος 'reason', 'word' or 'ratio' (a ubiquitous concept, with 227 total occurrences but only 2.7 times more frequent in CT than NT), which intuitively should not be important.
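The two filters can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the study: the function names, the corpus sizes `CT_TOKENS` and `NT_TOKENS`, and the toy counts are hypothetical, and `log_likelihood` assumes the common two-corpus log-likelihood statistic, whose exact form is not restated in this section. Under that assumption, multiplying both counts of a lemma by ten multiplies its LL value by ten while the direct frequency ratio stays the same, which is the growth effect the second filter guards against.

```python
import math

# Hypothetical corpus sizes in tokens; the real CT and NT sizes are not given here.
CT_TOKENS = 50_000
NT_TOKENS = 100_000


def frequency_ratio(ct_count, nt_count, ct_tokens=CT_TOKENS, nt_tokens=NT_TOKENS):
    """Direct ratio of relative frequencies, character text over narrator text."""
    ct_freq = ct_count / ct_tokens
    nt_freq = nt_count / nt_tokens
    if nt_freq == 0:
        # A lemma absent from NT is maximally skewed toward CT.
        return math.inf if ct_freq > 0 else 0.0
    return ct_freq / nt_freq


def keep_lemma(ct_count, nt_count, min_total=10, min_ratio=4.0):
    """Apply both filters: at least ten occurrences overall, and a direct ratio of at least 4."""
    if ct_count + nt_count < min_total:
        return False
    return frequency_ratio(ct_count, nt_count) >= min_ratio


def log_likelihood(ct_count, nt_count, ct_tokens=CT_TOKENS, nt_tokens=NT_TOKENS):
    """Two-corpus log-likelihood statistic (an assumed form of the LL ratio)."""
    total = ct_count + nt_count
    expected_ct = ct_tokens * total / (ct_tokens + nt_tokens)
    expected_nt = nt_tokens * total / (ct_tokens + nt_tokens)
    ll = 0.0
    if ct_count > 0:
        ll += ct_count * math.log(ct_count / expected_ct)
    if nt_count > 0:
        ll += nt_count * math.log(nt_count / expected_nt)
    return 2 * ll
```

With these toy corpus sizes, a lemma with eight CT and one NT occurrence (the counts reported for ὀξέως) is rejected by the first filter regardless of its ratio, whereas a frequent lemma with a nearly even split is rejected by the second.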
The weights were set equal to the LL ratio for each lemma. This was based on the assumption that words that have a more skewed distribution are more important for potential perspectival effects. That is, a word that appears very frequently in character speech but very rarely in NT should have a greater effect than one whose relative frequencies are not as different. Finally, we also excluded extremely short chapters, as they could be awarded a high score based even on a single occurrence of a low-weight lemma. The cut-off point was 50 words, and 38 chapters (out of 727 total) were excluded.17 Thus, we calculated the score based on 125 lemmas, with weights ranging from 5.7 (for προσδέω) to 93.25 (for δίκαιος), for 689 chapters (out of 727 total), ranging from 50 to 579 words (with mean 164.4). The full list of lemmas and their scores is presented in Appendix B. The score was the sum of the weighted value for each occurrence of each lemma from the list, divided by the number of words in the chapter. It was then normalized, so that the values shown in the following table represent the number of standard deviations from the mean. Table 4 shows the twenty top-scoring chapters, and Appendix C lists the scoring lemmas in each of them.
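The scoring and normalization steps described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the names `chapter_score` and `rank_chapters` are hypothetical, each chapter is assumed to be available as a list of lemmas (one per word), and the population standard deviation is assumed for the normalization, since the text does not say which variant was used.

```python
from statistics import mean, pstdev


def chapter_score(lemmas, weights):
    """Sum the weight of every scoring-lemma occurrence, divided by chapter length in words."""
    return sum(weights.get(lemma, 0.0) for lemma in lemmas) / len(lemmas)


def rank_chapters(chapters, weights, min_words=50):
    """Return (chapter id, z-score) pairs, highest first, skipping chapters under min_words."""
    raw = {
        chapter_id: chapter_score(lemmas, weights)
        for chapter_id, lemmas in chapters.items()
        if len(lemmas) >= min_words
    }
    mu = mean(raw.values())
    sigma = pstdev(raw.values())  # assumption: population standard deviation
    if sigma == 0:
        # All chapters score the same, so none stands out from the mean.
        return [(chapter_id, 0.0) for chapter_id in raw]
    normalized = {chapter_id: (score - mu) / sigma for chapter_id, score in raw.items()}
    return sorted(normalized.items(), key=lambda item: item[1], reverse=True)
```

Dividing by chapter length keeps long chapters from scoring high merely because they contain more words, and the minimum-length filter prevents a single occurrence in a very short chapter from dominating the ranking.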
All of these twenty chapters stand out significantly from the text as a whole in terms of the presence of lemmas from our scoring list, but the reasons for that may be varied. In section 4.2, we saw that most occurrences of character language words in NT appear within indirect reports. When we consider parts of narrator text that have the highest concentration of character language, the same is essentially true. Most of the chapters on the list above contain a lot of indirect speech or attitude reports and the scoring lemmas, i.e. character language words, appear mostly in those reports. This is the case with chapters 1.145, 4.40, 4.114, 4.98, 5.69, 5.49, 5.27, 8.27, 6.49, 5.63, 4.73, 4.106, 5.15, 3.68, 1.79, and 5.46, i.e. sixteen out of the twenty top-scoring chapters. The appearance of character language in these passages is of great interest, as it may contribute to greater expressivity or purported fidelity of the reports. However, as in the previous part of our investigation, we focus here only on character language outside of (both direct and) indirect reports. In this perspective, the most interesting passages are the remaining four chapters. After a brief note about two of them, we will cite and discuss the other two at some length.
Starting at the very top of our top twenty list, chapter 3.84 is an extreme outlier in our scoring system. It occurs at the end of the part of book 3 where Thucydides describes with dramatic pathos, and some gruesomeness, the civil strife on Korkyra that resulted from the tensions and conflicts of the Peloponnesian war. After some comments on the events in Korkyra (see also the next chapter to be discussed, 3.82), chapter 3.84 concludes these comments with a general and abstract reflection on human nature. In this short passage we find a great concentration of strongly charged language. Interestingly, the authenticity of this chapter, our extreme outlier, has long been debated by Thucydidean scholars, starting with a medieval scholiast's note that says that 'it did not seem to any of the commentators to belong to Thucydides' (Christ, 1989). Hornblower (1997) is positive that the chapter should be excised, but see Christ (1989) for a more careful judgment. Apart from the scholiast's note, one of the main reasons why editors and commentators doubt the authenticity of this chapter is its style, more specifically the very fact that the morally charged language here is so unlike any other passage in Thucydides (see e.g. Huart, 1968). We consider the fact that the chapter is such a clear outlier in our ranking to be a vindication of our scoring method.
The already mentioned chapter 3.82, at the bottom of the list, is a very famous passage toward the end of the Korkyra episode. Here Thucydides gives a general assessment of the consequent political and moral degradation of humanity (it is clear that Korkyra is to be understood as only a single example of a widespread tendency). According to the categorization introduced earlier, the whole chapter should be understood as 'comment': there is a clear shift away from the neutral point of view of the impersonal narrator, not toward the perspective of a character in the narrative, but toward a different, more engaged mode of narrative, possibly channeling the voice of Thucydides himself, as someone who witnessed such events and expresses his indignation and horror in his own voice. It is only to be expected that some of the morally loaded language which the Thucydidean narrator normally avoids is to be found in this passage.
'After this skirmishing had lasted some little while, the Spartans became unable to dash out with the same rapidity as before upon the points attacked, and the light troops, realizing that they now defended with less vigor, became more confident. They could see with their own eyes that they were many times more numerous than the enemy; they were now more familiar with his aspect and he seemed to them less terrible, the event not having convinced them of the worth of the apprehensions which they had suffered when they first landed in slavish dismay at the idea of attacking Spartans; and accordingly their fear changing to disdain, they now rushed upon them all together with loud shouts, and pelted them with stones, darts, and arrows, whichever came first to hand. [2] The shouting accompanying their onset confounded the Spartans, unaccustomed to this mode of fighting; dust rose from the newly burnt wood, and it was impossible to see in front of one with the arrows and stones flying through clouds of dust from the hands of numerous assailants. [3] The Spartans had now to face a difficult challenge; their caps would not keep out the arrows, and darts had broken off in the bodies of the wounded. They themselves were unable to retaliate, being prevented from using their eyes to see what was before them, and unable to hear the words of command for the hubbub raised by the enemy; danger encompassed them on every side, and they were hopeless as to how they should defend and save themselves.'18 (4.34)
The words in bold are the scoring lemmas. The first four appear within the scope of indirect attitude reports (the reporting verbs are underlined), and we will therefore ignore them here. The most interesting part from our point of view is the latter part of the chapter (Section 3), where the desperate situation of the defending Lacedemonians is described. Here, across the span of several lines, we find four significant words belonging to the character language category, including three from the top ten discussed in Section 3. Even though these are words that semantically do not necessarily involve any evaluation or subjective aspect ('difficult', 'danger', 'should', 'defend'), here they contribute to the perspectivization of the description. The difficulty of the situation, the danger, the lack of any way in which they should defend themselves: all this is clearly something that the Lacedemonians themselves perceive. In the end, a few chapters later, it will lead them to surrender.
And it has to be noted that this is, according to Thucydides, a shocking result (Spartan hoplites do not surrender, after all). In chapter 4.40 he describes how stunned all the Greeks were upon hearing the news of this ('Nothing that happened in the war surprised the Hellenes so much as this.'), and that chapter is again one of the highest scoring in our ranking, although all the scoring lemmas there appear within indirect reports, and therefore we ignore it in the present study. Thus, chapter 4.34 describes the pivotal point in a battle that is one of the most important in all of Thucydides' work, and it is very interesting to see that in this description he employs language normally reserved for character speech to induce a dramatic effect.
It is interesting to note, and provides further validation of our method of identifying perspectivally rich passages, that chapter 4.34 has attracted special attention of Thucydidean scholars from Antiquity until the most recent times.19 It has already been commented upon by Dionysius of Halicarnassus, a Greek writer of the first century BCE, in his essay on Thucydides.20 Unfortunately, Dionysius does not discuss the final part of this chapter, which is the most interesting from our point of view (he is also quite critical of the way in which Thucydides favors a striking style over clarity). In two much more recent studies, Allan (2013, 2018) singles out chapter 4.34, and especially its latter part, as an example of Thucydides' use of 'character-bound perspective', which in narratological terms corresponds to 'implicit embedded focalization' (2013, pp. 379-80). While Allan's main focus is on other kinds of linguistic categories (such as representation of time and narrative progression, spatial expressions and negations), he also mentions the contribution of character language words, such as κίνδυνος (cf. our comments on this word above, in Section 4).
The other high-scoring chapter is 4.81, in which Thucydides takes a break from the narrative to write about the character of Brasidas and his reputation among lesser Greek cities. Here is the full text of the chapter in Greek and translation:
(16) [3] πρῶτος γὰρ ἐξελθὼν καὶ δόξας εἶναι κατὰ πάντα ἀγαθὸς ἐλπίδα ἐγκατέλιπε βέβαιον ὡς καὶ οἱ ἄλλοι τοιοῦτοί εἰσιν.
'Brasidas himself was sent out by the Spartans mainly at his own desire, although the Chalcidians also were eager to have a man so energetic as he had shown himself whenever there was anything to be done at Sparta, and whose later service abroad proved of the utmost value to his country. [2] At the present moment his just and moderate conduct toward the cities generally succeeded in persuading many to revolt, besides the places which he managed to take by treachery; and thus when the Spartans desired to negotiate, as they ultimately did, they had places to offer in exchange, and the burden of war meanwhile shifted from the Peloponnesus. Later on in the war, after the events in Sicily, the present valor and conduct of Brasidas, which was known by experience to some, by hearsay to others, was what mainly created an esteem for the Spartans among the allies of Athens. [3] He was the first who went out and showed himself so good a man at all points as to leave behind him the conviction that the rest were like him.' (4.81)
The words in bold are again the scoring lemmas.
The first one may be syntactically embedded under the underlined attitude verb, but it is also plausible that δοκοῦντα and γενόμενον head two coordinate clauses, and ἄξιον belongs to the second one. Whether the evaluative adjective meaning 'worthy', 'valuable' (or maybe 'useful' to the Lacedemonian cause) is to be interpreted as referring to Brasidas' reputation or as Thucydides' own assessment is therefore unclear. Similarly, ἀγαθὸς in the last line is embedded under δόξας, but the meaning of the clause may either be 'he appeared to be good' or 'he proved himself to be good', a more subjective or a more objective statement. The same is true of the words in the previous sentence referring to Brasidas' virtue and skill: while not syntactically embedded under any reporting or attitudinal verb, they are said to be heard of and experienced by people. The line between what we call a perspectivized use and a comment use of character language words in this passage is blurred: Thucydides is both writing about how Brasidas was perceived by other Greeks (perspectivized) and implying that he was indeed the worthy and virtuous man he was thought to be (comment). That this is indeed Thucydides' own assessment is indicated most clearly by the use of δίκαιον in the second sentence of the chapter. This strongly evaluative word is neither embedded nor modified in any way that would suggest that it refers only to an appearance of justice. As we explained in Section 4.1, what we take as Thucydides' Comment is marked either by the use of first-person forms (when, e.g., the author makes a remark about 'my time', etc.) or by breaking off from the course of the narrative to make general remarks on some phenomenon, a general kind of event, or a larger span of time. The latter is true here: Thucydides leaves aside the train of events for a brief moment, to say not only what Brasidas was achieving at this particular point in the narrative, but what he would go on to do 'later in the war'. However, Thucydides' assessment, and his use of character language words in this passage, remains anchored to the point of view of actors in the narrative (minor Greek cities in this case) through the use of perspectivizing expressions such as 'seemed', 'known by experience', etc.
In conclusion, we have seen in this section that character language lemmas in narrator text do in fact tend to cluster together, but it is most commonly in chapters that contain a lot of indirect reports and where the lemmas appear within those reports. Nonetheless, it is possible to identify some passages of the text with a significant presence of non-reportative uses of character language. And it appears that this presence is not accidental, but marks passages of special character, either in terms of their dramatic quality or discursive content. We have only looked at a small number of chapters (there are certainly other interesting ones beyond our top twenty) and in a superficial way. A broader and more fine-grained investigation into the distribution and clustering of character language in Thucydides' NT would surely yield further interesting results, but our purpose in this study was only to sketch a general picture.

Conclusion
We have investigated how Thucydides uses content words to create the shifts in narrative perspective that he is well known for. Our quantitative corpus-based approach naturally fits the subtlety of the phenomenon, which is notoriously hard to tackle precisely, especially in a past stage of a language such as Ancient Greek. The basic idea has been to single out those passages where Thucydides seems to speak in the narrator's voice, but where he uses expressions that are usually reserved for speeches, that is, contexts in which he reports characters' words. The use of such expressions in NT may indicate, or even trigger, a perspective (implicitly) shifted away from the narrator to that of a character in the story.
We proceeded in three steps. First, with the help of the lemmatizer GLEM we identified character language lemmas based on their distribution across 'character text' and 'narrator text'. We found that many but not all of these lemmas are evaluative. The fact that it is especially evaluative words that Thucydides rarely uses in the narrator's voice contributes to the impression of his narrator as impersonal and objective. We then investigated the rare passages where character language occurs outside of speeches. Indeed, in the majority of cases the lemmas are there to be interpreted as expressing a character's perspective, either in report complements or looser reportative constructions, or most interestingly, in cases of implicit character focalization. The method also yielded passages, however, where a character language expression was to be interpreted from the narrator's own perspective. In such cases we are dealing with a distinct narrative mode, in which Thucydides comments on rather than just recounts the events. In the third step, we looked at passages where character language lemmas cluster as potentially interesting passages with respect to narrative perspective. Interestingly, the passage that comes out as scoring highest is chapter 84 of book 3, the authenticity of which is highly controversial, in part because the morally charged language here is so unlike any other passage in Thucydides. Other high-scoring chapters are episodes already known for their dramatic quality and intensity. We now have strong reasons to assume that this dramatic effect is at least partly induced by the employment of language normally reserved for character speech. In other words, it is (at least partly) the choice of content words, and especially evaluative words, that makes Thucydides 'produce vividly in the minds of those who peruse his narrative the emotions of amazement and consternation which were experienced by those who beheld them', to speak with Plutarch. Lower on the list we may find passages not yet studied in detail in this respect. Our methodology identifies such passages and gives the classical scholar a tool to investigate them.
As much as being a study of Thucydides, this article provides a proof of concept for a method of addressing narratological questions with the use of quite simple, but powerful quantitative corpus-based techniques. On the one hand we have seen that it may provide substantial objective support to ideas already expressed in narratological literature; on the other hand, it may draw the attention of scholars to words and passages that have previously eluded their analyses. In Thucydides' case the most striking result is the use in narrator text of non-evaluative expressions that are nevertheless strongly associated with character language and importantly contribute to narrative effects. We also hinted at how a study like ours may be used in discussions about different approaches to semantics. While this topic deserves a separate discussion, at the very least we can point out that semantic theories of narrative perspective, so far predominantly focused on grammatical elements in Free Indirect Discourse, should incorporate the important role of content words and of broader patterns of vocabulary use in narrative texts.

Appendix A. Lemma annotation
© The Author(s) 2019. Published by Oxford University Press on behalf of EADH. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. doi:10.1093/llc/fqz026

Table 2. Top ten character language lemmas, content words only

Table 1. Top ten character language lemmas according to LL ratio