Towards estimating global probabilities of evaluation in English based on automatic extraction of least delicate Appraisal in large corpora
Introduction
One of the fundamental aspects of SFL is its inherent view that the relationship between language as a system (potential) and language in use (or instance) is modelled probabilistically along the cline of instantiation (Halliday, 1985). This view is well expressed in Halliday's famous metaphor of climate and weather. As Halliday (e.g. 1991b) explains, climate and weather are in fact one phenomenon looked at from different perspectives. Climate is the long-term trend of weather events, the overall “set of probabilities in the weather” (Halliday, 1992b: 90). The weather of a particular day is an instance of climate; climate is “what instantiated in the form of weather” (Halliday and Matthiessen, 2014: 27). Between weather and climate, intermediate patterns can also be identified depending on the range of averaging we are interested in. For example, the averages of temperature, humidity, wind speed, etc. over thirty days should give us a weather ‘profile’ of a particular month. If the range is extended (e.g. to several years), the averages should provide us with better estimates of the overall climate probabilities.
Analogously, language as system and language as instance (corpus or text) are two aspects of the same phenomenon. The system comprises the long-term patterns or global probabilities of all instances or texts. A text is an instance of language as system: an instance of systemic probabilities actualised in a specific frequency profile. Between the system and instance endpoints, intermediate probabilistic patterns can be identified along the instantiation cline, where ‘register’ is probably “the most critical intermediate concept” (Halliday, 2005: 248). From a system perspective, a register can be seen as a sub-system instantiated in a specific profile of local probabilities. From an instance perspective, a register “appears as a cluster of similar texts, a text type” (Halliday, 2005: 248). Probabilistically, then, a register can be defined as “a tendency to select certain combinations of meanings with certain frequencies, and this can be formulated as the [local] probabilities attached to ... systems” (Halliday, 1991a: 66). In the same way that a particular day's weather can readjust the long-term averages of its climate, every text perturbs the probabilities of systemic choices associated with its register and eventually “perturbs the overall [global] probabilities of the system, to an infinitesimal extent” (Halliday, 1992a: 76).
From a quantitative perspective, the SFL probabilistic view of language is pivotal for three analytical and practical reasons. First, it re-construes a register, a corpus or even a single text as a “resetting of the probabilities” in language systems (Halliday, 1991a: 38). That is, a register differs from other registers in its unique deviations of local probabilities from the global, expected ones. Along the same lines, a text of a particular register is linguistically distinguishable from similar texts by its unique deviations from the local probabilities associated with its register. Second, and most importantly, it provides a way to calculate, or at least estimate, the global probabilities of a specific system from a representative sample of a language. This sample would likely be a corpus covering a wide variety of registers; as Halliday (1998) notes, “[given] that the corpus is big enough, we can get at them [global probabilities], because the corpuses now range across lots of different registers, spoken and written discourse” (p.170). Third, SFL probabilistic modelling of language enables us to look at registers, and genres for that matter, topologically rather than typologically. The topological perspective, as Christie and Martin (2000: 16) explain, “allows us to position texts on a cline, as more or less prototypical” of their kind, not only from a stratification standpoint but also from an instantiation one based on their probability profiles (as will be further emphasized later). In addition to complementing the predominantly typological descriptions of language, topological representations are better suited to the fuzzy nature of semantic systems (Halliday, 2003; Lemke, 1999).
Estimating global probabilities is essential to our understanding of language, not only because they are important per se, but, more importantly, because “we need them as the kind of baseline against which we match probabilities in particular sets of texts, different registers” (Halliday, 1998: 170). But what is the nature of this baseline? Halliday (e.g. 1991b) hypothesizes that, for binary lexicogrammatical systems, global probabilities are either equiprobable (0.5/0.5) or maximally skew (0.9/0.1 or 0.1/0.9). Examples of equiprobable systems in English lexicogrammar include number (singular/plural), nominal deixis (specific/non-specific) and verbal deixis (modality/tense), whereas skew systems include polarity (positive/negative), mood (indicative/imperative), and voice (active/passive) (Halliday, 1991b: 45). In the latter systems, the global probability distribution is skewed towards the unmarked choice (positive, indicative, and active, respectively). From an information-theoretic perspective, Halliday (1991a) further notes that equiprobable systems such as number and deixis are maximally informative, in that a choice is probabilistically more ‘surprising’ or has a higher ‘entropy’ (to use the terminology of Shannon and Weaver, 1964). Skew systems such as polarity and voice, on the other hand, are far less informative, or more ‘redundant’, in that one choice is highly predictable. For instance, globally, positive and active are far more likely to be chosen than negative and passive (Halliday, 1991a: 38).
In order to quantify the two distributional profiles of global probabilities, Halliday (e.g. 1991a: 35) uses the following entropy formula (proposed by Shannon, 1948; Shannon and Weaver, 1964):
$$H = -\sum_{i=1}^{n} P(x_i)\,\log_2 P(x_i) \qquad (1)$$
In an SFL context, n is the total number of choices in the system, P(x_i) is the probability of the ith choice, and H is the information entropy of the whole system. Given this formula, the information content conveyed by equiprobable and skew systems can be obtained as follows, respectively:
$$H_{0.5/0.5} = -(0.5\,\log_2 0.5 + 0.5\,\log_2 0.5) = 1$$
$$H_{0.9/0.1} = -(0.9\,\log_2 0.9 + 0.1\,\log_2 0.1) \approx 0.47$$
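The entropy calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `entropy` is our own, and base-2 logarithms are assumed so that the result is expressed in bits.

```python
import math


def entropy(probs):
    """Return the Shannon entropy (in bits) of a probability distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 are skipped: the limit of p * log2(p) as p -> 0 is 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Halliday's two hypothesized profiles for a binary system.
h_equi = entropy([0.5, 0.5])  # equiprobable system
h_skew = entropy([0.9, 0.1])  # skew system

print(f"equiprobable: H = {h_equi:.3f} bits")
print(f"skew:         H = {h_skew:.3f} bits")
```

The same function applies unchanged to systems with more than two choices, since it simply sums over whatever list of probabilities it is given.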
As skewness increases, information content decreases, approaching zero in systems where the unmarked choice is very highly probable (e.g. 0.99:0.01). It should be noted here that although Halliday (e.g. 1991a) uses Eq. (1) to quantify the properties of global probabilities, it can be further generalized to quantify deviations of particular local probabilities from the global profiles through relative entropy and other statistical measures, as will be seen in the following sections.
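As a rough illustration of how relative entropy can quantify the deviation of a local profile from a global one, the sketch below computes the Kullback-Leibler divergence between two distributions. The function name and the local profile (0.7/0.3) are hypothetical and chosen purely for illustration; the exact measures used later in the paper may differ.

```python
import math


def relative_entropy(local, reference):
    """Return D(local || reference) in bits (Kullback-Leibler divergence)."""
    if len(local) != len(reference):
        raise ValueError("distributions must have the same number of choices")
    total = 0.0
    for p, q in zip(local, reference):
        if p == 0:
            continue  # a choice never taken locally contributes nothing
        if q == 0:
            raise ValueError("reference gives zero probability to an observed choice")
        total += p * math.log2(p / q)
    return total


# Global profile: Halliday's hypothesized skew for polarity (positive/negative).
global_polarity = [0.9, 0.1]
# Hypothetical local profile of some register (illustrative numbers only).
local_polarity = [0.7, 0.3]

deviation = relative_entropy(local_polarity, global_polarity)
no_deviation = relative_entropy(global_polarity, global_polarity)

print(f"deviation of local from global: {deviation:.4f} bits")
print(f"deviation of global from itself: {no_deviation:.4f} bits")
```

A value of zero means the local profile matches the reference exactly; larger values indicate a stronger resetting of the probabilities.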
Furthermore, even though the two distributional patterns of global probabilities are mainly hypothesized in the context of binary systems (such as polarity, voice, and taxis), they are also observable in n-term systems where n > 2. As Halliday and James (1993) note, “a similar pattern would be predicted for ternary systems, except that it should be possible to find more than one type with the skew” (p.7). That is, in a three-term system, global probabilities may be skewed towards two choices simultaneously, such that both are equiprobable while the third, marked choice is far less probable. An example of this case is the English system of tense. In their analysis of tense in the COBUILD corpus, Halliday and James (1993) observe that the system is highly skewed towards the past and present, which are equiprobable (≈0.47:0.47 respectively), whereas the future is very marked (with a probability ≈ 0.06), as illustrated by the histogram in Fig. 1(a) below. An alternative way to represent this pattern is through the system network in Fig. 1(b), where tense is binarized, so to speak, into two equiprobable choices (past and non-past), and non-past is the entry condition to a further skew system of two choices (present and future). This latter ‘reductionist’ representation arguably better preserves the two global patterns and, thus, will be adopted throughout this paper where appropriate.
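The binarization of tense described above can be checked with a small Python sketch. The probabilities are the approximate values quoted in the text; the helper names `binarize` and `entropy` are our own and not part of any published tool.

```python
import math

# Approximate tense probabilities quoted in the text for the COBUILD corpus.
tense = {"past": 0.47, "present": 0.47, "future": 0.06}


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def binarize(system, first):
    """Split an n-term system into `first` vs. the rest, plus the nested system."""
    p_first = system[first]
    p_rest = sum(p for choice, p in system.items() if choice != first)
    top = {first: p_first, f"non-{first}": p_rest}
    # Probabilities of the remaining choices, conditional on entering the nested system.
    nested = {c: p / p_rest for c, p in system.items() if c != first}
    return top, nested


top, nested = binarize(tense, "past")

# Chain rule: the two-step description carries the same information as the flat one.
h_flat = entropy(tense.values())
h_nested = entropy(top.values()) + top["non-past"] * entropy(nested.values())

print({k: round(v, 2) for k, v in top.items()})
print({k: round(v, 2) for k, v in nested.items()})
```

With these figures the top-level system comes out close to equiprobable (0.47 vs. 0.53) and the nested present/future system close to the 0.9/0.1 skew (roughly 0.89 vs. 0.11), which is the pattern the binarized network is meant to capture.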
Nevertheless, two questions still need further investigation. The first concerns global probabilities in relation to what Halliday (e.g. 1991b) refers to as conditional probabilities. These are of two types: interstratal and intrastratal. The former is a special type of local probability and is register-specific, as the conditioning is generally determined by the context of situation. An example, par excellence, of this conditioning is how social class conditions choices of verbal deixis in the interview register (Plum and Cowling, 1987). The second type is internal, as the conditioning occurs within language systems. These probabilities arise “when systems are intersected…one being treated as the conditioning environment of the other” or, in other words, when certain choices in one system probabilistically favor or attract other choices in another system (Halliday, 1991b: 54). An example of this conditioning is the influence of taxis choices on speakers' choices of logico-semantic relations and vice versa (Halliday, 1991b: 49). At the sub-potential/register level, this type of conditional probability has drawn relatively more attention in SFL and is often discussed under the notions of ‘systemic intersections’ (Matthiessen, 2002, 2006; Nesbitt and Plum, 1988; Rodríguez-Vergara, 2015) or ‘coupling’ (Martin, 2000; Zappavigna et al., 2009).
In these studies, the focus has been mainly on how systems condition each other ‘locally’ in small-scale samples of certain registers (e.g. interviews, spoken narratives, news reports, scientific expositions, argumentative essays). It has been, and still is, challenging to generalize conditional probabilities beyond the local level of a relatively small corpus to the global level of a large representative corpus, mainly because “the analysis still has to be manual…no [SFL] parser can yet assign enough feature descriptions” (Halliday, 1991b: 56). Therefore, a practical, albeit not necessarily accurate, alternative is to assume that the systems involved are independent at the global level and, accordingly, to calculate conditional probabilities as the product of the independent probabilities of the relevant systemic choices (Zappavigna et al., 2008: 172). But this might not be the case, and globally conditional probabilities might in fact “not simply be the product of the separate probabilities of each” (Halliday, 1991b: 54). It is quite possible that the phylogenetic patterns of the overall systemic potential have evolved to be skewed towards certain favoring or disfavoring intersections (Matthiessen, 2006). Nowadays, sufficiently representative corpora are readily available and, when these are coupled with advances in computational linguistics and Natural Language Processing (NLP) techniques, a better alternative is to estimate these probabilities directly. This can be achieved through the implementation of automatic classifiers targeting certain choices and systemic intersections. Although such classifiers would be somewhat limited in their scope of language analysis and levels of delicacy (as will be further discussed later), they should give us, with acceptable accuracy, a global baseline of some systems in addition to new insights into how mutual conditioning takes place at the system potential.
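The difference between assuming independence and estimating the intersection directly can be illustrated with a short Python sketch. The co-selection counts below are entirely hypothetical (they are not corpus data and do not reflect any reported finding); the point is only to show how an observed joint probability is compared with the product of the separate probabilities.

```python
from collections import defaultdict

# Hypothetical co-selection counts for two intersecting systems
# (taxis x logico-semantic type); illustrative numbers only.
counts = {
    ("parataxis", "expansion"): 40,
    ("parataxis", "projection"): 10,
    ("hypotaxis", "expansion"): 20,
    ("hypotaxis", "projection"): 30,
}
total = sum(counts.values())

# Marginal probabilities of each system taken separately.
p_taxis = defaultdict(float)
p_type = defaultdict(float)
for (taxis, ls_type), n in counts.items():
    p_taxis[taxis] += n / total
    p_type[ls_type] += n / total

# Ratio of the observed joint probability to the product expected under
# independence: > 1 suggests favoring, < 1 suggests disfavoring.
association = {}
for (taxis, ls_type), n in counts.items():
    observed = n / total
    expected = p_taxis[taxis] * p_type[ls_type]
    association[(taxis, ls_type)] = observed / expected

for pair, ratio in association.items():
    print(pair, round(ratio, 2))
```

If the two systems were truly independent, every ratio would be close to 1; systematic departures from 1 are what a directly estimated conditional profile would reveal and what the independence assumption hides.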
The second question concerns patterns of systemic probabilities (both global and local) from a stratification perspective: will the two distributional patterns of global probabilities (and local or conditional deviations thereof) be observable in higher-stratum systems, mainly what Martin (e.g. 1992) refers to as discourse semantics systems? The original context in which Halliday hypothesizes (and tests) the two distributional patterns is lexicogrammar. It is indeed more challenging to perform automatic analysis of discourse semantics systems; as Matthiessen (2006) notes, “automation of the analysis becomes increasingly difficult as we ascend the stratal organization towards semantics…and manual analysis has to take over” (p.109). In a semantic analysis of large-scale corpora, it is possible to rely on what Hunston (2006) refers to as the ‘phraseological approach’, in which certain phrases are extracted as features of a particular discourse semantics system and generalized as global or sub-global patterns of that system. A notable example of this approach is Miller and Johnson (2014), where certain phrasal structures (such as it is ∗ time for; I think it is time for ∗) that explicitly encode (or realize) appraisal are extracted automatically (or concordanced) and discussed in relation to some global patterns of evaluation in several corpora. However, this approach is highly vulnerable to misses (i.e. false negatives), since a small number of phrases cannot be sufficiently representative of the set of all possible realizations of a discourse semantics system. In fact, it can be argued that the false negative rate that may result from this approach is far greater than that of a moderately accurate machine classifier, as will be pointed out in the following sections.
In response, this paper attempts to approach the two questions quantitatively from both corpus-based and computational perspectives. The overarching aim is to investigate Halliday's bimodal distributions in relation to global, conditional and local probabilities of a discourse semantics system in sufficiently large and representative English corpora. The discourse semantics system opted for in this paper is appraisal (Martin, 2000; Martin and Plum, 1997; Martin and Rose, 2003; Martin and White, 2005) at its least delicate levels (as reviewed in Section 2). The choice of appraisal is motivated by two considerations. First, appraisal is arguably the most actively researched discourse semantics system in the ‘Martinian’ SFL framework, making it more convenient to compare, where appropriate, the results (outlined in Section 3) with a relatively abundant body of previous findings available in other studies (as done in Section 4). Second, some aspects of appraisal are also an active area of machine classification research, mainly under the notions of sentiment analysis and opinion and subjectivity classification (as will be briefly discussed in Section 2). This should provide us with an adequate number of benchmarks and baselines against which the least delicate appraisal classifiers implemented in this paper will be evaluated.
More specifically, the aim of this paper can be split into the following three questions:
1) What are the global probabilities of engagement and attitude polarity choices in English as estimated from the analysis of representative (or reference) corpora? And do they conform to Halliday's two global distributions?
2) What are the intrastratal, internal conditional patterns of co-choices (or intersections) of engagement and attitude polarity at the system end of instantiation? In other words, how do the global probabilities of engagement choices (in question 1) globally condition attitude polarity choices and vice versa?
3) How do corpora of certain English registers cluster in terms of their statistical deviations from the reference global probabilities (in question 1) and the reference conditional probabilities (in question 2)? In other words, how can they be represented topologically in terms of least delicate choices and co-choices of appraisal?
English corpora
To achieve the main aim of this study, which requires some generalization both at the level of register and towards the global system (or meaning potential) of appraisal, the automatic classifiers discussed in Section 2.3 below will be applied to fifteen English corpora with a total size of approximately 3.6 billion words in around 152 million sentences. The corpora are classified into two categories: reference and register-specific. The reference category includes corpora of a wide variety of
Patterns of least delicate appraisal in reference corpora
As far as the first question of this paper is concerned, Fig. 4 below shows the overall average frequency profiles of engagement and attitude polarity in the three reference corpora. For engagement, monoglossic choices are slightly more frequent (= 0.53) than heteroglossic choices (= 0.47). A similar pattern can be observed between ‘evaluative’ (= 0.53) and ‘neutral’ (= 0.47). For attitude polarity, ‘positive’ is notably more frequent (= 0.63) than ‘negative’ (= 0.37). In order to fit these observed
Discussion and conclusion
This paper approaches Halliday's bimodal hypothesis of systemic probabilities by addressing three questions in the context of automatic analysis of least delicate appraisal in fifteen English corpora. The first question is concerned with the global patterns of appraisal probabilities estimated from three English reference corpora, namely Brown, BNC and COCA-mini. The statistical analysis presented in the previous section suggests that, globally at the system pole, least delicate engagement
References (72)
- et al. Valued voices: students' use of Engagement in argumentative history writing. Ling. Educ. (2014)
- et al. Correspondence analysis
- Significant patterns of APPRAISAL in online debates. Int. J. Ling. (2019)
- Quantifying systemic coupling and syndrome using multivariate statistical methods: an SFL corpus example. Ling. Hum. Sci. (2021)
- et al. Automatically determining attitude type and force for sentiment analysis
- et al. A review: sentiment analysis and opinion mining
- The expression of uncertainty as a strategy for mitigating the assertion. An analysis in an oral computerized corpus
- et al. Systemic functional linguistics and computation: new directions, new challenges
- An analytical framework for register studies
- et al. One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling