Elsevier

Language Sciences

Volume 88, November 2021, 101432
Towards estimating global probabilities of evaluation in English based on automatic extraction of least delicate Appraisal in large corpora

https://doi.org/10.1016/j.langsci.2021.101432

Highlights

  • Least delicate choices of engagement in English are globally equiprobable.

  • The neutral/evaluative system in English is globally equiprobable.

  • The attitude polarity subsystem in English is globally skewed towards the positive.

  • Neutral and attitudinally positive choices globally favor monoglossic engagement.

  • Attitudinally negative choices globally favor heteroglossic engagement.

Abstract

Systemic Functional Linguistics (hereafter SFL) hypothesizes that language systems globally follow one of two probability distributions (equiprobable or maximally skewed). Accordingly, a particular register, corpus or text is defined probabilistically as a local resetting of the global probabilities; in other words, a statistical deviation of the local probabilities from the two global distributions of language. However, the original context in which this hypothesis was proposed and, later, tested is the lexico-grammar. The current study addresses this hypothesis, both computationally and statistically, with respect to the discourse semantics system of appraisal. Through the implementation of automatic classification of least delicate engagement and attitude polarity in fifteen English corpora, this paper approaches three questions concerning the global patterns of appraisal choices in English, the intrastratal conditioning between the two appraisal subsystems, and the register-specific deviations of local probabilities. The computationally-based statistical analysis indicates that, globally, least delicate engagement and attitude polarity are equiprobable, whereas, within the polarity subsystem itself, choices are skewed in favor of the positive. The findings of this paper also show how local deviations of appraisal probabilities in the corpora cluster the relevant registers topologically in terms of what is contextually expected and what is contextually divergent. A major implication of this paper is that computational methods applied to large-scale corpora can substantiate the theoretical premises of SFL.

Introduction

One of the fundamental aspects of SFL is its inherent view that the relationship between language as a system (potential) and language in use (or instance) is modelled probabilistically along the cline of instantiation (Halliday, 1985). This view is well expressed in Halliday's famous metaphor of climate and weather. As Halliday (e.g. 1991b) explains, climate and weather are in fact one phenomenon looked at from different perspectives. Climate is the long-term trends of weather events; the overall “set of probabilities in the weather” (Halliday, 1992b: 90). The weather of a particular day is an instance of climate; climate is “what instantiated in the form of weather” (Halliday and Matthiessen, 2014: 27). Between weather and climate, intermediate patterns can also be identified depending on the range of averaging we are interested in. For example, the averages of temperatures, humidity, wind speed etc. over thirty days should give us a weather ‘profile’ of a particular month. If the range is extended (to e.g. several years), the averages should provide us with better estimations of the overall climate probabilities.

Analogously, language as system and language as instance (corpus or text) are two aspects of the same phenomenon. System is the long-term patterns or global probabilities of all instances or texts. A text is an instance of language as system; an instance of systemic probabilities actualised in a specific frequency profile. Between the system and instance endpoints, intermediate probabilistic patterns can be identified along the instantiation cline where ‘register’ is probably “the most critical intermediate concept” (Halliday, 2005: 248). From a system perspective, a register can be seen as a sub-system instantiated in a specific profile of local probabilities. From an instance perspective, a register “appears as a cluster of similar texts, a text type” (Halliday, 2005: 248). Probabilistically, then, a register can be defined as “a tendency to select certain combinations of meanings with certain frequencies, and this can be formulated as the [local] probabilities attached to ... systems” (Halliday, 1991a: 66). In the same way that a particular day's weather can re-adjust the long-term averages of its climate, every text perturbs the probabilities of systemic choices associated with its register and eventually “perturbs the overall [global] probabilities of the system, to an infinitesimal extent” (Halliday, 1992a: 76).

From a quantitative perspective, the SFL probabilistic view of language is pivotal for three analytical and practical reasons. First, it re-construes a register, a corpus or even a single text as a “resetting of the probabilities” in language systems (Halliday, 1991a: 38). That is, a register is different from other registers due to its unique deviations of local probabilities from global, expected ones. Along the same lines, a text of a particular register is linguistically distinguishable from similar texts by its unique deviations from the local probabilities associated with its register. Second, and most importantly, it provides a way to calculate, or at least estimate, global probabilities of a specific system from a representative sample of a language. This sample would likely be a corpus of a wide variety of registers, as Halliday (1998) notes “that the corpus is big enough, we can get at them [global probabilities], because the corpuses now range across lots of different registers, spoken and written discourse” (p.170). Third, SFL probabilistic modelling of language enables us to look at registers, and genres for that matter, topologically rather than typologically. The topological perspective, as Christie and Martin (2000: 16) explain, “allows us to position texts on a cline, as more or less prototypical” of their kind; not only from a stratification standpoint but also from an instantiation one based on their probability profiles (as will be further emphasized later). In addition to complementing the predominantly typological descriptions of language, topological representations are more suitable to the fuzzy nature of semantic systems (Halliday, 2003; Lemke, 1999).

Estimating global probabilities is essential to our understanding of language, not only because they are important per se, but, more importantly, because “we need them as the kind of baseline against which we match probabilities in particular sets of texts, different registers” (Halliday, 1998: 170). But what is the nature of this baseline? Halliday (e.g. 1991b) hypothesizes that for binary lexicogrammatical systems, global probabilities are either equiprobable (0.5/0.5) or maximally skew (0.9/0.1 or 0.1/0.9). Examples of equiprobable systems in English lexicogrammar include: number (singular/plural), nominal deixis (specific/non-specific) and verbal deixis (modality/tense), whereas skew systems include: polarity (positive/negative), mood (indicative/imperative), and voice (active/passive) (Halliday, 1991b: 45). In the latter systems, the global probability distribution is skewed towards the unmarked choice (e.g. positive, indicative, and active, respectively). From an information theory perspective, Halliday (1991a) further notes that equiprobable systems such as number and deixis are maximally informative, in that a choice is probabilistically more ‘surprising’ or has a higher ‘entropy’ (to use Shannon and Weaver's terminology) (Shannon and Weaver, 1964). Skew systems such as polarity and voice, on the other hand, are maximally uninformative or ‘redundant’, in that one choice is highly predictable. For instance, globally, positive and active are far more likely to be chosen than negative and passive (Halliday, 1991a: 38).

In order to quantify the two distributional profiles of global probabilities, Halliday (e.g. 1991a: 35) uses the following entropy formula (proposed by Shannon, 1948; Shannon and Weaver, 1964):

$$H = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i) \tag{1}$$

In an SFL context, n is the total number of choices in the system, P(x_i) is the probability of the ith choice, and H is the information entropy of the whole system. Given this formula, the information content conveyed by equiprobable and skew systems can be obtained as follows, respectively:

$$H = -\sum_{i=1}^{2} 0.5 \times \log_2 0.5 = 1$$

$$H = -[(0.9 \times \log_2 0.9) + (0.1 \times \log_2 0.1)] \approx 0.46$$
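The two values can be checked numerically. The following is a minimal Python sketch of the Shannon entropy calculation; the function name `entropy` is chosen here for illustration and does not come from the paper. The skew case evaluates to roughly 0.469 bits (0.46 when truncated to two decimals).

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)).

    Terms with zero probability are skipped, since p * log2(p) tends to 0.
    """
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Halliday's two hypothesised global profiles for a binary system.
h_equiprobable = entropy([0.5, 0.5])  # maximally informative
h_skew = entropy([0.9, 0.1])          # largely redundant

print(f"equiprobable: {h_equiprobable:.3f} bits")
print(f"skew:         {h_skew:.3f} bits")
```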

As skewness increases, information content decreases up to a value close to zero in systems where the unmarked choice is very highly probable (e.g. 0.99:0.01). It should be noted here that although Halliday (e.g. 1991a) uses Eq. (1) to quantify the properties of global probabilities, it can be further generalized to quantify deviations of particular local probabilities from the global profiles through relative entropy and other statistical measures as will be seen in the following sections.
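As an illustration of how relative entropy can quantify such deviations, the sketch below computes the Kullback-Leibler divergence of a local profile from a global one. It is a minimal sketch under stated assumptions: the function name and the "local" profile are hypothetical, and the global profile is simply Halliday's hypothesised 0.9/0.1 skew rather than an estimate from any corpus.

```python
import math

def relative_entropy(local, global_):
    """Kullback-Leibler divergence D(local || global) in bits.

    Assumes both profiles list the same choices in the same order and
    that every global probability is non-zero.
    """
    return sum(p * math.log2(p / q) for p, q in zip(local, global_) if p > 0)

global_profile = [0.9, 0.1]   # hypothesised skew profile (e.g. positive/negative)
local_same = [0.9, 0.1]       # a register that does not reset the probabilities
local_shifted = [0.7, 0.3]    # hypothetical register with more marked choices

d_same = relative_entropy(local_same, global_profile)
d_shifted = relative_entropy(local_shifted, global_profile)

print(f"no resetting:   {d_same:.3f} bits")
print(f"with resetting: {d_shifted:.3f} bits")
```

A divergence of zero means the local profile matches the global one; larger values indicate a stronger local resetting.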

Furthermore, even though the two distributional patterns of global probabilities are mainly hypothesized in a context of binary systems (such as polarity, voice, and taxis), they are observable in n-term systems where n > 2, as Halliday and James (1993) note, “a similar pattern would be predicted for ternary systems, excepts that it should be possible to find more than one type with the skew” (p.7). That is, in a three-term system, global probabilities may be skewed towards two choices simultaneously in such a way that both are equiprobable but the third choice in the system has far less probability as the marked one. An example of this case is the English system of tense. In their analysis of tense in the COBUILD corpus, Halliday and James (1993) observe that the system is highly skewed towards the past and present which are both equiprobable (≈0.47:0.47 respectively), whereas the future is very marked (with a probability ≈ 0.06), as illustrated by the histogram in Fig. 1(a) below. An alternative way to represent this pattern is through the system network in Fig. 1(b) where tense is binarized, so to speak, into two equiprobable choices (past and non-past), and the non-past is an entry condition to a further skew system of two choices (present and future). This latter ‘reductionist’ representation is probably more preserving of the two global patterns and, thus, will be adopted throughout this paper where appropriate.
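The binarized representation can be made concrete with a short Python sketch. The helper `binarize` is hypothetical and introduced only for illustration; the input probabilities are the approximate tense values quoted above. The nested system is obtained by renormalising the remaining choices over the non-past total.

```python
def binarize(probs, first):
    """Split an n-term system into `first` vs. the rest.

    Returns the top-level binary system and the nested system whose
    probabilities are renormalised over the remaining choices.
    """
    rest_total = sum(p for term, p in probs.items() if term != first)
    top = {first: probs[first], "non-" + first: rest_total}
    nested = {term: p / rest_total for term, p in probs.items() if term != first}
    return top, nested

# Approximate probabilities reported for tense in the COBUILD corpus.
tense = {"past": 0.47, "present": 0.47, "future": 0.06}

top, nested = binarize(tense, "past")
print(top)     # past vs. non-past: close to equiprobable
print(nested)  # present vs. future within non-past: close to the skew profile
```

Under these figures the top-level system is roughly 0.47/0.53 and the nested system roughly 0.89/0.11, which is how the ternary system can be read as one equiprobable and one skew binary system.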

Nevertheless, two questions still need further investigation. The first is concerned with global probabilities in relation to what Halliday (e.g. 1991b) refers to as conditional probabilities. These are of two types: interstratal and intrastratal. The former is a special type of local probabilities and is register-specific as the conditioning is generally determined by the context of situation. An example, par excellence, of this conditioning is how social class conditions choices of verbal deixis in the interview register (Plum and Cowling, 1987). The second type is internal as the conditioning occurs within language systems. These probabilities arise “when systems are intersected…one being treated as the conditioning environment of the other” or, in other words, when certain choices in one system probabilistically favor or attract other choices in another system (Halliday, 1991b: 54). An example of this conditioning is the influence of taxis choices on speakers' choices of logico-semantic relations and vice versa (Halliday, 1991b: 49). At the sub-potential/register level, this type of conditional probabilities has drawn relatively more attention in SFL and is often discussed under the notions of ‘systemic intersections’ (Matthiessen, 2002, 2006; Nesbitt and Plum, 1988; Rodríguez-Vergara, 2015) or ‘coupling’ (Martin, 2000; Zappavigna et al., 2009).

In these studies, the focus has been mainly on how systems condition each other ‘locally’ in small-scale samples of certain registers (e.g. interviews, spoken narratives, news reports, scientific expositions, argumentative essays). It has been, and still is, challenging to generalize conditional probabilities beyond the local level of a relatively small corpus to the global level of a large representative corpus mainly because “the analysis still has to be manual…no [SFL] parser can yet assign enough feature descriptions” (Halliday, 1991b: 56). Therefore, a practical, albeit not necessarily accurate, alternative is to assume that conditional probabilities at the global level are simultaneously independent and, accordingly, calculated as the product of independent probabilities of the relevant systemic choices (Zappavigna et al., 2008: 172). But this might not be the case and globally conditional probabilities might in fact “not simply be the product of the separate probabilities of each” (Halliday, 1991b: 54). It is very possible that the phylogenetic patterns of the overall systemic potential have evolved to be skewed towards certain favoring or disfavoring intersections (Matthiessen, 2006). Nowadays, sufficiently representative corpora are readily available and, when coupled with advances in computational linguistics and Natural Language Processing (NLP) techniques, a better alternative would be to estimate these probabilities. This can be achieved through the implementation of automatic classifiers targeting certain choices and systemic intersections. Despite the fact that such classifiers would be somewhat limited in their scope of language analysis and levels of delicacy (as will be further discussed later), they should give us, with an acceptable accuracy, a global baseline of some systems in addition to new insights into how mutual conditioning takes place at the system potential.
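The contrast between the independence assumption and an estimated joint probability can be sketched in a few lines of Python. The counts below are invented purely for illustration and are not results from this paper; the helper names are likewise hypothetical. The ratio of observed to expected probability indicates whether two choices favor (above 1) or disfavor (below 1) each other.

```python
# Hypothetical sentence counts for engagement x attitude polarity.
# These numbers are invented for illustration, not taken from any corpus.
counts = {
    ("monogloss", "positive"): 400,
    ("monogloss", "negative"): 100,
    ("heterogloss", "positive"): 200,
    ("heterogloss", "negative"): 300,
}
total = sum(counts.values())

def marginal(index, value):
    """Marginal probability of one choice (index 0: engagement, 1: polarity)."""
    return sum(n for key, n in counts.items() if key[index] == value) / total

def association(engagement, polarity):
    """Return (observed joint, product of marginals, observed/expected ratio)."""
    observed = counts[(engagement, polarity)] / total
    expected = marginal(0, engagement) * marginal(1, polarity)
    return observed, expected, observed / expected

for pair in counts:
    obs, exp, ratio = association(*pair)
    print(f"{pair}: observed={obs:.2f} independent={exp:.2f} ratio={ratio:.2f}")
```

If the two systems were independent, every ratio would be close to 1; systematic departures are what an estimate of conditional probabilities is meant to capture.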

The second question is concerned with patterns of systemic probabilities (both global and local) from a stratification perspective: Will the two bimodal distributions of global probabilities (and local or conditional deviations thereof) be observable in higher-strata systems including mainly what Martin (e.g. 1992) refers to as discourse semantics systems? The original context in which Halliday hypothesizes (and tests) the two distributional patterns is lexicogrammar. It is indeed more challenging to perform automatic analysis of discourse semantics systems as Matthiessen (2006) notes “automation of the analysis becomes increasingly difficult as we ascend the stratal organization towards semantics…and manual analysis has to take over” (p.109). In a semantic analysis of large-scale corpora, it is possible to rely on what Hunston (2006) refers to as the ‘phraseological approach’ in which certain phrases are extracted as features of a particular discourse semantics system, and generalized as global or sub-global patterns of that system. A notable example of this approach is Miller and Johnson (2014), where certain phrasal structures (such as it is ∗ time for; I think it is time for ∗) that explicitly encode (or realize) appraisal are extracted automatically (or concordanced) and discussed in relation to some global patterns of evaluation in several corpora. However, this approach can be extremely vulnerable to a high rate of misses (i.e. false negatives) since a small number of phrases cannot be sufficiently representative of the set of all possible realizations of a discourse semantics system. In fact, it can be argued that the false negative rate that may result from this approach is far greater than that of a moderately-accurate machine classifier, as will be pointed out in the following sections.
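The false-negative risk of phrase-based extraction can be illustrated with a small Python sketch. The regular expression is one possible reading of the pattern it is ∗ time for, treating the wildcard as an optional single word; that reading, and the three example sentences, are assumptions made here for illustration only.

```python
import re

# The wildcard slot is rendered as an optional single word (an assumption).
PATTERN = re.compile(r"\bit is (?:\w+ )?time for\b", re.IGNORECASE)

# Invented example sentences, not drawn from any of the corpora.
sentences = [
    "I think it is high time for a change.",
    "It is time for the committee to act.",
    "The delay is frankly unacceptable.",  # evaluative, but outside the pattern
]

hits = [s for s in sentences if PATTERN.search(s)]
misses = [s for s in sentences if not PATTERN.search(s)]

print("matched:", hits)
print("missed: ", misses)
```

The third sentence is clearly evaluative yet is not retrieved, which is the kind of miss that a fixed phrase inventory cannot avoid.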

In response, this paper attempts to approach the two questions quantitatively from both corpus-based and computational perspectives. The overarching aim is to investigate Halliday's bimodal distributions in relation to global, conditional and local probabilities of a discourse semantics system in sufficiently large and representative English corpora. The discourse semantics system opted for in this paper is appraisal (Martin, 2000; Martin and Plum, 1997; Martin and Rose, 2003; Martin and White, 2005) at its least delicate levels (as reviewed in Section 2). The choice of appraisal is motivated by two considerations. First, appraisal is arguably the most actively-researched discourse semantics system in the ‘Martinian’ SFL framework, making it more convenient to compare, where appropriate, the results (outlined in Section 3) with a relatively abundant body of previous findings available in other studies (as done in Section 4). Second, some aspects of appraisal are also an active area of machine classification research mainly under the notions of sentiment analysis, opinion and subjectivity classification (as will be briefly discussed in Section 2). This should provide us with an adequate number of benchmarks and baselines against which the least delicate appraisal classifiers implemented in this paper will be evaluated.

More specifically, the aim of this paper can be split into the following three questions:

  1) What are the global probabilities of engagement and attitude polarity choices in English as estimated from the analysis of representative (or reference) corpora? And do they conform to Halliday's two global distributions?

  2) What are the intrastratal, internal conditional patterns of co-choices (or intersections) of engagement and attitude polarity at the system end of instantiation? In other words, how do the global probabilities of engagement choices (in question 1) globally condition attitude polarity choices and vice versa?

  3) How do corpora of certain English registers cluster in terms of their statistical deviations from the reference global probabilities (in question 1) and the reference conditional probabilities (in question 2)? In other words, how can they be represented topologically in terms of least delicate choices and co-choices of appraisal?
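The topological idea behind the third question can be sketched as follows. This is only a minimal illustration, not the statistical procedure used in the paper: it places registers on a cline by their Euclidean distance from a reference profile. The reference values and the register profiles are placeholders invented for the example.

```python
import math

# Reference profile as (P(monogloss), P(positive)); placeholder values only.
reference = (0.5, 0.6)

# Hypothetical register profiles, invented for illustration.
registers = {
    "news": (0.55, 0.58),
    "reviews": (0.45, 0.70),
    "debates": (0.30, 0.35),
}

def deviation(profile, ref=reference):
    """Euclidean distance between a register profile and the reference."""
    return math.dist(profile, ref)

# Order registers from least to most divergent from the reference.
cline = sorted(registers, key=lambda name: deviation(registers[name]))

for name in cline:
    print(f"{name}: {deviation(registers[name]):.3f}")
```

Registers near the start of the list behave as contextually expected relative to the reference, while those at the end are the contextually divergent ones.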

Section snippets

English corpora

To achieve the main aim of this study, which requires some generalization both at the level of register and towards the global system (or meaning potential) of appraisal, the automatic classifiers discussed in Section 2.3 below will be applied to fifteen English corpora with a total size of approximately 3.6 billion words in around 152 million sentences. The corpora are classified into two categories: reference and register-specific. The reference category includes corpora of a wide variety of

Patterns of least delicate appraisal in reference corpora

As far as the first question of this paper is concerned, Fig. 4 below shows the overall average frequency profiles of engagement and attitude polarity in the three reference corpora. For engagement, monoglossic choices are slightly more frequent (= 0.53) than heteroglossic choices (= 0.47). A similar pattern can be observed between ‘evaluative’ (= 0.53) and ‘neutral’ (= 0.47). For attitude polarity, ‘positive’ is notably more frequent (= 0.63) than ‘negative’ (= 0.37). In order to fit these observed

Discussion and conclusion

This paper approaches Halliday's bimodal hypothesis of systemic probabilities by addressing three questions in the context of automatic analysis of least delicate appraisal in fifteen English corpora. The first question is concerned with the global patterns of appraisal probabilities estimated from three English reference corpora, namely Brown, BNC and COCA-mini. The statistical analysis presented in the previous section suggests that, globally at the system pole, least delicate engagement

References (72)

  • R.T. Miller et al. Valued voices: students' use of Engagement in argumentative history writing. Ling. Educ. (2014)
  • H. Abdi et al. Correspondence analysis
  • B.A.A. Almutairi. Significant patterns of APPRAISAL in online debates. Int. J. Ling. (2019)
  • B.A.A. Almutairi. Quantifying systemic coupling and syndrome using multivariate statistical methods: an SFL corpus example. Ling. Hum. Sci. (2021)
  • S. Argamon et al. Automatically determining attitude type and force for sentiment analysis
  • A. Arya et al. A review: sentiment analysis and opinion mining
  • M.J. Barrios-Sabador. The expression of uncertainty as a strategy for mitigating the assertion. An analysis in an oral computerized corpus
  • J. Bateman et al. Systemic functional linguistics and computation: new directions, new challenges
  • D. Biber. An analytical framework for register studies
  • C. Chelba et al. One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
  • F. Christie et al. Genre and Institutions: Social Processes in the Workplace and School (2000)
  • N.C. Dang et al. Sentiment analysis based on deep learning: a comparative study. Electronics (2020)
  • D. Garcia et al. Positive words carry less information than negative words. EPJ Data Sci. (2012)
  • D. Glynn. Correspondence analysis: exploring data and identifying patterns
  • Y. Goldberg. Neural Network Methods for Natural Language Processing (2017)
  • M.A. Halliday. An Introduction to Functional Grammar (1985)
  • M.A. Halliday. Corpus studies and probabilistic grammar
  • M.A. Halliday. Towards probabilistic interpretations
  • M.A. Halliday. Language as system and language as instance: the corpus as a theoretical construct
  • M.A. Halliday. New ways of analysing meaning: the challenge to applied linguistics
  • M.A. Halliday. Computing meaning: some reflections on past experience and present prospects
  • M.A. Halliday. With Geoff Thompson and Heloisa Collins
  • M.A. Halliday. Introduction: on the "architecture" of human language
  • M.A. Halliday (2005)
  • M.A. Halliday et al. A quantitative study of polarity and primary tense in the English finite clause
  • M.A. Halliday et al. Halliday's Introduction to Functional Grammar (2014)
  • S. Hunston. Phraseology and system: a contribution to the debate
  • C.J. Hutto et al. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. Paper presented at the Eighth International AAAI Conference on Weblogs and Social Media (ICWSM) (2014)
  • L. Immes et al. Detection of Common English Grammar Usage Errors (2019)
  • V.P. Jariwala. Optimal feature extraction based machine learning approach for sarcasm type detection in news headlines. Int. J. Comput. Appl. (2020)
  • T. Joachims. Learning to Classify Text Using Support Vector Machines (2002)
  • T.R. Johnson et al. Introduction: the role of oral Arguments in the Supreme Court
  • V. Kolhatkar et al. The SFU opinion and comments corpus: a corpus for the analysis of online news comments. Corpus Pragmat. (2020)
  • H. Kruger et al. Register change in the British and Australian Hansard (1901-2015). J. Engl. Ling. (2019)
  • S. Kullback et al. On information and sufficiency. Ann. Math. Stat. (1951)
  • V. Kumar et al. A TfidfVectorizer and SVM based sentiment analysis framework for text data corpus
