Expressive Sibilant Retraction in North Norwegian: morpheme or ‘spoken gesture’?

North Norwegian has a contrast between /s/ and /ʂ/ that is neutralized in word-initial position before a consonant, and an optional process of Expressive Sibilant Retraction (ESR), which changes /s/ to [ʂ] in precisely the environment where the contrast is neutralized (Broch 1927). ESR appears ambiguous between a word formation process and a spoken gesture (Okrent 2002; Perlman et al. 2015). On the one hand, ESR exploits givens of phonological structure. On the other, treating it as a morphological process entails claiming that the spell-out of certain (“expressive”) morphemes may take place after phonological processes have applied, or that the realization of these morphemes takes precedence over phonological constraints. I argue that ESR is a communicative (i.e. non-linguistic, or post-linguistic) spoken gesture that nonetheless exploits the suspension of phonological generalizations in a way that directs attention to its iconic function. I describe the varied interpretations that ESR has depending on whether it indexes an action/event, object, or state/property, and propose that these share a common semantic core. This gesture-based account of ESR is offered as a possible model for “expressive phonology” (e.g. Diffloth 1979) in other languages.


Introduction
Expressive Sibilant Retraction (ESR) in North Norwegian changes an /s/ to the corresponding postalveolar fricative [ʂ] in certain pragmatic contexts. For example, the verb skubbe, 'to shove', may alternate as shown in (1). In Supplementary file 1, I provide for each North Norwegian example the nearest equivalent in bokmål, the most widely used standard written variety of Norwegian.
A similar expressive process has been described for the Urban East Norwegian (UEN) variety in the earlier literature by Larsen (1907) and Broch (1927). Exactly as these researchers describe for UEN, ESR in North Norwegian may only apply where the phonological contrast between /s/ and /ʂ/ is otherwise neutralized. In word-initial position before a vowel, where the two sounds contrast, ESR cannot apply. Thus, marking infelicitous applications of ESR with an exclamation mark <!>, /sɑːɡɑ/, 'sawed' cannot be realized ! [ʂɑːɡɑ] (with the intended meaning 'sawed forcefully'). An adequate account of ESR must first explain the connection between these phonological restrictions on its application (form) and its expressive function. Second, it must provide an account of what ESR expresses, and how. For example, it is clear that ESR in North Norwegian is not an all-purpose intensive. Thus, (2b) is not an acceptable intensive version of (2a).
(2) a. hu stɑːs-ɑ sa up
       she.sbj doll-pst refl up
       'She dolled herself up.'
    b. ! hu ʂtɑːs-ɑ sa up
       (Intended meaning: 'She really dolled herself up!')

Neither Larsen nor Broch provides an account of the interpretations of ESR for UEN beyond simply listing certain emotional nuances, which in the case of Broch (1927: 154) include "contempt, anger, a feeling of power, boldness or admiration, a hint of intimacy, degrees of emphasis". The present study shows that the interpretations of ESR are in fact constrained, and vary depending on whether they attach to actions/events, objects or states/properties. The pattern of variation allows us to crystallize a core meaning and an account of the form-function relation.

An immediate question that ESR raises is whether it is part of the grammar or not. On the face of it, ESR resembles a morphological process that introduces, or perhaps reintroduces, a contrast between /s/ and /ʂ/ in word-initial position before a consonant. This account would make ESR the source of a marginal contrast, since there is no lexical contrast between these sibilants in this environment. However, I shall argue that ESR does not belong to the grammar (morphology or phonology), but is instead part of the communication system, specifically, a multimodal post-phonological component that integrates speech with conversational gestures, including manual, facial and spoken gestures. The current account of ESR may thus cast light on expressive phenomena in general, suggesting a way of dealing with at least one type of apparent marginal contrast.
Although ESR is found in UEN and other varieties throughout Norway, this article draws on examples from North Norwegian. I offer a few brief comments here to place North Norwegian in its wider sociolinguistic setting. UEN is the dominant spoken variety of Norwegian, the educated varieties of which are described by Kristoffersen (2000). Although centered on Oslo, UEN is widely spoken in population centers throughout Norway and many towns have significant clusters of UEN speakers. The social dominance of UEN is such that children of UEN-speaking parents throughout Norway frequently target, acquire and grow up using UEN regardless of the ambient variety, despite the comparatively high traditional regard that regional dialects in Norway enjoy. Most regional urban centers have developed varieties with pronounced regional features, but which are increasingly leveled towards UEN in terms of lexis and idiom, morphology and segmental phonology.
The organization of the rest of this paper is as follows. Section 2 lays out the phonological background, the lexical and phonological sources of postalveolar consonants in North Norwegian, and an analysis framed in Optimality Theory (OT). Section 3 describes the problems of a morphological analysis of ESR and lays the foundations for an account of ESR as a spoken gesture. Section 4 provides an account of the different interpretations of ESR depending on whether it indexes an action or event, object, or state/property. Section 5 attempts to trace the relationships between these interpretations and identify a semantic core for ESR. Finally, section 6 concludes.
The table in (3) shows a representative consonant inventory for North Norwegian. Although phonetic realization and lexical distribution vary, the North Norwegian vowel inventory in (4)

While there is agreement that /s/ is lamino-alveolar, Kristoffersen (2000: 23) writes regarding UEN that "the precise articulatory properties of [ʂ] […] are somewhat unclear". Historically, the postalveolar fricative /ʂ/ derives from two sources: s in a palatal environment, and the cluster rs, also a synchronic source of [ʂ] when r+s results from combining morphemes. The former includes sk before a front vowel (e.g. ski /ʂiː/, 'ski'), and the clusters sj (e.g. sjø /ʂøː/, 'sea'), and skj (e.g. skjå /ʂoː/, 'shed'). Kristoffersen then raises the question whether speakers in fact distinguish the postalveolar fricatives that derive from these two different sources, for example as [ʃ], from s+palatal, and [ʂ], from rs. Although Larsen (1907) claims the existence of a distinction, which was still made sixty years later by older speakers according to Sivertsen (1967: 79), present-day UEN and North Norwegian would appear to have merged the two. Thus, for Endresen (1985: 77; 1991: 54), there is only one postalveolar fricative, which he describes as having apico-postalveolar place of articulation and tongue grooving. Although traditionally designated 'retroflex', there is in general no curling of the tongue tip upwards in the articulation of this sound. The apico-postalveolar constriction is instead achieved by withdrawing the tongue tip into the front of the body of the tongue, which results in anterior bunching and additional narrowing between the lamina and the palatoalveolar region (see Laver 1994: 141). The postalveolar fricative is also enhanced by lip protrusion, as is the case for /ʃ/ in English and German (cf. Stevens & Keyser 1989).
Now we turn to the phonological distribution of /s/ and /ʂ/. Word-initially before a consonant the distinction between /s/ and /ʂ/ is neutralized (Sibilant Place Neutralization). In general, only lamino-alveolar /s/ is permitted in this environment, as shown in (6).

An OT analysis of these facts is straightforward (for introductions to this framework, see Prince & Smolensky 2004; McCarthy 2008). Since there is in general a distinction between alveolar and postalveolar consonants, the faithfulness constraint that requires preservation of the underlying contrast between them, Ident[postalveolar] in (8), must outrank the markedness constraint that penalizes postalveolar consonants in the output, *[postalveolar] in (9). The tableaux in (10) and (11) show how this works for underlying alveolar and postalveolar sibilants in pre-vocalic position, where the contrast is preserved.

(12) *[σ ʂC
     Assign one violation mark for every segment S, where (a) S is [ʂ], (b) S is in syllable onset position, and (c) S is followed by a consonant.

Tableaux for alveolar and postalveolar sibilants in this position are shown in (13) and (14). Both /sC-/ and /ʂC-/ in the input are mapped to [sC-] in the output, the input ʂC-cluster unfaithfully so.

Coalescence also applies preceding /ʂ/, as shown in (21). The process is not vacuous, since the /ɾ/ fails to surface.
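The ranking logic behind Sibilant Place Neutralization can be illustrated with a minimal computational sketch. It assumes the ranking *[σ ʂC ≫ Ident[postalveolar] ≫ *[postalveolar]; only the second domination relation is stated explicitly above, while the first is inferred from the fact that input /ʂC-/ surfaces unfaithfully as [sC-]. The function names, the two-member candidate sets, the reduction of 'syllable onset' to word-initial position, the restriction of *[postalveolar] to [ʂ], and the hypothetical input /ʂtɑːsɑ/ are simplifying assumptions made purely for illustration.

```python
# Illustrative sketch only: a toy OT evaluator, not the formal analysis itself.
VOWELS = set("iyʉueøœoɛɔaɑæ")
LENGTH = "ː"


def is_consonant(seg):
    """Treat anything that is neither a vowel nor the length mark as a consonant."""
    return seg not in VOWELS and seg != LENGTH


def onset_retroflex_before_c(inp, out):
    """*[σ ʂC, simplified: one mark for word-initial [ʂ] followed by a consonant."""
    return int(len(out) > 1 and out[0] == "ʂ" and is_consonant(out[1]))


def ident_postalveolar(inp, out):
    """Ident[postalveolar]: one mark per sibilant whose place differs from the input."""
    return sum(1 for i, o in zip(inp, out) if {i, o} == {"s", "ʂ"})


def no_postalveolar(inp, out):
    """*[postalveolar], restricted here to [ʂ]: one mark per [ʂ] in the output."""
    return out.count("ʂ")


# Assumed ranking, highest-ranked constraint first.
RANKING = [onset_retroflex_before_c, ident_postalveolar, no_postalveolar]


def candidates(inp):
    """Assumed candidate set: the input with initial [s] or initial [ʂ]."""
    return ["s" + inp[1:], "ʂ" + inp[1:]]


def optimal(inp, ranking=RANKING):
    """Pick the candidate with the lexicographically smallest violation profile."""
    def profile(cand):
        return tuple(constraint(inp, cand) for constraint in ranking)
    return min(candidates(inp), key=profile)


for underlying in ["sɑːɡɑ", "ʂiː", "stɑːsɑ", "ʂtɑːsɑ"]:
    print(f"/{underlying}/ -> [{optimal(underlying)}]")
```

Under these assumptions, the prevocalic inputs surface faithfully, so the contrast is preserved, while both preconsonantal inputs converge on an output with initial [s].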

Expressive Sibilant Retraction: morpheme or spoken gesture?
The third source of [ʂ], Expressive Sibilant Retraction (ESR), is neither lexical nor phonological. This leaves two possibilities: ESR is either a morphological process or a communicative spoken gesture. ESR may index actions/events, objects and states/properties. One of each is illustrated in examples (22) to (24), with the indexed word shown in bold.
snabel, 'trunk, nose'
foː dɛɲ stuːɾ-ɛ ʂɳɑːbaɭ=ɳ diːn ʉːt ɑ ʋɛi=ɛn
get.imp dem.m.sg large-def trunk-m.sg.def your.m.sg out of way-m.sg.def
'Get that great nose of yours out of the way!'

(24) sprø, 'crazy'
hu ɕaɾiŋ=ɑ ɛ ʂpɾœː
she woman=f.sg.def be.prs crazy
'That woman's (utterly) crazy.'

We might preliminarily gloss the meaning of ESR as 'intensive', but this is very misleading, since it suggests rather freer distribution than we in fact find. An 'intensive' gloss gives the impression that ESR is a type of modifier, when its essential nature is performative.
In this section I seek to establish the gestural nature of ESR against the alternative view that it is an expressive morphological process. On the gestural interpretation, ESR is a communicative phenomenon that recruits phonetic resources in such a way as to appear to reverse or violate phonological rules. On the second interpretation, ESR interacts with the phonological grammar, overriding phonological neutralization rules. My argument for a gestural account leads into a discussion of the relation between the form and function of ESR and an attempt to identify a core 'meaning' underlying the patterned variation of its interpretations. The morphological account may stipulate this variation, but it is not equipped to explain it.

ESR as a morphological process
If ESR were a morphological process, this would entail that certain "expressive" morphemes could be spelled out after all phonological processes had applied, or that morphological factors could override phonological ones, introducing new contrasts. Neither of these can be squared with modular approaches. Situating ESR in a post-linguistic communicative component avoids this problem.
One case analysed as an example of override is the Javanese 'elative', first brought to the attention of generative phonologists by Benua (1999). The Javanese data in fact provide an interesting contrast, since they appear to involve a partial reversal of a pattern of complementary distribution (allophony), as opposed to neutralization as in the Norwegian case. As described by Benua, high vowels in Javanese are tense in open syllables but lax in closed syllables. Formation of the elative, however, involves tensing the final vowel of the stem regardless of whether the final syllable is open or closed, in violation of the canonical allophonic pattern. Benua interprets the tensing of vowels in closed syllables in the elative as a case of 'morphological override': the noncanonical pattern surfaces under compulsion from highly ranked constraints requiring that the elative morpheme be realized (MorphReal). In her parallelist Optimality-theoretic framework, morphological structure is entirely transparent to phonology. An analysis of ESR along similar lines might postulate a floating [postalveolar] feature as the exponent of the hypothetical expressive morpheme. Applying the same logic, MorphReal would then have to outrank *[σ ʂC from (12) in order to produce the desired result, as shown in the tableau in (25).
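The override ranking just described can be illustrated with a small sketch. It assumes that the hypothetical expressive morpheme is modelled as a Boolean flag on the input, that MorphReal is satisfied only if the initial sibilant surfaces as [ʂ], and that candidates differ only in the place of the initial sibilant. The function names and these modelling choices are illustrative assumptions, not part of Benua's proposal.

```python
# Illustrative sketch only: toy evaluation of the hypothetical MorphReal ranking.
VOWELS = set("iyʉueøœoɛɔaɑæ")


def morph_real(stem, expressive, out):
    """MorphReal: one mark if the floating [postalveolar] affix is present but unrealized."""
    return int(expressive and out[0] != "ʂ")


def onset_retroflex_before_c(stem, expressive, out):
    """*[σ ʂC, simplified: one mark for word-initial [ʂ] followed by a consonant."""
    return int(len(out) > 1 and out[0] == "ʂ" and out[1] not in VOWELS)


def ident_postalveolar(stem, expressive, out):
    """Ident[postalveolar]: one mark per sibilant whose place differs from the stem."""
    return sum(1 for i, o in zip(stem, out) if {i, o} == {"s", "ʂ"})


def no_postalveolar(stem, expressive, out):
    """*[postalveolar], restricted here to [ʂ]: one mark per [ʂ] in the output."""
    return out.count("ʂ")


# Hypothetical override ranking: MorphReal dominates *[σ ʂC.
OVERRIDE_RANKING = [
    morph_real,
    onset_retroflex_before_c,
    ident_postalveolar,
    no_postalveolar,
]


def optimal(stem, expressive, ranking=OVERRIDE_RANKING):
    """Return the candidate with the lexicographically smallest violation profile."""
    cands = ["s" + stem[1:], "ʂ" + stem[1:]]
    return min(cands, key=lambda out: tuple(c(stem, expressive, out) for c in ranking))


print(optimal("smɑdɾɑ", expressive=False))  # neutral form keeps [s]
print(optimal("smɑdɾɑ", expressive=True))   # expressive form surfaces with [ʂ]
print(optimal("sɑːɡɑ", expressive=True))    # prevocalic /s/ is retracted as well
```

Under these assumptions the ranking retracts prevocalic /s/ in the last case too, a form that is unattested with ESR, so the restriction to sC-clusters does not follow from the ranking itself.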
We would still have to provide an account of why ESR only ever applies in a sC-cluster, and not just to any word that begins with /s/. This can of course simply be stipulated, for example, by invoking phonologically conditioned suppletive allomorphy (Paster 2006; Bye 2007). On such an account, the floating [postalveolar] featural affix would compete for insertion with a zero allomorph whenever the stem did not begin with a sC-cluster. However, this misses the obvious generalization that ESR specifically alters the output of the Sibilant Place Neutralization rule, while leaving the lexical contrast intact. Yet there is no way to allow the grammar to reflect this. Reanalysing ESR as a communicative rather than a morphological or phonological phenomenon, on the other hand, allows us to see the phonological distribution as part of ESR's design, rather than as an idiosyncratic restriction.

The morphological analysis raises other problems of a more general architectural kind. In particular, it does not square with the idea that phonology is ordered following spell-out (e.g. Bye & Svenonius 2012) or that, beyond this, phonology only has limited access to morphosyntactic information (e.g. Selkirk 2011). One way of squaring cases like the Javanese elative with a modular approach is to look for evidence that the contrast is already at least marginally present in the lexicon, rather than derived, so that morphological processes remain structure-preserving. Thus, Bye (2013: 51) argues that, despite the overwhelming restriction of closed syllable tensed vowels to morphologically derived elative forms, there is sufficient leakage of the putatively derived contrast into morphologically simple words. It is possible to make a similar case for ESR as well, although not a strong one. For example, if Larsen (1907: 74) is correct, ESR began as a deviant realization of a single lexical item, or a small group of lexical items, before it was adopted as a general process.
Colloquially, s in the adjective ṣvǣr (or švǣr?) ['big'] has a similar, although perhaps somewhat different sound, which has a particular psychological motivation: the adjective stor ['big'] is beginning to ring very flat in this dialect, as in several other eastern dialects, and svær has been chosen as a substitute. In order to depict the size one literally fills one's mouth, starting the word with one or other sch-sound; likewise now and then with the word svīn ['swine'], in part also in snē ['snow'] and stygg ['ugly'].
The unmarked realization of svær, 'huge', is [ʂʋaːɾ] in the present variety as well. Two of the other words mentioned by Larsen in the quoted passage, stygg 'ugly' and sne~snø 'snow', evince allomorphy in North Norwegian with ESR, suggesting that these forms are stored along with the postalveolar fricative. In the case of the adjective stygg [styɡ] 'ugly', the ESR form in North Norwegian optionally has a central rounded or front mid rounded vowel instead, giving [ʂʈʉɡ] or [ʂʈœɡ], '(offensively) ugly'. A similar example is snø, 'snow', which is /snœː/ in neutral contexts, but may be encountered as /ʂɳyː/ as in (26), when it has the sense of being an inconvenience, because it impedes movement, requires effortful removal, and so on.
(26) nʉ ʋɑ iɲɕœʂɛɭ=ɳ fʉʎ ɑ ʂɳyː
     now be.pst drive=m.sg.def full of snow
     'Now the drive is full of snow.'

In all of these putatively lexicalized cases, the characteristic meaning that ESR introduces is also present, if perhaps not as strongly. This makes the evidence for a marginal lexical contrast weak.

ESR as a spoken gesture
An alternative way to understand ESR, which I argue for here, is that it represents a 'spoken gesture', a term introduced by Okrent (2002). An example would be the iconic use of speech rate to express temporal or spatial extension (Okrent 2002; Feist 2013; Perlman, Clark & Johansson Falck 2015), as shown in (27).

(27) It was a looooong time/tail.
of the utterance as a whole. Since they recruit the same vocal channel as spoken linguistic units, spoken gestures cannot strictly be described as 'co-speech' (cf. Okrent 2002: 188). They may nonetheless be placed with respect to the same techniques of representation and pragmatic functions as manual and facial gestures. Techniques of representation will be dealt with in this section; we return to the pragmatic functions of gestures in Section 4.

Kendon (2004) distinguishes between three techniques of representation: modelling, enactment and depiction. Modelling and depiction have in common that they describe objects, while enactment describes actions. Modelling involves the use of a body part to suggest an object's shape, for example, making a 'mouth' or 'beak' with the hand, while depiction entails moving some part of the body, generally the hands, to draw an object. An example would be using both index fingers to trace the outline of a box. In the case of enactment, "the gesturing body parts engage in a pattern of action that has features in common with some actual pattern of action that is being referred to" (p. 160). An example of enactment would be using the flat hand in a chopping motion. The same gesture may be enactive or depictive depending on context. Thus, making a spiral motion with the index finger may depict an object with that shape, or something moving along a spiral path. I argue that enactment seems to provide the best account of the semantics of ESR.

Ekman (1997) was the first to point out that facial gestures in conversation serve a communicative function, rather than being directly expressive of emotions. For Kendon (2004: 310), these include "eyebrow movements or positioning, movements of the mouth, head postures and sustainments and changes in gaze direction". Discussing manual and facial gestures, Bavelas, Gerwing & Healing (2014) note that the techniques of representation each employ are typically different.
Whereas hand gestures may be used for modelling or depiction, facial gestures generally enact emotional responses, either the speaker's or those of some other individual. Spoken gestures do not obviously allow for modelling as a technique of representation. The iconic use of speech rate mentioned above is at first glance ambiguous between depictive and enactive, since it may be understood as referring to a long object, or enacting something that takes a long time. However, lowering speech rate to 'depict' the length of a long object is probably better understood as a metaphor: the gesture enacts the physical experience of tracing the object along its length.
In any case, the ability to recognize spoken gestures as representing something other than speech depends on access to evidence for the relation between the acoustic signal and the articulatory gestures involved in producing it. In some cases, this evidence may be present visually. More generally, hearers may rely on their phonetic map of the links between auditory experience and proprioceptive feedback from articulation (e.g. Rummer et al. 2014). For example, pitch excursion (up or down) may be used to indicate vertical movement (up or down). The technique involved is enactment, since raising pitch is achieved partly by elevating the larynx, which increases the tension of the vocal folds, while lowering pitch is accompanied by depressing the larynx, which causes greater slackness (Ohala 1978). If laryngeal position can be inferred from pitch by accessing memories in which proprioceptive memories of larynx articulation are cross-modally linked with their acoustic effects, pitch excursion should be available to enact vertical movement, even in the absence of visual evidence. ESR also enacts something which the listener is able to infer from their own experience of operating their tongue as articulator. This semantic core will be discussed in Section 5. Before we get to that point, though, we must consider what interpretations ESR has in context. This is the subject of the next section.

Kendon (2004) distinguishes five pragmatic functions of conversational gestures, two of which are relevant here: referential and modal. A referential gesture bears on the proposition expressed in the utterance, while a modal gesture enacts a response: generally, but not necessarily, that of the speaker. An example of the latter would be the facial shrug (Bavelas et al. 2014), which signals personal disengagement (Debras 2017). The function of ESR is, I claim, essentially modal, although something close to referential interpretations may arise in context.

The interpretations of Expressive Sibilant Retraction
In the most general sense, ESR serves to amplify the performance of an utterance. It generally signals heightened engagement or vehemence, with one exception described in Section 4.4, where we discuss ESR as a gestural marker of 'reflective distance'. Within the 'heightened engagement' context it is possible to distinguish two more specific meanings, one associated with actions or events, the other with objects. ESR indexes actions as carried out with accelerated movement or impressive force, and objects as obtrusive. In quasi-referential terms, these meanings resemble, respectively, adverbial and adjectival modifiers. However, this is not quite right, and is perhaps fundamentally wrong. The essentially modal nature of ESR consists in that it is a performance of acceleration/force or obtrusive presence, not an intensive modifier. Actions/events are dealt with in Section 4.1, and objects in 4.2. Section 4.3 deals with states and properties.
Because of its use in expressive contexts, ESR is difficult to elicit in a controlled way. The examples presented draw on observations made in over twenty years of living and working as a linguist in North Norway and represent a combination of spontaneously heard utterances, examples elicited during interviews with native speakers, and test utterances devised by me in order to test the limits of its use. All examples have been checked with native speakers of North Norwegian in my circle.

Actions and events: acceleration
ESR indexes actions to give the impression of accelerated movement or forcefulness. The examples in (28) convey, even dramatize, a build-up of energy prior to the action and, by implication, the speed, force or vigor of its execution. In addition to heightened engagement, the valuation communicated is generally that the action is impressive in some way, but not necessarily either positive or negative. ESR is also possible with verbs of the 'break' class (see Fillmore 1970), if not as readily. The implication is nevertheless again that accelerated movement is involved in producing the result, increasing the impact.
(30) a. smadre, 'smash'
        so tɑːɾ ɑn o ʂmɑdɾ-ɑ ɡlɑs|rʉːt=ɑ
        then take.prs he.sbj and smash-prs glass.pane=f.sg.def
        'Then he goes and smashes the glass pane.'
     b. splitte, 'cleave, split'
        hɑn ʂplit-ɑ veː|kʉb=ɛn
        he.sbj split-pst log=m.sg.def
        'He split the log.'

'Break' verbs encode the result, but not the manner, since glass may be smashed by an ultrasonic device, and logs split with a laser. Native speakers rejected my attempts to elicit equivalent examples to (30) in which the instrument was spelled out as one of these. It seems to be a strong implicature that contact by impact is involved, which is presumably the reason the utterances in (30) are acceptable.
Explicit representation of the agent is not necessary for the felicity of ESR, since it is also possible with the passive construction. The examples in (31)

In at least a few cases, acceleration may not be a strong implicature, at least objectively speaking. It is initially surprising to find examples such as those in (35), involving 'bumping' and 'grazing', judged as felicitous with ESR. As in (28), these are contact-by-impact verbs, but they are likely to imply that the contact was accidental or unintended. ESR dramatizes these eccentric movements as somehow brought about or impelled to go off course, whether on purpose or not.

The examples in (39) show that iterative semantics are also compatible with ESR, where the verb encodes more than one cycle of accelerated movement.
(39) a. skrubbe, 'scrub'
        a ɭoː daːɾ o ʂkɾʉb-ɑ ɡɔɭʋ=ɛ
        I.sbj lie.pst there and scrub-pst floor=n.sg.def
        'I lay there scrubbing the floor.'
     b. stavre, 'make one's way laboriously'
        hɑn ʂtɑʋɾ-ɑ sa up tɾɑp=ɑ
        he.sbj move.laboriously-pst 3.refl up stairs-f.sg.def
        'He made his way laboriously up the stairs.'

In the following cases in (40), ESR fails to enact an accelerated movement, and so is infelicitous. The actions designated are presumably of insufficient magnitude for an acceleration phase to have salience.
ESR is also conventionally present in the verb steike 'roast' in the common oath [daːʋɛn hɑn ʂʈɛikɛ] Dæven han steike! 'The Devil, (may) he roast!' However, this usage does not appear to generalize to other contexts.
As a final point, verbs relating to suddenly perceived offensive smells may be realized with ESR. Examples are shown in (43).
The felicity of these examples can be understood with reference to the finding by Digonnet (2018) that the experience of an obtrusive smell is commonly understood in terms of a conceptual metaphor of invasion, cf. the examples in (34).

Objects: aversion and rejection
ESR indexes an object as especially noticeable, usually unwelcomely so. The object may be inconvenient, in the way, and liable to induce aversion, rejection, occasionally awe.
Consider the examples in (44).

States and properties: heightened engagement
ESR is frequent with compound adjectives whose first component independently has intensive force. Examples are given in (46). Here, ESR enacts increased engagement (arousal, surprise) or commitment of the speaker. The accelerative and aversive interpretations are absent in these cases. The implied valuation may be positive or negative.

Evaluations of states of affairs: Reflective distance
Finally, I will illustrate uses of ESR which seem to lack the meaning of heightened engagement. The basic meaning in such cases seems to be a reflexive gesture to the speaker him/herself, such that attention is diverted away from the speaker's positive evaluation of some state of affairs. This introduces a note of reflection, distance, self-consciousness, sometimes irony or even "heteroglossia". Since ESR is a performance, it allows for the inference that it is an enactment of an utterance by someone else. Examples (48a) to (48c) might be taken to be self-conscious compliments that soften the impression of any emotional involvement, while (48d) could be intended ironically.

Identifying a semantic core for ESR
In their accounts of ESR in UEN, both Larsen (1907) and Broch (1927) attempt to explain the link between ESR's distribution and its expressive function. Larsen was the first to propose that ESR may be related in some way to the non-expressive, phonologically regular retraction of /sl/ to [ʂɭ] (Prelateral Sibilant Retraction) illustrated in (7) above, which is also characteristic of this variety (see Haugen 1942; Jahr 1985; and Kristoffersen 2000: 102ff. for recent discussion). Broch follows him in this assessment, but the two differ in how they see ESR acquiring its meaning. While Larsen sees a role for sound symbolism or articulatory feedback, Broch (1927: 155f.) favors a social constructionist account. Broch proposes that ESR be understood as the generalization, to sC-clusters, of Prelateral Sibilant Retraction, which was a salient group marker of the Oslo working class at the time he wrote. Broch claims that it is this, rather than anything sound symbolic, that is exploited in the expressive extension of the postalveolar fricative to other word-initial preconsonantal environments. The North Norwegian facts cast doubt on both these accounts, since they indicate that the use of ESR is highly constrained even when phonological factors are taken into account. Broch's account in particular does not lead us to expect to see the restrictions that we do. ESR is not a general-purpose intensive, but affords a small range of context-dependent meanings. This fact, I argue, is best explained by a gestural account.
In Section 4 I showed how ESR may index actions/events, objects, states/properties, and evaluations of states of affairs. The interpretations that attach to each differ. With the exception of the 'reflective distance' interpretation discussed in Section 4.4, these meanings have in common that they signal heightened engagement on the part of the speaker. With states, this seems to be the only meaning. With actions, however, ESR conveys an impression of acceleration or force, and with objects, a sense of obtrusive presence. These interpretations would furthermore appear to be complementary. For example, certain actions, such as those expressed by the verbs shown in (49), may characteristically trigger aversion, and yet ESR is not possible here.
(49) kuɾ dʉ snuɾkɑ / snʉfsɑ / snyɭʈɑ (! ʂnuɾkɑ/ ! ʂnʉfsɑ/ ! ʂnyɭʈɑ)
     'How you snore / sniff / scrounge!' (intended meaning: obtrusively)

The complementarity raises the question whether these meanings of ESR are conventionalized separately for each context, or whether it is possible to identify a semantic core underpinning them all. What follows is somewhat speculative, but I will argue that this is possible. Since spoken gestures utilize the same channel as spoken linguistic units, our analysis of the meaning should consider how the gesture relates to speech norms as well as its inherent properties, making the question of a semantic core a two-dimensional one.
First, ESR constitutes a deviation from a particular communicative norm, in this case, conformity with the phonetic targets given by Sibilant Place Neutralization. An adequate account of ESR must be able to relate its expressive function to the fact that it may only apply where the distinction between alveolar /s/ and postalveolar /ʂ/ is neutralized. As mentioned above, a verb form like [sɑːɡɑ] saga, 'sawed', may not undergo ESR to give ! [ʂɑːɡɑ], with the intended meaning 'sawed forcefully'. ESR also never applies word-internally, where before a consonant the contrast between /s/ and /ʂ/ is largely preserved, e.g. /ʋast/, 'waistcoat' vs. /ʋaʂt/, 'worst'; /bɑsk/, 'flail, flap (imp)' vs. /bɑʂk/, 'inhospitable, hardened'. I argue that ESR's very deviation from the targets given by Sibilant Place Neutralization momentarily directs attention to the non-linguistic gesture, signaling an intent on the part of the speaker to communicate something of heightened significance.

Second, we must consider the intrinsic properties of the gesture itself. The most straightforward possible interpretation of retraction of the tongue tip is that it enacts backward movement or withdrawal, for example, of some other part of the body, in much the same way that raising or lowering the larynx may iconically enact upward and downward movement of another object. Additional plausibility for this claim comes from recent evidence of a neurally encoded congruence in the direction of manual and mouth actions, such that backward hand movements are preferentially associated with retraction of the tongue (Vainio et al. 2018). See also Sidhu & Pexman (2017) for further relevant discussion.
It is possible that the accelerative and obtrusive interpretations that attach to actions/events and objects derive from this more basic meaning of 'withdrawal'. Retraction of the tongue tip may enact preparatory movement for using, say, the hand or arm in a throwing or striking action. The accelerative interpretation would thus result from a metonymic connection between the biomechanical phases of movement. When it indexes an object, ESR does not map onto an action or event in the world. Under this condition, ESR may instead enact rejection behavior in the speaker with respect to the indexed object, or perhaps withdrawal in aversion. In this context, negative implicatures become highly relevant in a way that they do not with actions/events or states. If ESR enacts physical repulsion of an obtrusive object, it is possible to see how this meaning derives from the accelerative one by adding a further metonymic connection.
With a state or property, neither the accelerative nor obtrusive interpretation is relevant, with the result that the interpretation defaults to one of 'heightened engagement', which is implied in both the accelerative and obtrusive interpretations. Preparatory movement, preparation to engage in the world in some unspecified way, may thus be the first link in the chain.
This leaves the 'reflective distance' interpretation discussed in 4.4. In this case, ESR does not seem to index a particular object or predicate, but rather a speaker's positive (but self-distancing) evaluation of a state of affairs. It is possible that tongue-tip retraction enacts no more than a withdrawal, which may lead to inferences that the speaker is distancing themselves from what they are saying, engaging in reflection, irony, and so on. Withdrawal, then, may constitute the most basic meaning of the ESR gesture: tongue-tip withdrawal iconically enacts the withdrawal of some other part of the body. Table 1 summarizes the proposed relationships between context (state of affairs, state/property, action/event, and object) and meaning, and the relation between the hypothesized core iconic meaning of ESR, its metonymically derived context-dependent meanings, and additional implicatures that may derive from these.
At least one key question remains unanswered, however. It is striking that for each type of context, it is the most specific meaning that is required. Thus, when ESR indexes actions/events and objects, the meaning obtained is never simply heightened engagement, although nothing in the account given so far rules this out. Actions/events must trigger the accelerative interpretation, and objects must trigger the obtrusive one (which I have argued here may be the enactment of motor repulsion, which would include within it the enactment of accelerated movement). I leave the resolution of this issue to future research.

Conclusions
Alveolar /s/ and postalveolar /ʂ/ contrast in North Norwegian, but the contrast is neutralized word-initially before another consonant (Sibilant Place Neutralization). In an apparent reversal, or violation, of the phonological rule, Expressive Sibilant Retraction (ESR) maps /s/ onto the corresponding postalveolar fricative [ʂ] in expressive contexts in precisely the environment where neutralization applies. Since ESR adds meaning, it is tempting to analyze it as a morphologically derived marginal contrast. In this paper, however, I have argued that this phenomenon is not linguistic, but communicative, and best understood as a 'spoken gesture' (Okrent 2002). ESR nevertheless exploits deviation from canonical phonological structure to draw attention to the tongue-tip retraction gesture. The core meaning of ESR proposed here is 'withdrawal', which gives rise to more specific interpretations depending on whether the gesture is used to index an action/event, object, state/property, or state of affairs. This paper provides an account of the relationship between these interpretations, as well as of the relationship between the form of ESR and its function, substantially adding to the early accounts of more or less the same phenomenon in Oslo Norwegian by Larsen (1907) and Broch (1927). My hope is that this account of ESR may serve as a model for describing and explaining cases of "expressive phonology" in other languages by enriching our understanding of spoken gesture and the relations between linguistic and post-linguistic communicative processes.