The strength of morphophonological schemas: Consonant mutations in Polish

Selective elimination of consonant mutations in Polish provides evidence supporting construction-based sublexicons and morphophonological schemas extracted from them. Morphophonological schemas vary in strength depending on their type frequency, they refer to morpheme-specific classes of segments, and their impact is continuously mediated by paradigm uniformity pressures. A low-frequency and a high-frequency pattern are analyzed: agent nouns in -ist-a/-yst-a and diminutives in -ek. Two kinds of frequency are important predictors of pattern modification: type frequency and token frequency. As for the impact of type frequency, the less frequent a pattern is in the lexicon, the more susceptible it is to modifications promoted by paradigm uniformity. Schemas are ranked based on the type frequency of the morphological patterns they encode and interact with paradigm uniformity constraints. Schemas representing less frequent patterns are outranked by paradigm uniformity constraints and are thus more likely to be modified than those representing more frequent patterns. Regarding token frequency, the greater stability of high-frequency words compared with low-frequency words is linked to their strong representations and a constraint promoting the use of stored representations. The continuous effect of paradigm uniformity is supported by the results of a nonce-word experiment.


INTRODUCTION
Usage-based linguistics has provided ample evidence of the pervasiveness of frequency effects in language processing (see Bybee 2001 and Ellis 2002 for reviews). In contrast, frequency of use did not play a significant role in mainstream generative phonology until recently. This was partly due to the common assumption that words are generated on-line from their component parts (i.e. morphemes) (e.g. Chomsky & Halle 1968; Kiparsky 1982). Token frequency (i.e. the usage frequency of particular words) is an inherent property of whole words, and the assumption that words are generated through the concatenation of abstract morphemes makes it impossible to associate frequency values with stored representations. In classical generative approaches, phonological analyses lack the right tools for differentiating frequent from rare words. As a result, the observation that some (morpho)phonological processes apply or fail to apply to words based on their frequency either had to go unnoticed or be treated as due to extragrammatical factors. The advent of transderivational approaches using output-output constraints (Burzio 1996; Kenstowicz 1996; Benua 1997; Steriade 2000) within Optimality Theory (OT, Prince & Smolensky 1993/2004) made the task of representing frequency in the grammar easier. In these approaches, frequency values, which are ascribed to independently stored words (including morphologically complex words), can be used to account for the application or blocking of a process in a particular word or group of words.
In usage-based models morphological and phonological patterns (including segmental alternations) are represented in terms of schemas (Bybee 2001; Dąbrowska 2004). An example of a schema specifying English Past Tense in verbs like stopped, begged and wanted is given in (1) (Bybee 2001: 126).
(1) a Past verb ends in [t], [d], or [ɨd]
Schemas are morphologically conditioned and their strength is a function of the frequency of the pattern they encode. The finding that the type frequency of a morphological pattern, that is, the number of words the pattern applies to, determines its productivity is referred to as gang effects (McClelland & Elman 1986; Stemberger & MacWhinney 1988; Alegre & Gordon 1999). The higher the number of words that adhere to a given pattern (i.e. the larger the gang), the more likely the pattern is to become extended to novel words. The type frequency of the schema in (1) corresponds to the number of English verbs (the size of the gang) that form their past tense using [t], [d], or [ɨd].
In addition to evidence pointing to the important role of type frequency, there is also evidence that token frequency, that is, the frequency with which a word is used, is relevant in pattern generalization.¹ High-frequency words are more likely to undergo phonetic reduction, while low-frequency words are more prone to analogical leveling (Mańczak 1980; Bybee 2001). Frequency plays a crucial role in dual-route models of lexical access (McQueen & Cutler 1998; Hay 2003; Plag 2012). A morphologically complex word can be accessed via the whole-word route (i.e. by accessing its stored whole-word representation) or the decomposed route (i.e. by accessing its component morphemes). The choice has been shown to depend on the relative frequencies of the derivative and its base. For example, in English the derivative business has a much higher frequency than its base busy. This entails that business is more likely to be accessed via the whole-word route.² Conversely, blueness is used less frequently than its base blue and, therefore, is predicted to show an advantage for the decomposed mode during access (Plag 2012; but cf. Hahn & Nakisa 2000, who, on the basis of German plurals, argue that dual-route models make the wrong predictions).
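The relative-frequency reasoning behind the dual-route account can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `preferred_route`, the categorical cut-off (derivative more frequent than base) and the frequency counts are all hypothetical, whereas the models cited above treat the preference as gradient.

```python
def preferred_route(derivative_freq: int, base_freq: int) -> str:
    """Return the access route favoured by relative token frequency.

    Simplifying assumption: a derivative that is more frequent than its
    base favours whole-word access; otherwise decomposition is favoured.
    """
    if derivative_freq <= 0 or base_freq <= 0:
        raise ValueError("frequencies must be positive")
    return "whole-word" if derivative_freq > base_freq else "decomposed"


# Hypothetical token counts (derivative, base); not taken from any corpus.
counts = {
    "business": (120_000, 9_000),   # derivative more frequent than 'busy'
    "blueness": (40, 85_000),       # derivative less frequent than 'blue'
}

for word, (derivative, base) in counts.items():
    print(word, preferred_route(derivative, base))
```

A gradient version would replace the categorical comparison with a continuous function of the frequency ratio.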
In the present analysis of consonant mutations in Polish, both type and token frequency are shown to have an impact on paradigm uniformity effects. Mutations are eliminated from a low-frequency morphological pattern, agent nouns in -ist-a/-yst-a, while a high-frequency pattern, diminutives in -ek, remains stable. It is argued that high-frequency patterns resist paradigm uniformity pressures due to their robust representations in the grammar. Low-frequency patterns, on the other hand, are represented with low-ranked schema-constraints and, thus, are more susceptible to modifications. It is also demonstrated that high-frequency words (i.e. words with a high token frequency) show increased stability even if the pattern (schema) they represent displays a low type frequency.³
The paper is structured as follows. Section 2 provides the main data and introduces the basic elements of the analysis: frequency in phonology, allomorphy, schemas and cophonologies. It also discusses some representational aspects of consonant mutations in Polish. In Section 3, the formation of agent nouns in -ist-a/-yst-a, a low-frequency pattern, is discussed. This section focuses on the selectivity of consonant mutations. First, we look at the phonological conditioning responsible for the emergence of the pattern. Second, modern-day complexities resulting in the elimination of mutations are accounted for using a combination of phonological constraints and morphophonological schemas. This section also reports the results of a nonce-word experiment, which point to an ongoing change driven by paradigm uniformity. In Section 4, we consider diminutives in -ek, a high-frequency pattern which results in stable mutations. Section 5 offers a discussion of the main implications of the analysis and Section 6 provides the conclusions.

¹ I follow Bybee (2001) and others in distinguishing type from token frequency effects. Arguably, type frequency effects can in principle be derived from the sum of token frequencies.

² The fact that business is synchronically only weakly associated with its base busy is fully consistent with the predictions of dual-route models of lexical access.

Czaplicki Glossa: a journal of general linguistics DOI: 10.5334/gjgl.1255

DATA AND BASIC ASSUMPTIONS

2.1 A LOW-FREQUENCY PATTERN: AGENT NOUNS IN -IST-A/-YST-A
The suffix -ist-a/-yst-a [ista/ɨsta] is used to form agent nouns in Polish. It is of Latin/Greek origin and entered Polish at the turn of Old and Middle Polish, that is, around the 15th century (Długosz-Kurczabowa & Dubisz 2006: 369). The suffix is productive and no longer restricted to Latinate words. In spite of its long history in Polish, the suffix remains a low-frequency pattern. The quantitative data to back this claim will be given in Section 2.3. We start out with some facts about the distribution and consonant mutations triggered by the two variants of the suffix: -ist-a and -yst-a. I begin with examples illustrating the usage of the variant -ist-a. The data are drawn from Gussmann (2007: 157-161). For the purposes of this analysis, a consonant mutation is defined as a featural change in a consonant that results in a phonologically distinct (i.e. contrastive) segment. There is convincing evidence that palatalization of labials, dentals and velars before [i], [pʲ bʲ fʲ vʲ mʲ tʲ dʲ sʲ zʲ c ɟ ç], is a non-contrastive, coarticulatory effect of the following vowel. First, labials, dentals and velars do not contrast for palatality preconsonantally and word finally. Second, Święciński's (2014) acoustic study shows that [pʲ bʲ fʲ vʲ mʲ tʲ dʲ sʲ zʲ c ɟ ç] appear exclusively before the vowel [i]; before other vowels palatalization is manifested on a distinct segment, the palatal glide [j], e.g. piasek [pjasɛk] 'sand', diabeł [djabɛw] 'devil' and kiosk [kjɔsk] 'kiosk'. In light of this, [pʲ bʲ fʲ vʲ mʲ tʲ dʲ sʲ zʲ c ɟ ç] should not be regarded as mutated variants of [p b m f v t d s z k g x]. Such coarticulatory palatalization is omitted from the transcription in this paper, as it does not represent an instance of mutation in the relevant sense.⁴
The other variant, -yst-a [ɨst-a], is selected after base-final [r]. The rhotic alternates with the fricative /ʐ/ in the derived form, as illustrated in (3a). A context-free diachronic change /rʲ/ → /ʒʲ/ → /ʐ/ gave rise to this alternation in modern Polish. As exemplified in (3b), the variant -yst-a [ɨsta] also appears after alveolar and postalveolar (retroflex) affricates [ts dz tʂ dʐ] and the postalveolar (retroflex) fricatives [ʂ] and [ʐ], the latter fricative showing a similar behavior to the /ʐ/ ← /rʲ/.

³ A reviewer points out that it is in principle possible to extend the classical generative model in such a way as to associate frequency values directly with the morphemes which constitute the respective words. While an extension along these lines definitely deserves more attention, it does not discount the basic insight. Token frequency is a property of individual words, whereas type frequency is a property of an aggregate of words complying with a particular morphological pattern. In addition, while effectively modeling the effects of type frequency of affixes, such a solution might encounter problems with modeling the effects of the token frequency of complex words. In blueness both blue and -ness have a high token frequency (separately), but blueness as a complex word has a low token frequency. Insofar as blueness patterns with other words of low frequency, its behavior cannot be derived from the frequencies of its component morphemes. It must be derived from the token frequency of the entire word. Thus, in addition to the frequency of the component morphemes, such a model would still have to refer to the token frequency of the word.

⁴ A reviewer claims that labials do in fact contrast for palatality in Polish and refers to the well-known morphological regularity that involves the usage of two different case endings for stem-final labials. For example, the loc sg has two possible endings -u and -e. The loc sg of karp [karp] 'carp' selects -u, i.e. karpi-u [karpj-u], while the loc sg of sęp [sɛmp] 'vulture' selects -e, i.e. sępi-e [sɛmpj-ɛ]. While it is true that the choice of the endings was historically motivated by the contrastive specification of labials (palatalized vs. non-palatalized), the synchronic grammar of contemporary speakers need not reflect this contrastive specification. In the usage-based approach adopted here the solution is simple. The two patterns of behavior of labials in morphology are represented by two competing schemas. For example, for base-final [p] the schemas representing loc sg are […p] ↔ [[…pj]u] and […p] ↔ [[…pj]ɛ]. For novel words, the competition between the two schemas is resolved in favor of the more frequent one. For example, the loc sg of laptop [laptɔp] 'laptop' is laptopi-e [laptɔpj-ɛ], not *laptopi-u, suggesting that […p] ↔ [[…pj]ɛ] is the more frequent and hence productive one of the two. As for well-established words, given that frequent words are stored whole, the competition between the two endings is resolved in favor of the stored one. See Section 2.5 for the details of the proposal.

⁵ The words lobb-yst-a 'lobbyist' and hobb-yst-a 'hobbyist' might seem exceptional to the distribution of the suffix alternants after labials. In fact, they are cases of "morphological absorption" (the term is commonly attributed to Mikołaj Kruszewski), where the stem-final vowel was reanalyzed as part of the suffix, i.e. lobby > lobb-yst-a and hobby > hobb-yst-a.

⁶ Admittedly, the influence of such product-oriented schemas as "agent nouns end in […l-ista]" (see the discussion in Section 2.5) merits a closer look, but it would take us too far afield.

In addition to the words with coronals [t d r] that show mutations before -ist-a, illustrated in (2) and (3), there are also those which do not exhibit consonant mutations and choose the other shape of the suffix, as exemplified in (4). The distribution of the suffix alternants in (4) diverges from the distribution shown in (2) and (3).
[…] (3), on the one hand, and (4) and (5), on the other, will be linked to differences in their token frequency in Section 3.3.2.
(5) 'follower of mannerism'
As the summary of the distribution of the suffix -ist-a/-yst-a in (6) shows, the -ist-a variant appears after labials and velars, while -yst-a occurs after [ts dz tʂ dʐ ʂ ʐ]. Base-final coronals [t d r] show a more complex pattern: both allomorphs are attested. Based on these distributional facts, it is assumed that -ist-a is the basic variant of the suffix, while -yst-a, whose usage is restricted to retroflexes and alveolar affricates (categorically) and [t d r] (variably), is a positional variant of the suffix. In usage-based models, it is assumed that relations between morphological units, including identity relations, are established on the basis of phonological and semantic similarity. The suffixes -ist-a and -yst-a are both phonologically and semantically similar. A rationale for this assumption is provided in Section 2.5, where the properties of schemas are fleshed out.
We now turn to evidence indicating that consonant mutations before front vowels are historically and synchronically motivated in Polish. As for the historical motivation, there is compelling evidence that CV coarticulation preceded and induced the emergence of distinctive palatalization in Slavic languages. Jakobson (1929/1962: 71ff.) and Andersen (1978: 11-15) argue that in Common Slavic palatalized consonants arose due to coarticulation with the following front vowel. […] consonant.⁷ When the vowel became back, as in [vʲara] and [klʲatka], the palatality could no longer be attributed to the vowel and was phonologized on the consonant, which gave rise to the distinctively palatalized /vʲ/ in /vʲara/ and /lʲ/ in /klʲatka/ (Andersen 1978: 11-15). A similar mechanism was likely involved in the development of mutations before -ist-a/-yst-a. Thus, phonetic reconstruction and comparative evidence indicate that palatalization is historically motivated before front vowels in Polish.
(7) Common Slavic věra > vʲɛra > vʲɛara > vʲara > Modern Polish vjara
    Common Slavic klětŭka > klʲɛtka > klʲɛatka > klʲatka > Modern Polish klatka
Synchronic alternations in Modern Polish also provide motivation for mutations before front vowels. Suffix-initial front vowels commonly trigger consonant mutations (Rubach 1984). In (7) mutations before the adjectival suffix -ist-y are shown.
Returning to the formation of agent nouns in -ist-a/-yst-a and the behavior of the base-final consonant, three facts require explanation in the context of Coronal and Velar Mutations. First, in considering the words in -ist-a/-yst-a, labials and velars do not show any significant consonant mutations before the suffix, while coronals (variably) do. This stands in contrast to the data exemplified in (8) showing that mutations apply to both coronals and velars in other morphophonological patterns.⁹ The failure of velars to undergo mutations before -ist-a/-yst-a is in this light surprising and will be addressed in Section 3.1. Second, why do some words with base-final [t d r] show mutations before -ist-a/-yst-a, while others do not? Looking at the data illustrating Coronal Mutation, it seems safe to assume that mutations of coronals before front vowels are historically and phonetically motivated in Polish. There is evidence that palatalization before -ist-a in words with base-final [t d r] used to be a fully regular process. Words of this type without palatalization appeared later (Rubach 1984: 65-68). Therefore, the lack of mutations in this context for some words with base-final [t d r] before -ist-a/-yst-a must be seen as a later development, whose emergence is in need of explanation. This issue is tackled in Section 3.2.3. The third fact that requires explanation is the gradient elimination […]

⁷ The loss of jers, ultra-short vowels, is another factor implicated in the phonologization of palatalization in Late Common Slavic (Jakobson 1929/1962).

⁸ I refrain from using the terms Coronal Palatalization and Velar Palatalization (see, for instance, Rubach 1984), as the outcome of the processes is not always easily classifiable as involving a palatal articulation (see Section 2.5). There are several types of Velar Mutation (Rubach 1984).

⁹ Bateman (2007) in her survey of 45 languages or dialects found that palatalizations of coronals and dorsals are common, although the former type occurs more frequently than the latter: 54% for coronals vs. 18% for dorsals in her sample. Insofar as typological asymmetries in segmental alternations reflect common pathways of change, this indicates that both coronals and dorsals are susceptible to palatalization when followed by front vowels.

Labials and coronals in (10a) and (10b) fail to mutate. In the case of palatalized labials and coronals in (10c), the palatal element disappears before -ek, which can be described in traditional terms as depalatalization.¹⁰ Velars in (10d) mutate and appear as their postalveolar (retroflex) reflexes before the [ɛ] of the suffix.
There is inconsistency in the applicability of consonant mutations before the suffixes -ek and -ist-a (the latter discussed in the previous section). In contrast to the behavior of the -ist-a suffix, which triggers mutations of coronals but not of velars, the diminutive suffix -ek results in mutations of velars but not of coronals. There is ample evidence that the high front /i/ is typologically more likely to trigger mutations than the mid front /ɛ/ (Bateman 2007; Rubach 2007). In fact, phonological conditioning alone predicts that -ist-a/-yst-a should be responsible for higher rates of mutations than -ek, as mutations are more likely to occur in the context of /i/ than in the context of /ɛ/. In a similar vein, Czaplicki's (2019) quantitative analysis of the effects of 27 suffixes in Polish has shown that /i/ is far more likely to trigger mutations than /ɛ/. The fact that the suffix -ek triggers palatalization of velars, but depalatalization of labials and coronals has been interpreted by Czaplicki (2019) as evidence for the irrelevance of phonological naturalness in the conditioning of consonant mutations. What is relevant for our purposes is that words in -ek do not exhibit variability, as opposed to words in -ist-a. There are no variants without mutations for base-final velars before -ek. Mutated coronals before -ek are equally uncommon.¹¹ It will be shown that frequency is a crucial element in an explanatory analysis of morphophonological patterns. Specifically, it will be shown that type frequency determines the stability of a pattern.
(11) Relation between type frequency and morphological stability
    -ist-a - low type frequency - decreased stability
    -ek - high type frequency - increased stability

FREQUENCY
There is growing evidence suggesting that frequency plays an important role in morphophonology (Mańczak 1980; Bybee 2001; Ellis 2002; Albright & Hayes 2003; Baayen et al. 2003; Dąbrowska 2008; Czaplicki 2013a; 2013b; 2014a; 2014b). As already mentioned, the generalizability of a pattern has been shown to crucially depend on the number of stored words that exhibit the pattern (gang effects). More specifically, when two (or more) patterns are available in a particular context, the pattern with a higher frequency is the one most likely to become generalized to novel words. In other words, pattern extension deploys the most robust of the several patterns used in a particular morphological context. An explanation along these lines is available for the change of classes of the English verb help, which in Old English belonged to Strong Verbs, but now forms its past tense by means of the more robust -ed suffixation. Frequency enters into interactions with other factors. Dawdy-Hesterberg & Pierrehumbert (2014) have found that, while the generalization of the various patterns of Arabic broken plurals in large part depends on prosodic templates, gang effects are important predictors as well.
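The gang-effect reasoning (the pattern supported by more stored words is the one most likely to be extended) can be illustrated with a minimal Python sketch. The proportional choice rule, the function name `extension_probabilities` and the counts are assumptions made for illustration only; they are not the model proposed in this paper.

```python
def extension_probabilities(type_freqs):
    """Map each competing pattern to its share of the total type frequency.

    Assumption: the chance that a pattern is extended to a novel word is
    proportional to the number of stored words exhibiting it.
    """
    total = sum(type_freqs.values())
    if total <= 0:
        raise ValueError("at least one pattern needs a positive type frequency")
    return {pattern: n / total for pattern, n in type_freqs.items()}


# Hypothetical type frequencies for two patterns competing in one context.
competing = {"pattern_A": 300, "pattern_B": 40}
shares = extension_probabilities(competing)
favoured = max(shares, key=shares.get)
print(shares, favoured)
```

Other factors, such as the prosodic templates mentioned above, would enter such a model as additional predictors.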
Another factor that is often implicated in language change and pattern extension is token frequency (Bybee 2001). High-frequency words are predicted to resist pattern extension, as evidenced by the preservation of such suppletive forms in English as go – went. As already mentioned, the relative token frequency of the derivative and its base plays a critical role in dual-route models of lexical access. This issue becomes relevant in Section 3.2, where the strength of lexical representations is considered. Anttila (2006) discusses an interesting case of assibilation in Finnish. The bimoraic verbs which meet the structural description of assibilation (context of a following /i/) show three types of behavior that can be directly linked to their frequency: the most frequent verbs exhibit assibilation, the least frequent verbs fail to undergo assibilation and verbs of medium frequency show variation. Czaplicki (2016) argues that these regularities can be insightfully explained by reference to frequency, on the one hand, and avoidance of mutations between the base and the derivative (output-output correspondence, formulated below), on the other.
In an attempt to explain the different behavior of words in -ist-a and -ek with respect to mutations, I make reference to the frequency of the two patterns in the lexicon. It is argued that for robust patterns (i.e. those showing a high type frequency) identity pressures are overridden. Let us compare the type frequencies of the words in -ist-a and -ek using dictionary and corpus data. Table 1 presents the counts of words in -ist-a and -ek drawn from a reverse dictionary of Polish (Indeks a tergo do Uniwersalnego słownika języka polskiego pod redakcją Stanisława Dubisza) (Bańko et al. 2003). The number of dictionary entries in -ek is nearly twice as high as the number of dictionary entries in -ist-a. There is good reason to believe that the difference in robustness between the two patterns is even greater. First, the list of words with the suffix -ek is not exhaustive, as it does not include recent borrowings and many diminutives whose semantics is fully predictable. Grzegorczykowa & Puzynina (1999: 425) mention that diminutives in -ek and -ik constitute an open class and that most nouns can form diminutives using one of these suffixes.¹² Second, a considerable number of words in -ist-a (but definitely not all) belong to the learned stratum of vocabulary and their usage is often restricted to formal and technical registers. Therefore, the factor that is missing from Table 1 is token frequency, i.e. the frequency with which each of the words is used in the discourse.
In analyzing the frequency of the two suffixes, I use data extracted from the corpus plTenTen: Corpus of the Polish Web, available in Sketchengine, which is made up of texts collected from the internet in 2012 and comprises more than 7.7 billion words. Table 2 provides the type and total token frequency of words in -ist-a and -ek in the corpus (accessed 12 October 2020). Words below the frequency of 50 have not been considered, as they turned out to be mostly proper names and spelling errors.¹³
The type frequency of words in -ek in the corpus is 7.6 times higher than the type frequency of words in -ist-a. Predictably, the difference between the number of words in -ist-a and -ek in the corpus (i.e. their type frequency) is greater than for the data taken from the dictionary shown in Table 1. Table 2 also shows the total token frequency, which is the sum of the token frequencies (the frequency with which a given word is used) of all the words in -ist-a and -ek in the corpus. The total token frequency of words in -ek is 9.2 times higher than the total token frequency of words in -ist-a.
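The two measures just described (type frequency as the number of distinct words, total token frequency as the sum of their token counts, with words below 50 tokens discarded) can be sketched as follows. The word counts are invented for illustration, `pattern_frequencies` is a hypothetical helper, and matching on orthographic endings is a crude stand-in for the morphological criteria actually applied to the corpus data (for instance, the requirement that an independent base exist).

```python
MIN_TOKEN_FREQ = 50  # threshold stated in the text


def pattern_frequencies(counts, endings, min_freq=MIN_TOKEN_FREQ):
    """Return (type frequency, total token frequency) for words with the given endings."""
    kept = [
        n for word, n in counts.items()
        if word.endswith(tuple(endings)) and n >= min_freq
    ]
    return len(kept), sum(kept)


# Hypothetical token counts; not taken from plTenTen.
sample = {
    "pianista": 1200, "gitarzysta": 900, "flecista": 30,
    "domek": 5000, "kotek": 3000, "piesek": 2500, "stołek": 40,
}

ista_types, ista_tokens = pattern_frequencies(sample, ("ista", "ysta"))
ek_types, ek_tokens = pattern_frequencies(sample, ("ek",))
print(ek_types / ista_types, ek_tokens / ista_tokens)
```

The two printed ratios correspond to the kind of comparison reported for Table 2, here computed over toy data.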
The token frequency data are not normally distributed, so they have been log-scaled as in […] This result is likely due to the highly positively skewed distribution for words in -ek (skewness = 29.44). As can be seen from the histogram on the left in Figure 1, low-frequency words in -ek outnumber comparable words of medium and high frequency. It follows that type frequency is a better measure of the strength of a pattern than token frequency.
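A short sketch may help to show what positive skew means here and why log-scaling is applied. The Zipf-like frequencies below are invented, and the moment-based (population) skewness formula is one common definition, assumed here because the text does not say which estimator was used.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical Zipf-like token frequencies: many rare words, few frequent ones.
freqs = [round(100_000 / rank) for rank in range(1, 501)]

raw_skew = skewness(freqs)
log_skew = skewness([math.log10(f) for f in freqs])
print(raw_skew, log_skew)
```

In data of this shape the raw counts are dominated by a handful of very frequent words, which is why the log-transformed values show a much smaller skew.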
To sum up, a close analysis of type and token frequency confirms the difference in the robustness of the two patterns: the pattern -ek is significantly more robust than the pattern -ist-a in the grammar. In addition, low-frequency words in -ek are overrepresented relative to words in -ek of medium and high frequency.

¹³ The necessary condition for words to be included in the data was the existence of a base as an independent word. Words like art-yst-a 'artist', dent-yst-a 'dentist' and stat-yst-a 'extra' did not qualify because art, dent and stat are not actual words in Polish. This decision was partly dictated by the gradient and often debatable decomposability of such words with bound roots. With relevance to the present analysis, the lack of a base makes it impossible to assess the relative token frequency of the derivative and the base. Relative frequency is instrumental in determining the route of lexical access (see Section 3.2.2). Furthermore, considering the assumptions of usage-based models, it would be meaningless to analyze the presence or lack of mutations in a derivative that has no independent base, such as stat-yst-a. Admittedly, though, such words would be central in an analysis of the gradience of semantic decomposability. Complex words without a shared independently occurring base like klasycyzm 'classicism' and klasyc-yst-ycz-n-y 'classicist' (where klasyc is not an actual word) are associated by means of second-order schemas (Booij & Audring 2017: 12-15). In this way, complex words form relations with multiple other complex words by virtue of the shared elements. A reviewer worries that the schema-approach might have problems with the fact that while words with bound roots like ar[t]-yst-a 'artist' and ar[t]-yst-ycz-n-y 'artistic' exist, a word like ar[tɕ]-ist-ycz-n-y (with a mutation) is unlikely. The proposed explanation makes use of (i) whole-word storage and (ii) second-order schemas that join elements of related complex words. Bound roots like art- are not independently stored, so a source-oriented schema such as "a final [t] in the base corresponds to [tɕ] in the adjective in -ist-ycz-n-y" is not applicable. There is no independently stored base (see Section 2.5).

[Table 2: type frequency and total token frequency of words in -ist-a and -ek in the corpus]
The two suffixes are useful in assessing the role of frequency, as they are compatible in the relevant aspects of distribution and phonological behavior. First, they are both derivational. Second, they are both fully productive and readily extended to new words. Third, neither of them is restricted to loanwords.¹⁴ Finally, they both begin with front vowels and trigger mutations. In fact, as mentioned in Section 2.2, phonological conditioning alone predicts that -ist-a should be responsible for higher rates of mutations than -ek, as /i/ is more likely to trigger mutations than /ɛ/ both in Polish and more generally (Bateman 2007; Rubach 2007; Czaplicki 2019). It is claimed that the differences in the phonological and morphological behavior of the two constructions are derivable from the differences in their frequency.

LEXICAL STORAGE OF ALLOMORPHS
Lexical storage of allomorphs is not a new idea. There is ample evidence from usage-based research that morphologically complex words are stored whole (McQueen & Cutler 1998; Bybee 2001; Baayen et al. 2003; Hay 2003; see also Section 3.2.2). Similarly, certain analyses representative of generative phonology demonstrate that specific alternations cannot be derived using purely phonological operations and the relevant allomorphs need to be listed.
In phonologically conditioned suppletion two phonologically dissimilar alternants need to be stored; however, their distribution is phonologically conditioned (Carstairs 1988; 1990). Anderson (2008) discusses a pattern from Surmiran and argues that vowel reduction, once a phonologically conditioned process, has become opaque. A solution proposed by Anderson involves reference to two listed alternants of the stem whose distribution is governed by prosodic considerations (i.e. stress). The distribution of alternants may also be regulated by phonologically neutral considerations. Paster (2006) and Embick (2010) use subcategorization frames which make reference to phonological and lexical information. However, allomorph selection does not as a rule result in phonologically optimized structures. In Kaititj the ergative suffix appears as [-ŋ] after disyllabic stems and as [-l] after trisyllabic stems. Although the formula that captures the generalization makes reference to phonological vocabulary, in this case the syllable, this instance of allomorph selection does not in any way improve phonological well-formedness (Paster 2006). The existence of such patterns shows two things: allomorph selection need not be phonologically optimizing and some alternants, including different shapes of a stem, have to be listed (i.e. stored).

MORPHOPHONOLOGICAL SCHEMAS AND COPHONOLOGIES
In order to shed light on the differences in the assumptions of traditional generative and usage-based approaches, it is vital to compare the relevant aspects of rules (current generative analyses generally employ constraints instead of rules, but certain properties of rules remain implicit) and schemas (usage-based approaches). Schemas in contradistinction to rules emerge from the lexicon, that is, from the stored representations of words and phrases. As a consequence, they have "no existence independent of the lexical units from which they emerge" (Bybee 2001: 27). Rules, on the other hand, exist independently of the stored items and form part of a module that is separate from the lexicon. The productivity of a schema is a function of the number of participant items (gang effects). Put differently, the more words comply with a given schema, the more productive the schema is predicted to be.¹⁵ In this view, productivity of a schema is gradient and probabilistic, which is a consequence of the close connection between schemas and stored words. While in early generative models rules did not show a direct relationship with the number of words they apply to, more recently, modeling non-categorical effects, including the effects of frequency, has been facilitated by the use of gradient and probabilistic OT constraints (e.g. MaxEnt, Hayes & Wilson 2008).
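To make the idea of gradient, probabilistic constraints concrete, the following Python sketch computes candidate probabilities in a MaxEnt-style grammar, where each candidate's probability is proportional to the exponential of its negative weighted violation score. It is an illustration only: the constraint names `SCHEMA` and `PU` (paradigm uniformity), the candidates and all weights are hypothetical, and the analysis in this paper is stated in terms of ranked schema-constraints rather than fitted weights.

```python
import math


def maxent_probabilities(violations, weights):
    """Return P(candidate), proportional to exp(-sum of weight * violations)."""
    harmony = {
        cand: -sum(weights[c] * v for c, v in viols.items())
        for cand, viols in violations.items()
    }
    z = sum(math.exp(h) for h in harmony.values())
    return {cand: math.exp(h) / z for cand, h in harmony.items()}


# Each output violates exactly one of two hypothetical constraints.
violations = {
    "mutated": {"SCHEMA": 0, "PU": 1},     # differs from the base
    "unmutated": {"SCHEMA": 1, "PU": 0},   # departs from the schema
}

# Assumption: a schema with a higher type frequency receives a higher weight.
low_freq = maxent_probabilities(violations, {"SCHEMA": 1.0, "PU": 2.0})
high_freq = maxent_probabilities(violations, {"SCHEMA": 4.0, "PU": 2.0})
print(low_freq, high_freq)
```

With the weakly weighted schema the unmutated output is favoured, while the strongly weighted schema favours the mutated output, mirroring the contrast between low-frequency and high-frequency patterns.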
Since the publication of The Sound Pattern of English (Chomsky & Halle 1968), the well-formedness of rules has typically been associated with the notion of markedness or phonological naturalness. Chomsky & Halle (1968) observe that their model largely overpredicts the types of processes that occur in natural languages and propose to constrain the set of possible rules by appealing to phonological markedness (Chapter 9). Rules that lead to the reduction in markedness are preferred over those that do not. More recently, Hayes & Steriade (2004: 1) claimed that markedness constraints are gleaned from phonetic knowledge, the latter being somewhat vaguely defined as "the speakers' partial understanding of the physical conditions under which speech is produced and perceived". In this view, final devoicing of obstruents is possible, while final voicing is predicted to be impossible, as it would lead to more marked structures (Kiparsky 2006). The position that synchronic universals (i.e. markedness) constrain diachronic change and shape linguistic patterns is advocated in, for instance, de Lacy (2002; 2006), Kiparsky (2006; 2008) and de Lacy & Kingston (2013).
On the other hand, there is accumulating evidence that undermines the role of markedness as an active bias in synchronic grammars. Processes that result in more marked structures are pervasive (Bach & Harms 1972; Anderson 1981; Blevins 2004; Hale & Reiss 2008; Czaplicki 2013a; 2014a; 2019). Examples of such processes include final voicing in Lezgian (Blevins 2006) and unnatural patterns of consonant epenthesis in various languages (Blevins 2008). Typological asymmetries can be explained by extragrammatical factors (e.g. common trajectories of phonetically based sound change) (Ohala 1983; Blevins 2004). Insofar as schemas are based on stored representations, they are language specific and not necessarily dependent on naturalness, understood as a universal learning bias. Thus, schemas are a priori markedness-free. However, it should be noted that markedness constraints are not logically incompatible with schemas. In principle, the emergence of markedness constraints is independent of schemas. Yet, one of the predictions of usage-based models is that language-specific considerations should override markedness constraints when a conflict between the two pressures arises (a preference for morphological conditioning over phonological conditioning, see below). The role of markedness in schema-based approaches certainly deserves more attention.

Rules refer to distinctive features (Chomsky & Halle 1968). If segments are referred to, this is done as a shorthand for the featural specification that underlies a particular segment. Schemas are not similarly restricted in their vocabulary.
(12) Lexical organization provides generalizations and segmentation at various degrees of abstraction and generality. Units such as morpheme, segment, or syllable are emergent in the sense that they arise from the relations of identity and similarity that organize representations. Since storage in this model is highly redundant, schemas may describe the same pattern at different degrees of generality (Langacker 2000; Bybee 2001: 7-8).
As mentioned in (12), schemas can refer to various organizational units, such as segment, syllable and feature, as long as these units emerge from stored representations. The claim cited in (12) points to yet another important difference between rules and schemas.
Rules are preferably stated using phonological vocabulary (this is a requirement of modularity, Scheer 2012). In contrast, schemas require that different types of information (phonological, syntactic, morphological and semantic) be simultaneously accessible. This is the fundamental property of Parallel Architecture, a theory of grammar developed by Ray Jackendoff (cf. Jackendoff 2002). Schemas are formed on the basis of phonological and semantic similarity between stored words. Morphological structure emerges from these identity relations. On this view, -ist-a and -yst-a are predicted to be identified as alternants of a single suffix because of their phonological and semantic similarity as well as their near complementary distribution.
Schemas contain information about morphological structure (e.g. English past tense formation using -ed) (Bybee 2001: 23-24). In fact, Bybee (2001: 97-100) argues that segmental alternations display a preference for morphological conditioning over phonological conditioning. She also claims that "once morphological conditioning becomes dominant, it follows that phonological principles, such as patterning based on natural classes, will no longer be applied in the same way as for phonetically conditioned processes" (2001: 105). Put differently, markedness considerations are less important for morphologized patterns than for patterns that are fully phonetically transparent. Two phonologically similar contexts can give rise to different segmental alternations, as long as the morphological conditioning is different (i.e. two different morphemes). For example, Velar Mutation applies before the suffix -ek but not before -ist-a (see Sections 2.1-2.2), even though the context of a following front vowel is present in both cases.
Schemas are compatible with approaches that assume the existence of multiple cophonologies within one grammar (Booij 2010; Inkelas 2014; Booij & Audring 2017; Czaplicki 2020), where each morphological construction is associated with its own phonological subgrammar. In other words, each morphological construction has its own phonological properties. Representative of early approaches that assume the existence of multiple cophonologies within one grammar is Itô & Mester's (1995) Core-Periphery Model of the lexicon, which identifies the core area of the lexicon, governed by a maximum set of markedness constraints (markedness dominates faithfulness). Their markedness constraints are syllable- and segment-related and penalize, for example, voiced obstruent geminates and non-geminate [p]. Structures that occupy less and less central areas of the lexicon show increasingly more violations of markedness constraints (faithfulness dominates markedness). By extension, the Core-Periphery Model predicts that the less nativized (more peripheral) a structure, the more faithful it should be to its input. Crucially, the model predicts that there are multiple layers within the lexicon and each layer is differentiated from the others in that it has its own specific constraint ranking (phonological grammar). In Japanese, four layers are distinguished: Yamato, Sino-Japanese, Assimilated Foreign and Unassimilated Foreign. An important claim of Itô & Mester (1995) is that the differences between the core area of the grammar and more peripheral areas relate to the reranking of faithfulness constraints. The ranking of markedness constraints remains constant for the whole grammar. For example, the hypothetical input /paka/ is realized differently depending on the layer of the grammar in which it is generated. If it is processed in the Sino-Japanese stratum, it surfaces as [haka], due to the ranking of No-P above Faith. In the Assimilated Foreign stratum, the output is [paka], due to the reranking of Faith above No-P. Morphological conditioning can also be handled by indexing constraints to specific morphological constructions (Itô & Mester 1999).
To sum up, schemas are dependent on stored representations, their productivity (strength) depends on the number of words they derive (type frequency), they are a priori markedness-free, they can be stated at various degrees of generality (e.g. segment, feature, syllable) and they include morphological information. Supporting evidence for these properties of schemas can be found in, for example, Bybee (2001), Ellis (2002) (frequency effects), Booij & Audring (2017), Czaplicki (2013a; 2019; 2020) (morphological conditioning) and Blevins (2004) (markedness-free generalizations).
Schemas can be product- or source-oriented. An example of a product-oriented schema specifying English Past Tense in walked, begged and wanted was given in (1) and is repeated in (13) (Bybee 2001: 126). Source-oriented schemas mention input (base) as well as output.
Several source-oriented schemas representing Polish consonant mutations are exemplified in (14).
The schemas in (14) express the formation of diminutives, (a), and instrumentals, (b), from nouns whose stems end in various consonants. The former pattern was illustrated in Section 2.2; the latter can be applied to kro[k] 'step', pro[g]-u 'threshold' gen sg and du[x] 'ghost'. Source-oriented schemas are preferable for describing consonant mutations, as the applicability and type of mutation in the derivative crucially depend on the final consonant in the base. In (14a) base-final velars appear as their mutated alternants before -ek. However, at the same time base-final [t] fails to mutate and the cluster [ɕtɕ] depalatalizes to [st] in the same context. So we could not refer to a product-oriented schema requiring that a consonant appear in its palatalized (or mutated, or retroflex) form before -ek. Rather, the output of the concatenation of a suffix depends on the particular base-final consonant. Before the instrumental suffix -em, illustrated in (14b), velar plosives mutate but the fricative remains unchanged. Such arbitrary suffix- and consonant-specific alternations, which abound in Polish, would be difficult to conceptualize as product-oriented schemas (see Becker & Gouskova 2016 for more evidence that source-oriented schemas are necessary).
Schemas defined along these lines constitute the core of the proposed analysis. Type frequency determines the strength of linguistic patterns, in the sense that the more frequent the pattern, the more likely it is to be extended to novel words. In the formalization of the analysis, type frequency finds reflection in the ranking of constraints representing morphophonological schemas. Constraints representing more frequent schemas are ranked higher than constraints corresponding to less frequent schemas. In the case at hand, schemas representing -ek will be ranked higher than schemas pertaining to -ist-a, as illustrated in (15). Schema-constraints are interspersed with phonological constraints, as will be demonstrated in Section 3.2.3.
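The link between type frequency and constraint ranking can be illustrated with a minimal sketch. The counts below are hypothetical placeholders (no lexicon totals are given at this point in the text), and the function name is an illustrative assumption; only the ordering principle, that a higher type frequency yields a higher rank, is taken from the analysis.

```python
# Hypothetical type frequencies (number of words instantiating each schema).
# These values are placeholders for illustration, not counts from the study.
schema_type_frequency = {
    "schema(-ist-a)": 300,
    "schema(-ek)": 5000,
}

def rank_schema_constraints(type_frequency):
    """Order schema-constraints from highest- to lowest-ranked by type frequency."""
    return sorted(type_frequency, key=type_frequency.get, reverse=True)

ranking = rank_schema_constraints(schema_type_frequency)
print(" >> ".join(ranking))
```

On this sketch, any change in the relative size of the two gangs would be reflected directly in the dominance order of the corresponding schema-constraints.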

REPRESENTING CONSONANT MUTATIONS IN POLISH
In traditional terms, palatalization is a type of consonant mutation that is caused by a following front vowel or a palatal glide. Palatalization has its diachronic roots in the phonologization of coarticulatory effects between a consonant and the following front vowel or a palatal glide (Jakobson 1929/1962; Bateman 2007; Kochetov 2011). In phonological analyses, palatalization has been represented as agreement in certain features between a consonant and a following vowel. Several feature theories have been in use, for example, the Halle-Sagey model (Sagey 1986) [...] Padgett (2002) and Halle (2005) in the sense that feature sets (classes of segments) are defined on a language- and alternation-particular basis.
The assumption that the output of Velar Mutation, postalveolars, is [+back] gets support from their phonetics and distribution. Hamann (2002) has found that Polish postalveolars meet the criteria of retroflex consonants and retroflexion is incompatible with palatalization. In addition, postalveolars never appear before the high front vowel /i/ (except in recent borrowings). Based on these facts, Velar Mutation is an instance of coronalization.
The analysis at hand focuses on one important aspect of these mutations: Velar Mutation results in a change of the major place of articulation (from dorsal to coronal), while Coronal Mutation generates no similar change (the coronal place remains). This difference will be central.
A family of output-output faithfulness (paradigm uniformity, PU) constraints will be relevant for the present purposes (Kenstowicz 1996; Benua 1997; Steriade 2000). Such constraints have an important function of improving the transparency of morphological relationships between words and, thus, may facilitate lexical access. Dressler (2003: 464) refers to this pressure as "morphotactic transparency" and adds that "the most natural forms are those where there is no opacifying obstruction to ease of perception". Research on morphological processing suggests that transparent phonology aids morphological decomposition (Frauenfelder & Schreuder 1992: 173). Output-output faithfulness is violated whenever a consonant undergoes a mutation. Both Velar Mutation and Coronal Mutation incur a violation of output-output faithfulness.
identPl, an output-output faithfulness constraint given in (16), is used to represent the difference between the effects of Velar Mutation and Coronal Mutation. identPl is violated when Velar Mutation applies (-coronal, +dorsal > +coronal, -dorsal). When Coronal Mutation occurs, the constraint is respected (+coronal, -dorsal > +coronal, -dorsal). This analysis relies on output-output faithfulness constraints, as opposed to input-output faithfulness constraints. Identity relations between a derivative and its base are evaluated.
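The behavior of identPl can be sketched as a simple check on major place of articulation. The segment table below is an illustrative assumption covering only consonants mentioned in the text, with postalveolars and prepalatals both classified as coronal, in line with the analysis; the function name is hypothetical.

```python
# Major place of articulation for the consonants under discussion.
# Assumption: outputs of Velar Mutation (postalveolars) and of Coronal
# Mutation (prepalatals) are both coronal, as stated in the analysis.
MAJOR_PLACE = {
    "k": "dorsal", "g": "dorsal", "x": "dorsal",
    "t": "coronal", "d": "coronal", "r": "coronal",
    "s": "coronal", "z": "coronal", "n": "coronal",
    "tʂ": "coronal", "ʐ": "coronal", "ʂ": "coronal",
    "tɕ": "coronal", "dʑ": "coronal", "ɕ": "coronal",
    "ʑ": "coronal", "ɲ": "coronal",
}

def ident_pl_violations(base_consonant, derivative_consonant):
    """Return 1 if base and derivative differ in major place, else 0."""
    same_place = MAJOR_PLACE[base_consonant] == MAJOR_PLACE[derivative_consonant]
    return 0 if same_place else 1

velar_mutation = ident_pl_violations("k", "tʂ")    # dorsal -> coronal
coronal_mutation = ident_pl_violations("t", "tɕ")  # coronal -> coronal
no_mutation = ident_pl_violations("k", "k")
print(velar_mutation, coronal_mutation, no_mutation)
```

Only the base-derivative pair is compared, which reflects the output-output character of the constraint: no underlying input is consulted.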

HISTORICAL CHANGE: WHY CORONALS UNDERWENT MUTATIONS BUT VELARS DID NOT
This section aims to account for the emergence of a general pattern, in which coronals show mutations before -ist-a, while velars do not. Thus, the proposed analysis is historical and refers to the time when the suffix -ist-a became productive in the language. At that time, there were no morphophonological schemas related to the suffix -ist-a that would be sufficiently entrenched in the lexicon to determine the output. As mentioned in Section 2.1, there is historical evidence that Coronal Mutation initially applied across the board and was phonetically conditioned. Therefore, the evaluation of candidates proceeds according to phonological constraints. The modern-day complexities surrounding coronals are addressed in Section 3.2, where it will be argued that currently morphophonological schemas play a greater role than at the inception of the -ist-a pattern.
In explaining why coronals underwent mutations before -ist-a and velars did not, I make use of the observation made in Section 2.6 that Coronal Mutation does not change the major place of articulation, while Velar Mutation does. Thus, identPl is violated by Velar Mutation but not by Coronal Mutation.
In addition to identPl, a faithfulness constraint that will be relevant is maxV [...] illustrates the already-mentioned assumption that feature sets are established on the basis of the phonological behavior of segments in a particular language and possibly in a particular alternation, rather than on the basis of a universal hierarchy of features. In the proposed analysis the feature [±back] can co-occur with both [±dorsal] and [±coronal], which in essence is reminiscent of the claims of Padgett (2002) and Halle (2005).
In (18), an evaluation of a word in -ist-a with a base-final velar is shown. The high-ranked identPl mandates that a candidate with no change of major place be selected (candidate a). Candidates (b) and (c) are eliminated due to a change from dorsal to coronal.

PRESENT-DAY DEVELOPMENTS: WHY (SOME) MUTATIONS ARE BEING ELIMINATED
In this section, we address the issue of the gradual elimination of consonant alternations from -ist-a words with base-final coronals [t d r]. It is worth noting that alternations with base-final coronals [s z n] show a weaker tendency towards elimination. I refer to an interplay of the type frequency of a pattern with the requirement that the base be transparent in the derivative (output-output faithfulness). In contrast to the analysis in Section 3.1, where phonological constraints determined the output, I assume that the analysis in this section must refer to morphophonological patterns because at the present stage some patterns are more entrenched than others and this must be reflected in the grammar. This assumption is based on the finding that the frequency of a pattern in the lexicon (its type frequency) determines its productivity. As words showing a particular pattern accumulate, they form a gang. When the gang reaches a certain threshold, a schema emerges and the size of the gang determines the productivity of the schema (gang effects; see Section 2.3). In Sections 3.2.1-3 we look at data drawn from a dictionary and a corpus, and Section 3.2.4 examines experimental data probing native speaker intuition.

Dictionary entries
Table 3 provides the counts of dictionary entries with the variants -ist-a and -yst-a for each base-final consonant drawn from the reverse dictionary of Polish (Bańko et al. 2003). It is argued that PU pressures are responsible for the elimination of morphophonological alternations between the base and the derivative. The schemas in Table 4 represent some of the attested morphophonological patterns based on dictionary entries. They are divided into two categories according to the behavior of the base-final consonant: the schemas in the first column show alternations (mutations), unlike the schemas in the second column.
For the base-final consonants [t d r] in (d-f), both the alternating and the non-alternating schemas are present. In fact, in accordance with the quantitative data in Table 3, the non-alternating patterns seem to be more frequent in the lexicon than the alternating patterns.
Given that the frequency of a pattern determines its strength, the strength of the non-alternating schemas for [t d r] is higher than the strength of the corresponding alternating schemas in modern usage and is reflected in the relative ranking of the schema-constraints in (21) (">>" indicates dominance).

Alternating patterns Non-alternating patterns
As for the base-final coronals [t d r] in (22a), mutations can be avoided thanks to the patterns on the right, which have emerged recently and are affecting more and more words (evidence for the latter claim is given below). Looking at (22b), it is clear that for the coronals [s z n] the older alternating patterns have not been replaced.
How do we account for the asymmetry in the treatment of the two groups of coronals in (22)? Crucial to the analysis is the fact that, while all of the patterns in (22

[Table: alternating schemas vs. non-alternating schemas; rows not recoverable]

Listedness
This analysis makes crucial reference to the availability of a lexical representation of a particular word. During lexical retrieval, the lexical representations of some morphologically complex words are stored and available. The mental representations of other words are not available and the words need to be processed on-line from their component parts, e.g. base + affix. I refer to a dual-route model of lexical access (McQueen and Cutler 1998; Hay 2003; Plag 2012), which proposes that the availability of a mental representation of a word depends on the word's token frequency. A complex word may be accessed via the whole-word route or the decomposed route. The choice between the two ways of access is determined by the relative frequency of the derivative and the base, as well as the phonotactic constraints of the language. If the derivative is more frequent than the base, the derivative is more likely to be accessed via the whole-word route. If, on the other hand, the base is more frequent than the derivative, we expect the derivative to be accessed via the decomposed route. The latter type of relationship occurs more commonly in general and has been identified in all the cases of -ist-a words discussed here. I extend the model and propose that some words in -ist-a are accessed via the whole-word route and others via the decomposed route. The choice depends on their absolute frequency. In other words, frequency impacts the morphological decomposability of the derivative. The second factor that has been found to determine the decomposability of words is phonological: the phonotactic probability of segmental sequences (Plag 2012). Plag's (2012) phonotactic probability in some crucial ways parallels the phonological restrictions (formulated in terms of features) on consonant-vowel sequences, e.g. *[ʂi ʐi tʂi dʐi], enforced by agree constraints in Section 3.1.
The constraint Uselisted promotes the selection of a stored lexical representation as the input. If such a form is unavailable (for instance, as for rare and novel words), the constraint is moot, as all the potential outputs violate it (Zuraw 2000). For example, the word [altɕ-ist-a] can be accessed via the listed form /altɕ-ist-a/ or via its component morphemes /alt/ + /ist-a/. Uselisted promotes the former type of access, whenever available.

(25) Uselisted
The input portion of a candidate must be a single lexical entry.
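The definition of Uselisted can be sketched as a violation function over candidate inputs. The representation of an input as a tuple of lexical units and the set of stored words are illustrative assumptions, as is the function name; the two inputs are those given for [altɕ-ist-a] above.

```python
def use_listed_violations(input_units, stored_words):
    """Return 1 if the input is not a single stored lexical entry, else 0."""
    is_single_listed = len(input_units) == 1 and input_units[0] in stored_words
    return 0 if is_single_listed else 1

whole_word = ("altɕ-ist-a",)    # listed form /altɕ-ist-a/
decomposed = ("alt", "ist-a")   # component morphemes /alt/ + /ist-a/

# Case 1: the derivative is stored, so Uselisted separates the two inputs.
stored = {"altɕ-ist-a"}
with_listing = [use_listed_violations(c, stored) for c in (whole_word, decomposed)]

# Case 2: nothing is stored (as for a rare or novel word). Only the decomposed
# input can be built, it violates Uselisted, and the constraint decides nothing.
without_listing = [use_listed_violations(c, set()) for c in (decomposed,)]

print(with_listing, without_listing)
```

In the second case every available candidate incurs the same violation, which is the sense in which the constraint is moot for rare and novel words.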
The strength of a word's lexical entry has been found to depend on its frequency of use (token frequency) (Hay 2003). Therefore, it is necessary to gauge the strength of the lexical representations of words in -ist-a. In (26) I give the frequencies of the relevant bases with stem-final [t d r] and their derivatives in -ist-a. The words were compiled on the basis of the data extracted from the reverse dictionary (Bańko et al. 2003) [...] non-alternating patterns are exemplified in (c). Figure 2 shows boxplots for frequency of all the words in -ist-a with and without mutations identified in the corpus. All the derivatives extracted from the corpus show a lower frequency than their corresponding bases. An interesting tendency is discernible in the boxplots in Figure 2. The derivatives with mutations, such as those in (26a), on average exhibit a higher frequency than the derivatives without mutations, exemplified in (26b, c) (with the notable exception of stypen[d]-yst-a, marked as an outlier). This agrees with the predictions of usage-based models. Well-established words are expected to be more stable, hence more resistant to PU pressures, than are rare or novel words, because of their strong mental representations. Rare and novel words, on the other hand, are more susceptible to the influence of PU pressures because their representations are weak or unavailable. Thus, we expect a difference in the frequency of words with and without mutations.
In analyzing the frequency of all the words with base-final [t d r] available in the corpus, two things deserve a mention. First, the overall number of the words is not very high: 47 with mutations and 24 without mutations. Second, among the words with mutations the degree of variance of frequency is very high. For example, there are 11 words in this group with a frequency below 10. At the same time, 6 words in this group exceed the frequency of 1000 (two of them exceed 3000). The mean frequency of all the derivatives showing stable alternating patterns, (26a), is M = 391.17, SE = 123.94 and the median is Mdn = 92. In the group of vacillating and non-alternating words, (26b) and (c), the degree of variance is not that high. Only one word, stypen[d]-yst-a (visible as an outlier in Figure 2), shows a frequency higher than 100; the frequency of most of the remaining words is well below 100. Crucially, the frequencies of the words that vacillate, all of them shown in (26b), are low. In fact, propagan[dʑ]-ist-a/propagan[d]-yst-a is the only one among them whose frequency exceeds 100. Note, however, that the two variants of this word show very similar frequencies, which might explain their persistence. In the remaining cases in (b), one of the variants is significantly more frequent than the other (even though the overall frequency does not exceed 10). The mean frequency of the vacillating and non-alternating derivatives (excluding stypen[d]-yst-a) is M = 38.13, SE = 6.21 and the median is Mdn = 33. A non-parametric Mann-Whitney test was run on the data. On average, the words with mutations are more frequent than the words without mutations or vacillating words, U = 356.00, z = -2.308, p < .05. Finally, how can we explain the unusual behavior of stypen[d]-yst-a? Being of recent origin, the word was formed when the non-alternating pattern was more frequent than the corresponding alternating pattern. It began to be used very frequently and its representation stabilized (without mutations). Words like stypen[d]-yst-a show that high-frequency words can represent two patterns in modern usage: an alternating one when the word was formed earlier and a non-alternating one when it is of a more recent origin.
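The reported Mann-Whitney statistics can be cross-checked with the standard normal approximation for U. The sketch below assumes group sizes of 47 (words with mutations) and 23 (the 24 words without mutations minus the outlier stypen[d]-yst-a); that the outlier was excluded from the test is an assumption made here, not something stated in the text, and no tie correction is applied.

```python
import math

def mann_whitney_z(u, n1, n2):
    """Normal approximation to the Mann-Whitney U statistic (no tie correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u

# U is the reported value; n2 = 23 is an assumption (24 words minus the outlier).
z = mann_whitney_z(u=356.0, n1=47, n2=23)

# Two-sided p-value from the standard normal distribution.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(round(z, 3), p_two_sided < 0.05)
```

Under these assumptions the approximation gives z of about -2.307, close to the reported -2.308, and a two-sided p-value below .05; the small difference is what a tie correction would be expected to produce.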
Some of the important predictions of usage-based models are upheld. Words of the highest frequency (i.e. above 1000 in the analyzed data) are phonologically stable. We would not expect a word like ren[tɕ]-ist-a, whose frequency is the highest among the analyzed words (3780), to appear in current usage in its corresponding form without a mutation, i.e. as *ren[t]-yst-a. Its strong memory trace prevents such an outcome and Uselisted promotes the selection of its stored representation as input. In contrast, words showing a relatively low value of frequency are susceptible to PU pressures, for example, bonapar[t]-yst-a with the frequency of 29. In addition, low-frequency words show more variation than high-frequency words, e.g. al[tɕ]-ist-a/al[t]-yst-a (4/1). A less expected discovery is a substantial number of words with mutations whose frequency is low: 11 of them have a frequency below 10, for example, kontraban[dʑ]-ist-a (7). Their persistence may be due to the gradual and probabilistic nature of change (lexical diffusion). What is more, some of the words with mutations (e.g. kontraban[dʑ]-ist-a 'contrabandist') are relatively old and predictably comply with the phonological requirements which were regular at the time of their formation. Their low frequency today may well reflect changes in the society.
In low-frequency vacillating words such as al[tɕ]-ist-a ~ al[t]-yst-a the variant with a mutation is most likely to be older than the variant without a mutation. Another reason for the persistence of low-frequency words with mutations might have to do with the fact that the corpus represents mostly written language, which means that the data reflect conservative usage and may be an imperfect representation of spoken language. In order to address this issue, in Section 3.2.4, we will examine data elicited in an experiment involving native speakers of Polish.

Analysis
We begin with an analysis of words with base-final [s z n], which show mutations. identPl is not shown, as it is moot for coronals. In the evaluation of bas-ist-a [baɕ-ist-a] 'bass player' (a recent word), the listed form is unavailable, as indicated below the Input in (27). The mutation in the winning candidate causes a violation of the low-ranked ident[±anterior], which requires faithfulness to the feature [±anterior]. The evaluation highlights two important issues. First, novel words do not have strong lexical representations; therefore, Uselisted is violated by all the candidates, and the relevant schema-constraint determines the output. Second, once morphophonological schemas become entrenched (i.e. patterns are morphologized), markedness constraints regulating mutations (e.g. agreecor[±back]) take a back seat. This is due to the preference for the morphological over phonological conditioning of patterns, discussed in Section 2.5. The entrenchment of a pattern is a function of the size of the gang it represents (gang effects, see Section 2.3). In OT formalism, the morphologization of a pattern may be viewed as the emergence and promotion of the relevant schema-constraint in response to the increasing number of words that observe the pattern.

In the evaluation of the derivative from Bonapar[t]e, the listed form is unavailable due to a low token frequency and, therefore, Uselisted (not shown) is violated by all the candidates. ident[±strid] and ident[±anterior] represent PU pressures.
(28) Evaluation of a derivative in -ist-a of [bɔnapartɛ]

Currently, the non-alternating schema for base-final […t] has a higher type frequency, hence higher strength, than the alternating schema. This finds reflection in the ranking of the two schemas in tableau (28). Candidate (b) is selected because it satisfies both ident[±strid] and the dominant schema. Candidate (a) uses neither of the schemas available in the lexicon for the formation of words in -ist-a and hence violates the respective schema-constraints. Candidate (c) is eliminated due to a violation of ident[±strid].
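The evaluation described for tableau (28) can be sketched as strict-domination evaluation over violation profiles. The tableau itself is not reproduced here, so the violation marks below are reconstructed from the prose and should be read as assumptions, as should the constraint labels and the candidate label for (a); Uselisted is omitted because it is violated by all candidates.

```python
# Ranking for present-day usage: ident[±strid] dominates both schema-constraints,
# and the non-alternating schema outranks the alternating one.
RANKING = [
    "ident[±strid]",
    "schema(non-alternating)",
    "schema(alternating)",
    "ident[±anterior]",
]

# Violation marks reconstructed from the prose description (assumptions).
CANDIDATES = {
    "(a) neither schema": {"schema(non-alternating)": 1, "schema(alternating)": 1},
    "(b) bonapar[t]-yst-a": {"schema(alternating)": 1},
    "(c) bonapar[tɕ]-ist-a": {
        "ident[±strid]": 1,
        "schema(non-alternating)": 1,
        "ident[±anterior]": 1,
    },
}

def evaluate(candidates, ranking):
    """Pick the candidate whose violations are best under strict domination."""
    def profile(name):
        # Tuples compare left to right, mirroring the ranked constraints.
        return tuple(candidates[name].get(c, 0) for c in ranking)
    return min(candidates, key=profile)

winner = evaluate(CANDIDATES, RANKING)

# Earlier stage: the two schema-constraints in the opposite order.
EARLIER = ["ident[±strid]", "schema(alternating)",
           "schema(non-alternating)", "ident[±anterior]"]
earlier_winner = evaluate(CANDIDATES, EARLIER)

print(winner, "|", earlier_winner)
```

With these marks, candidate (b) wins under either order of the two schema-constraints as long as ident[±strid] is ranked above them.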
To derive an earlier state of affairs when the alternating pattern prevailed in the lexicon, the ranking of the two schema-constraints must be reversed in (28). This would account for the expansion of non-alternating patterns at the stage when the alternating patterns were in fact more common. With ident[±strid] ranked high, the candidate with a transparent base, (b), is selected regardless of the ranking of the two schema-constraints with respect to each other.
In the case of fle[t] ~ fle[tɕ]-ist-a, the derivative fle[tɕ]-ist-a has a relatively high token frequency, which entails that the listed form is available. The morphologically complex word is independently stored and hence is accessed via the whole-word route. The impact of PU pressures as well as schemas is outweighed by the impact of the high-ranked Uselisted.

Together with the attested variation, this is interpreted as evidence that the word may be alternately accessed via the whole-word route when the listed form is available, or the decomposed route when the listed form is unavailable. The availability of the listed form is determined probabilistically on the basis of token frequency. When the word is accessed via the whole-word route, faithfulness constraints are outranked, and Uselisted effectively selects the output. The tableaux in (30) illustrate the variability of the output. When a stored derivative is available, as shown in the first tableau, candidate (c) is selected (with a mutation), as it is the only one that satisfies Uselisted. On the other hand, when the stored representation is unavailable and Uselisted is violated by all the candidates (the second tableau), the word is accessed via the decomposed route and derived on-line from its component parts. In such a situation, candidate (a) fails the evaluation because it does not respect either of the available schemas. Candidate (c) loses to candidate (b) because it contains a modification of stridency specification. To summarize, the variation between al[t]-yst-a and al[tɕ]-ist-a can be explained in a dual-route model of lexical access by assuming that the former variant is derived on-line from its component parts, while the latter is retrieved from memory via the whole-word route. The outcome depends on the probabilistic availability of the memorized representation. Such vacillating forms suggest that the three pressures, i.e. PU, type frequency, and token frequency, are important in the language and that the actual usage reflects their combined effects. The differences between the impact of type and token frequency will be elaborated in Section 5.

A conditional inference tree analysis using the party package in R (Hothorn et al. 2006) has been run on the data, with outcome (mutation/no mutation) as the dependent variable and consonant as the predictor (18 levels). The category "other" has been omitted, as it is not particularly revealing. A conditional inference tree provides estimates of the likelihood of the value of the response variable (mutation/no mutation) on the basis of a series of binary questions about the values of the predictor variable (consonant). This method has been chosen, as it is designed for binary dependent variables and deals well with unbalanced data and small or even zero cell counts (Tagliamonte & Baayen 2012).
The first partitioning in the decision tree in Figure 3 shows a highly significant difference between the likelihood of mutations for [s z] and [n] in the experimental results. Why is the tendency to eliminate mutations stronger for both [s] and [z] than for [n]: 41.6% and 57.9% vs. 5.8%? The answer might have to do with cue robustness and perceptibility. Wright (2004: 43-44) reviews the activity of auditory nerve fibers and suggests that consonants receive an auditory boost in CV sequences. Crucially, the boost is particularly strong for stops, fricatives and affricates but less so for nasals. Thus, the auditory difference between [n] and [ɲ] is smaller than the distance between [s] and [ɕ] (and [z] and [ʑ]). It follows that a higher rate of mutations for [n] than for [s z] can be explained by appealing to cue perceptibility. Steriade's (2008) approach using P-maps and contrast-based constraints offers a formal solution. PU constraints require that the relevant parts of derivatives be maximally similar to their bases in order to aid word recognition. Acceptability of mutations of the base in the derivative is in fact gradient, with mutations that diverge the most from their base correspondents being less acceptable than mutations of consonants that render them more similar to their correspondents. In other words, mutations that introduce less perceptual contrast between the derivative and the base are preferred. An analysis employing P-maps appeals to the ranking of the contrast-based constraints ∆(s~ɕ) >> ∆(n~ɲ) and their interaction with phonotactic constraints and morphophonological schemas. This ranking derives from the fact that the alternation (s~ɕ) universally involves more perceptual contrast than (n~ɲ) and, therefore, is more likely to be avoided, all else being equal.
21 Pairwise comparisons for [t d r s z n] using a generalized mixed-effects logistic regression (glmer, binomial) with speaker and word as random intercepts and consonant as a fixed effect were also run. The regression could not be used for all the consonants, as the method does not handle zero counts. The results in essence confirm the results of the decision tree analysis.
The rates in Table 5 can be directly compared with the results of the investigation of dictionary entries in Table 3. Table 6 shows the rates of words without mutations in the two relevant groupings of consonants, that is, [t d r] and [s z n], and juxtaposes the results obtained in the experiment with those taken from the dictionary. A cursory look at Table 6 warrants a claim that the non-alternating patterns (no mutations) are more common in the experimental data than in the dictionary data for [t d r s z n]. Figure 4 offers a conditional inference tree with outcome (mutation/no mutation) as the dependent variable, and consonant ([t d r s z n]) and source (experiment/dictionary) as predictor variables. Table 6 and the decision tree in Figure 4 show two things: first, the experimental results are not as categorical as the dictionary data and, second, the alternating patterns are significantly less commonly used by the participants of the experiment than in the dictionary (p < .001 for [t d r s z] and p = .021 for [n]).22
Admittedly, the comparison across the two conditions (dictionary/experiment) should be approached with caution because of the different methods of data collection. Dictionaries offer prescriptive data, often ignoring variation found in actual language use. Data extracted from a corpus provide a better match for experimental data because the method of collecting data for a corpus is more compatible with the method of obtaining data from respondents in experimental conditions.23 Table 7 compares the number of mutations of base-final [t d r] in two conditions: experiment vs. corpus (The National Corpus of Polish). A logistic regression run on the data suggests that the condition variable (experiment/corpus) has a significant influence on the outcome variable (mutation/no mutation). The coefficient on the condition variable has a Wald statistic equal to 83.581, which is significant at the .001 level (df = 1). The overall model is significant at the .001 level according to the Model chi-square statistic (χ²(1) = 92.607).
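For readers unfamiliar with the Wald statistic, the following Python sketch shows how it arises in the simplest case, a logistic regression with one binary predictor. In that case the maximum-likelihood coefficient equals the log odds ratio of the 2x2 table and its standard error has a closed form. The function name and the counts are hypothetical; they are not the values of Table 7, and the sketch makes no claim about the software used for the reported analysis.

```python
import math

def wald_test_2x2(mut_a, nomut_a, mut_b, nomut_b):
    """Logistic regression of outcome on a binary condition, from a 2x2 table.

    With a single binary predictor, the maximum-likelihood coefficient is the
    log odds ratio and its standard error has a closed form.
    """
    # Coefficient: log odds of mutation in condition B relative to condition A.
    beta = math.log((mut_b / nomut_b) / (mut_a / nomut_a))
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / mut_a + 1 / nomut_a + 1 / mut_b + 1 / nomut_b)
    # Wald statistic, chi-square distributed with df = 1 under the null.
    wald = (beta / se) ** 2
    # Upper tail of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(wald / 2))
    return beta, se, wald, p_value

# Hypothetical counts (NOT the values in Table 7): mutation / no mutation.
beta, se, wald, p = wald_test_2x2(mut_a=30, nomut_a=170, mut_b=120, nomut_b=80)
print(f"beta = {beta:.3f}, SE = {se:.3f}, Wald = {wald:.2f}, p = {p:.3g}")
```

A Wald value above 10.83 corresponds to significance at the .001 level with one degree of freedom, which is the sense in which the coefficient reported above is significant.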
There are two possible reasons for the different rates of paradigm uniformity effects in the experiment vs. in the corpus. First, while novel and rare words in the corpus correspond to nonce words in the experiment, well-established words in the corpus do not have analogues among the experimental stimuli. Given that novel words are more susceptible to paradigm uniformity pressures than established words, the rates of mutations are expected to be lower for the experimental results. Second, it is likely that the data indicate an on-going change that might lead to the gradual elimination of mutations from the -ist-a pattern. The fact that there is a difference between the dictionary and corpus data, on the one hand, and the experimental data, on the other, might suggest that PU pressures continuously affect mental grammars and are responsible for a gradual change in morphophonological patterns. In particular, the promotion of ident[±anterior] above schema-constraints is held accountable for the reduction in the number of mutations for base-final [s z n] in the experimental results, as compared with the dictionary and corpus data.

A HIGH-FREQUENCY PATTERN: DIMINUTIVES IN -ek
The purpose of this section is to demonstrate that morphophonological patterns exhibiting a high type frequency in the lexicon are stable and that the schema-constraints encoding them are ranked higher than schema-constraints representing patterns with a lower type frequency. With relevance to the analysis at hand, the schema-constraints representing the -ek suffix are ranked higher than the schema-constraints representing the -ist-a suffix. This reflects their relative robustness in the grammar. The schema-constraints for the suffix -ek are ranked above the faithfulness constraints preventing consonant mutations, i.e. PU ident constraints. In this way, well-established morphophonological patterns are respected at the expense of PU considerations. In order to evade the potential impact of strong lexical representations and ensure that the words are accessed via the decomposed route, let us consider some relatively new diminutives ending in -ek.24 A search of the plTenTen corpus for diminutives in -ek without mutations of velars (e.g. *[drink-ɛk] or *[buldɔg-ɛk]) has given zero results. It appears that words in -ek show stable mutations of velars in spite of PU violations. Given the stability of this pattern confirmed by the lack of words representing this pattern without mutations in the corpus, a nonce-word experiment probing speaker intuition is unlikely to be insightful. The high type frequency of this pattern overrides phonological constraints (PU pressures). In tableau (34), the relevant schema-constraint is ranked above the constraints enforcing base transparency.
23 There is accumulating evidence suggesting that comparisons across different conditions provide valuable tools for assessing the psychological reality of corpus-based models. For example, Divjak et al. (2016) explicitly compare the performance of a statistical model based on data derived from a corpus with the performance of native speakers in selecting one of six Russian verbs meaning 'try'.
24 Two diminutives for [drink] are attested: one with e-insertion in the base, [drinɛtʂ-ɛk], and one without, [drintʂ-ɛk]. The conditioning of e-insertion in the base in diminutives is beyond the scope of this paper, but see Czaplicki (2020). The token frequencies of the lemmas drinecz-ek and drincz-ek in the plTenTen corpus are low: 594 and 33, respectively. The token frequency of the lemma buldoż-ek is higher: 2,351.

DISCUSSION
This section aims to tease apart and compare the formal mechanisms of dealing with the effects of type and token frequency. As regards type frequency, constraints that represent morphophonological schemas are ranked according to the frequency of the patterns they encode. In this way, the impact of stronger patterns, i.e. those with a higher type frequency, is greater than the impact of weaker patterns. Ranking schemas allows us to model the competition between morphophonological patterns and PU constraints. To be more specific, more robust patterns are predicted to override PU considerations, which might result in the preservation of mutations and is manifest in the formation of words in -ek. The formation of words in -ist-a, on the other hand, shows that weaker morphophonological patterns are dominated by PU pressures, a ranking which opens the way for patterns respecting base transparency (i.e. patterns without mutations). The different impact of paradigm uniformity pressures on patterns (schemas) of high and low frequency can be represented as in (35). The stronger the pattern (schema), the more resistant it is to PU pressures.
(35) Schema high frequency >> PU >> Schema low frequency
In addition, individual words of high frequency are predicted to resist PU pressures, whether they represent a high-frequency or a low-frequency schema. This is due to the availability of the listed representations of high-frequency words and the bias for the whole-word route of lexical access. In the proposed analysis the impact of token frequency was mediated by the constraint Uselisted, which promotes the use of listed forms (when available). The effects of token frequency were detectable on agent nouns in -ist-a, a low-frequency pattern. While mutations are stable for high-frequency words representing the pattern -ist-a, they tend to be eliminated from low-frequency words representing this pattern.
(36) Uselisted >> PU
The effects of token frequency (whether high or low) have not been identified for -ek, a high-frequency pattern (schema).26 The ranking of the constraints established in the course of this analysis is given in (37).
25 It is in principle possible that within a construction such as diminutives in -ek, one of the schemas, for instance, [x] ~ [ʂ-ɛk], might have a significantly lower type frequency than the other schemas, and would, consequently, behave differently with respect to PU pressures. Czaplicki (2014b) discusses diminutives in -ik/-yk in Polish and demonstrates that a schema of a low type frequency (pertaining to [ʂ]) within this construction has been altered, whereas comparable stronger schemas (e.g. pertaining to [ʐ]) remain stable. However, schema modification of this type is not likely in the case of such high-frequency constructions as diminutives in -ek. For example, in the plTenTen corpus, the type (and total token) frequency values of words in -ek with base-final velars  (39,182). The frequency values of all the schemas for -ek are much higher than those for -ist-a, which makes modification of the schemas for -ek unlikely.
26 The working assumption is that the type frequency of the entire construction (the number of all the words that represent it) determines its strength. Pattern strength can also be measured by the sum of token frequencies of all the words that show the pattern. The hypothesis that the sum of token frequencies of all the words that represent a pattern is a more reliable determinant of the pattern's strength than the type frequency of the pattern certainly deserves verification.
Czaplicki Glossa: a journal of general linguistics DOI: 10.5334/gjgl.1255
(37) Schema high frequency, Uselisted >> PU >> Schema low frequency
More generally, the possibility of assigning different rankings to schemas means that each construction has its own subgrammar (cophonology), where schema-constraints are interspersed with phonological constraints.
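How a ranking such as (37) selects a winner under strict domination can be sketched in Python. The sketch is a simplification under stated assumptions: PU stands in for the whole family of output-output ident constraints, constraints within one stratum are treated as tied by summing their violations, and the violation marks assigned to the candidates are illustrative assumptions based on the discussion above, not tableaux taken from the analysis. The names SchemaHigh, SchemaLow, profile and evaluate are hypothetical.

```python
from typing import Dict, List

# Ranking (37), highest stratum first. Constraints within a stratum are
# treated as tied: their violations are summed (a simplifying assumption).
RANKING: List[List[str]] = [
    ["SchemaHigh", "UseListed"],
    ["PU"],
    ["SchemaLow"],
]

def profile(violations: Dict[str, int]) -> tuple:
    """Turn a candidate's violation marks into a tuple ordered by stratum."""
    return tuple(sum(violations.get(c, 0) for c in stratum) for stratum in RANKING)

def evaluate(candidates: Dict[str, Dict[str, int]]) -> str:
    """Strict domination: the lexicographically smallest profile wins."""
    return min(candidates, key=lambda name: profile(candidates[name]))

# Novel diminutive in -ek (high-frequency schema); neither form is listed.
novel_ek = {
    "drink-ɛk": {"SchemaHigh": 1, "UseListed": 1},   # no mutation
    "drintʂ-ɛk": {"PU": 1, "UseListed": 1},          # mutation
}
# Low-frequency agent noun in -ist-a (low-frequency schema); neither form is listed.
rare_ista = {
    "bonapar[t]-yst-a": {"SchemaLow": 1, "UseListed": 1},   # no mutation
    "bonapar[tɕ]-ist-a": {"PU": 1, "UseListed": 1},         # mutation
}
# High-frequency agent noun in -ist-a; the mutated form is assumed to be listed.
frequent_ista = {
    "fle[t]-yst-a": {"SchemaLow": 1, "UseListed": 1},   # no mutation, not listed
    "fle[tɕ]-ist-a": {"PU": 1},                         # mutation, listed
}

winner_ek = evaluate(novel_ek)
winner_rare = evaluate(rare_ista)
winner_frequent = evaluate(frequent_ista)
print(winner_ek, winner_rare, winner_frequent)
```

Because Python compares tuples lexicographically, a single violation in a higher stratum outweighs any number of violations lower down, which is what strict domination requires.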
The proposed account diverges from previous accounts of Polish consonant mutations (e.g. Rubach 1984; Szpyra 1989; Ćavar 2004 and Gussmann 2007) in two crucial aspects. First, it claims that morphological patterns (constructions) differ in terms of the different degrees of stability of consonant mutations, where the conditioning factor is type frequency. Token frequency is held accountable for the stability of alternations in individual words. Neither type nor token frequency was explicitly used in previous accounts, which means that the different propensity of particular constructions and words for segmental alternations was unaccounted for. Second, previous accounts argued that phonological context is the principal factor determining the targets and triggers of palatalization.27 In contrast, the proposed account places emphasis on the morphological (i.e. construction-specific) conditioning of consonant mutations.

CONCLUSIONS
Two morphophonological patterns (constructions) showing consonant mutations in Polish have been analyzed: a low-frequency pattern, the formation of agent nouns with -ist-a, and a high-frequency pattern, diminutive formation with -ek. It has been shown that the tendency to avoid mutations has an effect on a construction with a low type frequency in the lexicon. The more frequent the construction, the more likely it is to remain unaffected by paradigm uniformity pressures. Put differently, less frequent patterns are more susceptible to modification due to phonological constraints than more frequent patterns. In order to account for this regularity, morphophonological schemas embodying these patterns must be represented in the grammar according to their type frequency. In Optimality Theory schema-constraints pertaining to the pattern -ek were ranked higher than those enforcing the pattern -ist-a. The impact of paradigm uniformity was represented as an interaction of output-output faithfulness constraints with morphophonological schemas. The discussion has offered evidence for construction-based cophonologies. In this approach, each morphological construction has its own phonological properties. It has been argued that schemas and the constructions they represent may show various degrees of susceptibility to PU pressures. The stability of a construction depends on its type frequency. The dynamic role of PU pressures in the grammar was confirmed by the results of an experiment which showed that mutations are being gradually eliminated from the -ist-a pattern.
In addition, the frequency of words (token frequency) has been shown to impact the drive for identity between the base and the derivative. The less frequent a word, the stronger the pressure to preserve its base and avoid mutations. The relationship between frequency and mutations is rooted in language processing. Words of higher frequency are accessed via the whole-word route, while low-frequency words are processed by accessing both the base and the affix. Thus, preservation of the base intact in the derivative is more important for less frequent words, as it speeds up their lexical access. Finally, the acceptability of mutations has been shown to depend on the degree of featural/perceptual similarity of mutated consonants to their base correspondents. A family of Output-Output faithfulness constraints targeting various features has been used to enforce base transparency. It has been demonstrated that the degree of featural similarity between derivative-base correspondents determines the acceptability of the derivative.
The discussion has shown that frequency is a key element of a predictive and explanatory phonological analysis. Both type and token frequency condition the stability of morphophonological patterns and are relevant in pattern maintenance and change. It follows that phonological and morphological theories must be designed in such a way as to allow frequency to play a major role. This is a departure from previous analyses of segmental alternations in the generative tradition, where frequency was epiphenomenal and extraphonological.
We have looked at two constructions at the opposite ends of the type frequency spectrum. Future research should also focus on investigating patterns of intermediate frequency. If the proposed analysis is on the right track, the effects of phonological constraints (e.g. paradigm uniformity constraints) should be less categorical for such patterns, that is, there should be more variation.

APPENDIX
Nonce words used in the experiment

The histograms in Figure 1 show that words in -ek outnumber words in -ist-a for all token frequency ranges. A non-parametric Mann-Whitney test run on the data reveals that the token frequency of words in -ek (Mdn = 459, M = 7433.76) is significantly lower than the token frequency of words in -ist-a (Mdn = 884, M = 6189.56), U = 390640.5, z = 3.49, p < .001. This result is likely due to the highly positively skewed distribution for words in -ek (skewness = 29.44). As can be seen from the histogram on the left in Figure 1, low-frequency words in -ek outnumber comparable words of medium and high frequency. It follows that type frequency is a better measure of the strength of a pattern than token frequency.
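The way a rank-based test can point in the opposite direction from the means in a strongly skewed sample can be illustrated with a small Python sketch of the Mann-Whitney U statistic. The token frequencies below are invented for illustration and are not corpus values; the normal approximation is used without a tie correction, so the p-value is approximate, and the function name is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (U for x, z, two-sided p) using the normal approximation.

    No tie correction is applied to the variance.
    """
    n1, n2 = len(x), len(y)
    # U counts, over all pairs, how often an x value exceeds a y value
    # (a tie counts as one half).
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, z, p

# Invented token frequencies: one extreme value skews the first sample.
freq_ek = [3, 12, 40, 95, 459, 800, 250000]
freq_ista = [150, 600, 884, 1200, 2500, 9000, 30000]

u, z, p = mann_whitney_u(freq_ek, freq_ista)
print("U =", u, "z =", round(z, 2), "p =", round(p, 3))
print("means:", statistics.mean(freq_ek), statistics.mean(freq_ista))
print("medians:", statistics.median(freq_ek), statistics.median(freq_ista))
```

In this toy sample the first group has the higher mean but the lower median and the lower ranks, mirroring the pattern of medians and means reported for -ek and -ist-a.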

Figure 1 Histogram of token frequency by suffix.
the stem of the base and the output have identical values for [±labial, ±coronal, ±dorsal].

Figure 2 Boxplots for frequency of words in -ist-a with and without mutations on a log scale with base 10.
As regards derivatives with base-final [t d r], the token frequency values in the corpus of the representative words al[tɕ]-ist-a, bonapar[t]-yst-a and fle[tɕ]-ist-a are 4, 29 and 96, respectively. The appearance of al[tɕ]-ist-a alongside al[t]-yst-a is an indication that the word is variably accessed via either the whole-word route or the decomposed route. bonapar[t]-yst-a is mainly retrieved via the decomposed route. As the mental records of both words are weak (due to their low frequency), transparent bases facilitate their retrieval. Finally, fle[t] ~ fle[tɕ]-ist-a does not show fluctuations because the derivative, having its own mental trace (due to a considerably higher frequency than al[tɕ]-ist-a and bonapar[t]-yst-a), is mainly accessed via the whole-word route.
Vacillating forms such as al[tɕ]-ist-a ~ al[t]-yst-a also indicate that the impact of lexical strength, established on the basis of token frequency, may reduce the influence of PU pressures.The token frequency of al[tɕ]-ist-a is low.

Figure 3 shows that mutations of [s z n] are overwhelmingly more common than mutations of the remaining consonants (p < .001). Similarly, mutations of [t d r] are more common than mutations of all the other consonants on the right-hand side of the tree (p < .001). In addition, [n] is significantly more susceptible to mutations than both [s] and [z] (p < .001); the latter two are also different from each other (p = .016). The likelihood of mutations for [r] is higher than for [t d] (p = .045). In a similar way, [k g] are different from all the other consonants. Crucially, mutations of [s z n] are significantly more common than mutations of [t d r], and the latter are more common than mutations of the remaining consonants.21 These findings are compatible with the OT analysis provided in the previous sections.

Figure 4 Conditional inference tree with outcome (mutation/no mutation) as the dependent variable, and consonant ([t d r s z n]) and source (experiment/dictionary) as predictor variables.

Table 2
Frequency of words in -ist-a and -ek in plTenTen: Corpus of the Polish Web.
and the Clements-Hume model (Clements & Hume 1995). In the Halle-Sagey model palatalization involves the agreement in the feature [-back] and in the Clements-Hume model the process is represented as agreement in the features [coronal, -anterior]. More recent approaches cast in Optimality Theory concentrate on defining feature classes, i.e. features that function together in phonology (see, for instance, Padgett 2002 and Halle 2005). In the present analysis front vowels are represented as [+coronal, -back] and, consequently, palatalization imposes agreement in the features [+coronal, -back] on adjacent consonants and vowels. The assumption that the feature [±back] can refer to both [±coronal] and [±dorsal] is in line with
both the consonant and the vowel are [+coronal]. Candidate (c) fares well on agree[±back], as both the consonant and the vowel are [+back]. It also vacuously satisfies agree[±coron], because, as mentioned above, [ɨ], being a central vowel, is not specified for place features. Candidate (d) incurs a fatal violation of maxV[±back], because the backness of the relevant vowel has been modified. This evaluation demonstrates that agree[±coron] must be ranked below identPl and maxV[±back].
The evaluation in (19) focuses on words in -ist-a with base-final coronals. The faithful candidate shown in (a) fails to respect agreement in backness (a violation of agreecor[±back]) and loses to candidate (b), which satisfies agreecor[±back] by employing an alveolopalatal before a front vowel. identPl and agree[±coron], the latter not shown, are respected by all the candidates. Surely, the change of [t] to [tɕ] in the winning candidate cannot be without consequence for faithfulness. The constraint that is violated here is the low-ranked ident[±anterior]. The tableau also confirms the relevance of maxV[±back]. Candidate (c) is not optimal because the quality of the vowel has been changed.
Some remarks are in order about the agree constraints. Candidate (a) violates agree[±coron] because the consonant does not agree in [±coronal] with the following vowel. Candidate (b) violates agreecor[±back] because a postalveolar [ʐ] is [+back], while the following vowel is [-back]. agree[±coron] is respected because
17 Other analyses have argued for privative (unary) features. For example, Lombardi (1999) contends that all features are privative and Halle (2005) assumes that features designating articulators are privative, e.g. [coronal], while most other features are binary, e.g. [±back], [±anterior].
(19) Evaluation of a derivative in -ist-a of [flɛt] 'flute'
Input: flɛt + ist-a | IdentPl | AgreeCor[±back] | MaxV[±back] | Ident[±anterior]
It has been shown that mutations of velars were avoided, as they involve a change of major place, the latter being detrimental to base recognition. In contrast, mutations of coronals were tolerated, because they do not result in a comparable modification. The ranking responsible for the emergence of the pattern -ist-a is given in (20).
Labials, velars and the coronals [s z n] show categorical behavior: labials and velars do not mutate, while the coronals [s z n] mutate. The coronals [t d r] show two competing patterns. Looking at the type frequency of the mutated and non-mutated variants for each derivative with base-final [t d r], it appears that the lexical entries without mutations are actually more numerous. A corpus-based analysis of words with base-final [t d r] yields similar results. Based on a search of The National Corpus of Polish, words showing stable alternating patterns exhibit an average frequency of M = 391.17, while words without alternations and fluctuating words have an average frequency of M = 38.13. The difference is significant at p < .05. An in-depth analysis of the corpus-based data is postponed until the next section, where the concept of listedness is elucidated.

Table 4
, the base-final consonants [s z n] in (g-i) show only alternating patterns, while for the base-final consonants [p b f] in (a-c) and [ts dʐ ʐ k x] in (j-n) only non-alternating patterns can be identified. The patterns in the first column impinge on PU because they introduce segmental alternations in the derivatives. In other words, the transparency of the base is diminished.

Table 4
Selected alternating and non-alternating schemas.
of two features, the latter mutation crucially involves a change in major place.18 As a result, mutations of [t d r] and [k g x] involve more contrast (i.e. they result in less similar segments) between the base and the derivative than mutations of [s z n]. This observation can be used to explain the relative acceptability of the mutations for [s z n] and their avoidance for [t d r] (gradient) and [k g x] (categorical).
The attestation of al[tɕ]-ist-a alongside al[t]-yst-a may be explained by the different method of retrieval of the word.When it is accessed via the whole-word route, the stored representation is used, i.e. al[tɕ]-ist-a.When the word is accessed via the decomposed route, mutations in the base are detrimental to word recognition and PU becomes relevant, yielding al[t]-yst-a.In the case of bonapar[t]-yst-a, as the derivative with a mutation is not listed in the mental lexicon, a transparent base serves to facilitate word recognition.Finally, in the case of well-established words, such as fle[t] ~ fle[tɕ]-ist-a, the impact of PU constraints is mitigated by

Table 5 provides grounds for several interesting observations. Mutations of velars are marginal and may be the result of an indirect influence of Velar Mutation (discussed in Section 2.1) on this pattern. Mutations occur more frequently in the case of coronals (except retroflexes). While non-alternating patterns occur in more than 80% of the cases for [t d r], for [s z] the rates of non-alternating patterns do not exceed 60%. The coronal nasal is the most susceptible to mutations; the rate of non-alternating patterns for [n] does not reach 6%.

Table 5
Number of alternating and non-alternating patterns in -ist-a words for each base-final consonant in the nonce-word experiment.

Table 6 warrants a claim that the non-alternating patterns (no mutations) are more common in the experimental data than in the dictionary data for [t d r s z n].

Table 6
Rates of words without mutations in the experiment and the dictionary.

Table 7
Uselisted (not shown) is violated by both candidates. The faithful candidate in (a) fails the evaluation because it does not respect the dominant schema-constraint encoding the formation of words in -ek with base-final [k]. Candidate (b) fares well on the schema-constraint and, in spite of violations of the PU identity constraints, comes out victorious. As a result, the high type frequency of the pattern -ek ensures that mutations in novel words are stable.25
adj = adjective, dim = diminutive, gen = genitive, ins = instrumental, loc = locative, masc = masculine, nom = nominative, Pl = plural, sg = singular