Category-specific effects in Welsh mutation

In this paper we investigate category-specific effects through the lens of Welsh mutation. Smith (2011) reviews category-specific phonological effects, and Moreton et al. (2017) show that English distinguishes nouns and proper nouns in an experimental blending task. Here we show that Welsh distinguishes nouns, verbs, personal names, and place names in the mutation system. We demonstrate these effects experimentally, in a translation task designed to elicit mutation intuitions, and in several corpus studies. In addition, we show that these effects correlate with lexical frequency. Deeper statistical analysis and a review of the English data suggest that frequency is a more explanatory factor than part of speech in both languages. We therefore argue that these category-specific effects can be reduced to lexical frequency effects.


Introduction
In this paper, we use data from Welsh and English to demonstrate category-specific phonological effects and derive them from frequency effects. Smith (2011) reviews a number of category-specific phonological effects, showing how different parts of speech exhibit differing degrees of faithfulness to the input. In cruder terms, the phonology of a language can affect some parts of speech more than others. Among other effects, she shows that nouns generally exhibit greater faithfulness to the input than other parts of speech. Being more faithful means that nouns resist operations that would make them less like their input. It also means that they are more varied phonologically than other categories.[1] Moreton et al. (2017) expand on this result, demonstrating emergent category effects in English that also distinguish proper names: proper names are more faithful to the input than other nouns. They do this experimentally, using a word-blending task. For example, subjects were asked about the acceptability of two nonce blends of soprano and preening, sopreening and sopraning. Moreton et al. found that subjects were more inclined to accept sopraning over sopreening when soprano was interpreted as referring to the TV program The Sopranos than when it referred to a type of singer. Loosely, more of the word is preserved in blending if it is a proper noun than if it is a common noun.

[1] There have been a number of other approaches to formalizing category-specific effects and to how such systems might be learned, e.g. Itô & Mester (1999), Alderete (2001), Inkelas & Zoll (2007), Albright (2008), Itô & Mester (2009), Shih & Inkelas (2015), and Becker & Gouskova (2016).
Hammond, Michael, et al. 2020. Category-specific effects in Welsh mutation. Glossa: a journal of general linguistics 5(1): 1. 1-26. DOI: https://doi.org/10.5334/gjgl.1007

In this paper we report on a behavioral study that uses a translation task designed to elicit Welsh mutation. Mutation is a process in which the initial consonant of a word changes in certain morphosyntactic contexts, as described in more detail below. First, we replicate the effect reported for English nouns, showing that Welsh nouns are more faithful than verbs in initial consonant mutation. We go on to show that, in terms of mutation, place names have an intermediate status between proper nouns and common nouns. We follow this up with several corpus studies that show the same effects.
We demonstrate that all of these distinctions are unexpectedly correlated with lexical frequency. Specifically, more frequent items undergo mutation more readily. We then go back to Moreton et al.'s data and show that their effects are also correlated with lexical frequency. Specifically, more frequent items are better preserved in the blending task. Statistically, once lexical frequency is in the model, lexical category is no longer needed.
We attribute our results to a well-known frequency effect in which reduction or lenition processes apply more readily to more frequent items. This observation dates back at least to Hooper (1976), who cites syncope in English. Syncope applies more readily in high-frequency items like memory [mɛm(ə)ri] than in low-frequency items like mammary [mæm(ə)ri]. Fidelholtz (1975) makes a similar observation about vowel reduction in English. For example, the initial syllable of a relatively high-frequency form like astronomy [əstrˈanəmi] is more likely to reduce than the initial syllable of a relatively low-frequency form like gastronomy [gæstrˈanəmi]. More recent studies include, e.g., Hammond (1999), Hammond (2004), Coetzee (2009), and Coetzee & Kawahara (2013).
We thus establish three principal effects:
i. There are superficial category effects for Welsh mutation, similar to those of English.
ii. The Welsh effects also correlate with lexical frequency.
iii. In fact, the English effects correlate with lexical frequency as well.
This suggests that frequency, rather than lexical category per se, is the guiding force here. (Note that we are not arguing that all of grammar follows from frequency effects, just that target category effects do.)

The organization of this paper is as follows. First, we review the facts of Welsh mutation, with particular attention to how mutation interacts with lexical category. Next, we report the results of our behavioral study and show how it replicates the noun and proper noun effects noted by Moreton et al. As described above, these behavioral results also show an effect of lexical frequency. We then probe this effect more closely in a series of corpus investigations. These show that mutation is less likely with less frequent forms. They also show that the different parts of speech correlate with lexical frequency as expected: lexical categories with higher lexical frequency undergo mutation more readily. We then confirm this frequency effect by revisiting Moreton et al.'s experimental results for English. Next, we provide a formal analysis showing how the frequency effects we have demonstrated can be incorporated into the grammar. Finally, we discuss how lexical frequency and lexical category can become intertwined in the way shown.
Welsh has at least three distinct mutations, but we focus on the soft mutation here. The soft mutation can be triggered in two ways. First, various preceding elements induce it. In the following examples, mutation is triggered by five elements:
- the definite article y [ə], when the following noun is feminine singular
- the possessive marker dy [də] 'your'
- the preposition am [am] 'about'
- the disjunction neu [neɰ] 'or'
- the prenominal adjective hen [heːn] 'old'[2]

Examples appear in Table 1.
The examples in Table 1 are all nouns, but the soft mutation applies to other lexical categories as well. Table 2 gives examples of verbs and Table 3 gives examples of adjectives. Many other triggers of soft mutation appear in the language.
Second, certain syntactic contexts trigger the soft mutation. For example, the object of an overtly inflected verb undergoes soft mutation. In example (1) below, the direct object cathod [kʰaθɔd] 'cats' does not undergo mutation because the verb gweld [gwɛld] 'see' is not directly inflected. In the present tense, the auxiliary verb bod 'be' marks person and number instead.
(1) dw i 'n gweld cathod
    [du i n gwɛld kʰaθɔd]
    am I prt see cats
    'I see cats'

Compare this with the following past tense form. Here, the verb is directly inflected for person and number, and the direct object appears in the soft mutation.[3]

(2) gwelais i gathod
    [gwɛlajs i gaθɔd]
    saw-1sg I cats-soft
    'I saw cats'

Verbs in certain embedded clauses with an overt subject also show the soft mutation. In the following example, the verb mynd [mɨnd] 'go' does not undergo soft mutation because the embedded clause has no overt subject.
Table 4 gives the orthographic and phonetic effects of soft mutation. Other consonants do not change in mutation contexts, e.g. [s, n, v, l, r, ʃ, ʤ, χ, θ, f]. Interestingly, personal (or family) names do not generally undergo mutation. Compare the names in Table 5 with Table 1 above. In fact, some Welsh personal names also exist as common nouns with distinct meanings. In mutation environments, this produces minimal pairs that depend on whether the word is used with its literal meaning or as a name, as in Table 6.
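The following Python sketch roughly illustrates the orthographic side of Table 4 by applying the soft mutation to a word's initial consonant. It assumes the standard textbook correspondences (p→b, t→d, c→g, b→f, d→dd, g→∅, m→f, ll→l, rh→r). It ignores known complications, such as ll and rh resisting mutation after the feminine article y. The function name `soft_mutate` and its `personal_name` flag are hypothetical. The flag encodes the prescriptive generalization that personal names do not mutate. Place names, whose behavior is variable, are simply treated like common nouns.

```python
# Soft mutation of a word-initial consonant, orthographic forms only.
# Digraphs come first so that "ll" and "rh" match before "l" or "r" could.
SOFT_MUTATION = [
    ("ll", "l"),
    ("rh", "r"),
    ("p", "b"),
    ("t", "d"),
    ("c", "g"),
    ("b", "f"),
    ("d", "dd"),
    ("g", ""),   # g is deleted: gweld -> weld
    ("m", "f"),
]

# ch [χ] and th [θ] begin with letters that look mutable but do not change.
NON_MUTATING_DIGRAPHS = ("ch", "th")


def soft_mutate(word: str, personal_name: bool = False) -> str:
    """Return the soft-mutated form of `word` (hypothetical helper).

    Personal names are returned unchanged, following the prescriptive
    generalization; all other words, including place names, are mutated.
    """
    lower = word.lower()
    if personal_name or not word or lower.startswith(NON_MUTATING_DIGRAPHS):
        return word
    for radical, mutated in SOFT_MUTATION:
        if lower.startswith(radical):
            rest = word[len(radical):]
            if not word[0].isupper():
                return mutated + rest
            # Keep proper nouns capitalized: Cymru -> Gymru, Gwynedd -> Wynedd.
            if mutated:
                return mutated[0].upper() + mutated[1:] + rest
            return rest[:1].upper() + rest[1:]
    return word  # s, n, f, l, r, ff, etc. are unaffected


for w in ["cathod", "bragdy", "tref", "Cymru", "Bangor", "theatr", "siop"]:
    print(w, "->", soft_mutate(w))
```

Radical ch [χ] and th [θ] begin with letters that would otherwise match c and t, so the sketch checks those digraphs first. This mirrors the observation above that [χ, θ] do not change in mutation contexts.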
In very rare circumstances, personal names can undergo soft mutation. To give a sense of how rare this is, there is not a single example in the CEG corpus, a corpus of over a million words of written Welsh (Ellis et al. 2001). When this does occur, it sometimes seems to correlate with treating the name as if it were a common noun. For example, in […]

Place names exhibit a more complex pattern. The prescriptive rule is that Welsh place names and certain non-Welsh place names mutate, while other non-Welsh place names do not. All three cases are given in Table 7. Bangor and Conwy are the names of towns in Wales, and they mutate. Paris and Califfornia are foreign place names that also mutate.[5] Taiwan and Berlin are place names that do not generally mutate. Ball & Müller (1992) maintain that non-Welsh place names mutate when they are "considered to be common enough to be brought into the system" (Ball & Müller 1992: 205). Prys (2015) establishes a more general result, using corpus data to show that more frequent place names generally mutate more readily.
Place names mutate rather sporadically and often go unmutated in mutation contexts in more casual styles. For example, we also find i Bangor, i Conwy, i Paris, and i California in Twitter data.[6] Verbs show related frequency effects. Stammers (2009) establishes that more frequent verbs occur more often in mutation contexts, and Stammers & Deuchar (2012) establish that more frequent verbs also mutate more often.[7] We return to this below.
The prescriptive rules thus support the idea that lexical category can affect morphological processes. Specifically, personal names exhibit greater faithfulness by resisting soft mutation. On the other hand, we've seen that place names exhibit a more complex pattern, one that we examine more closely in the following section.

Behavioral experiment
In this section, we describe a behavioral experiment that examines more closely the role of lexical category in the Welsh mutation system. We also examine lexical frequency. Following Prys (2015), we hypothesize that frequency is responsible for the distinction noted above between mutating and non-mutating place names.
In the experiment, subjects were asked to translate very simple English sentences into conversational Welsh. We wanted a simple method for eliciting intuitions about the contexts for mutation. We chose this task because it has been used before in the documentation of Scottish Gaelic (Dorian 1973; Dorian 1978; Dorian 1981; Hammond et al. 2014; Hammond et al. 2017). Mutation typically shows a high degree of style-shifting (Prys 2015), so we chose translation items that would not let subjects deduce that we were interested in mutation. Moreover, the expected statistical distribution of mutation in our items was essentially the same as in normal Welsh conversation.
For example, one of our prompts was "Dewi went to a new brewery". This prompt was designed to test whether the noun for brewery mutates as expected after the mutating preposition i [i] 'to'. We would expect a response like:

    Aeth Dewi i fragdy newydd
    [aɰθ dɛwi i vragdɨ nɛwɨð]
    went Dewi to brewery new
    'Dewi went to a new brewery'

Subjects had a lot of latitude in their responses, except for the key parts we were interested in. In the case above, these were the preposition i and the noun bragdy. If subjects used different words for those elements, we asked whether they could say the sentence another way, using the relevant items. For example, if a subject said o fragdy 'from a brewery' instead, we would ask (in English) whether they could say 'to a brewery'. Similarly, subjects might code-switch or say they didn't know the word for brewery. In that case, we would offer the item bragdy and ask whether they knew it and could use it in the sentence.
We then noted whether the target item, in this case the word for brewery, was mutated (fragdy [vragdɨ]) or not (bragdy [bragdɨ]). We also noted whether a prompt was necessary and, if so, whether the subject then used the desired construction. The experiment was conducted at Bangor University in Bangor, Wales, with 84 items and 36 subjects. We used a single pseudo-random order of items, presented either first to last or last to first. Half the subjects received the items in one order and the other half in the reversed order. All items are given in the appendix. For subject responses, mutated items are coded as 2 and unmutated items as 1.
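Mutation status is binary, so the 2/1 coding in the tables and figures is just an ordinary proportion shifted by 1. A condition mean of 1.75, for instance, corresponds to 75% mutated responses. The sketch below makes that relationship explicit. The response lists are invented purely for illustration. Recoding to 1/0 before fitting a logistic model is our assumption about how such data are usually prepared, not a description of the authors' scripts.

```python
from statistics import mean


def code_response(mutated: bool) -> int:
    """Code a response as in the tables and figures: mutated = 2, unmutated = 1."""
    return 2 if mutated else 1


def to_binary(code: int) -> int:
    """Shift the 2/1 coding to 1/0, the form a logistic model expects."""
    return code - 1


# Invented responses for two hypothetical conditions (not the study's data).
responses = {
    "adjective trigger": [True, True, False, True],
    "preposition trigger": [True, False, False, True],
}

for condition, answers in responses.items():
    codes = [code_response(a) for a in answers]
    coded_mean = mean(codes)
    proportion = mean(to_binary(c) for c in codes)
    # The coded mean is always exactly 1 + the proportion of mutated responses.
    print(f"{condition}: coded mean = {coded_mean:.2f}, proportion mutated = {proportion:.2f}")
```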
The experiment was designed to test several factors, all bearing on the role of lexical category in mutation:
i) the lexical category of the triggering element, i.e. prepositions vs. adjectives
ii) the lexical category of the element undergoing mutation, i.e. common nouns, verbs, and place names
iii) the frequency of place names as targets

In addition, though not relevant to our hypothesis here, triggers were selected to vary in whether they ended with a vowel or a consonant, and mutation targets varied in whether they began with a single consonant or a consonant cluster.
Not all factors in our omnibus design interact, so the design is not suitable for a single analysis, and we report several separate analyses. We have two random factors, subjects and items, so mixed effects modeling is appropriate. The dependent variable, mutation status, is binary, so we analyzed the data with mixed effects logistic regression (Jaeger 2008).[8] In all of our analyses, we follow the recommendations of Barr et al. (2013) and use maximal design-based models with random slopes as appropriate.[9]
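The R formulas reported for these analyses have the general shape mut ~ x + (1|items) + (1 + x|subjects). For a subject s, an item i, and a single dummy-coded predictor x, this shape can be written out as follows. This is a generic sketch of a maximal model in the sense of Barr et al. (2013), assuming mutation is coded 1 and non-mutation 0. Items presumably receive only random intercepts because each item belongs to a single level of the category predictors, so no by-item slope can be estimated.

```latex
\operatorname{logit} \Pr(\mathit{mut}_{si} = 1)
  = \beta_0 + \beta_1 x_{si} + u_{0s} + u_{1s}\, x_{si} + w_{0i},
\qquad
\begin{pmatrix} u_{0s} \\ u_{1s} \end{pmatrix} \sim \mathcal{N}\!\left(\mathbf{0}, \Sigma_{\mathrm{subj}}\right),
\qquad
w_{0i} \sim \mathcal{N}\!\left(0, \sigma^2_{\mathrm{item}}\right)
```

Here β0 and β1 are the fixed effects reported in the regression tables, u0s and u1s are the by-subject random intercept and slope, and w0i is the by-item random intercept.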

Lexical category of the trigger
Our first analysis examines the lexical category of the triggering item, specifically whether it is an adjective or a preposition. The means are given in Table 8 and plotted in Figure 1. In both, mutated items are again coded as 2 and unmutated items as 1. Mutation is slightly more likely with a preceding adjective than with a preceding preposition. As the second row of Table 9 shows, the effect of trigger part of speech is not significant.[10] Based on the facts reviewed in Section 2, we did not anticipate an effect here.

[8] These analyses were performed in R (R Core Team 2014, version 3.4.3) using the lme4 package (version 1.1-15).
[9] We thus include all random slopes possible given our fixed effects. This also means that we do not adjust models incrementally in response to preliminary statistical analyses.

Lexical category of the target
The next analysis asks whether the lexical category of the mutation target has an effect, contrasting nouns, verbs, and place names. We used only Welsh place names, i.e. ones that the prescriptive rules say should mutate. Table 10 shows that place names exhibit the least mutation, followed by nouns and then verbs. This is plotted in Figure 2. With nouns as the reference level, the comparisons with place names and verbs are both significant, as rows two and three of Table 11 show.[11] This factor has three levels, but the relatively low rate of mutation with place names stands out.

[10] Here the reference level for trigger part of speech is adjective. We provide the R equation for all mixed effects analyses here. The R equation used for this specific analysis is: mut ~ trigger-pos + (1|items) + (1 + trigger-pos|subjects)
[11] The R equation used is: mut ~ target-pos + (1|items) + (1 + target-pos|subjects)

Frequency of place names
We saw above that place names exhibit sharply reduced rates of mutation. We sought to probe this further by considering the potential role of lexical frequency. Our items included two classes of Welsh place names: relatively high-frequency items and relatively low-frequency items. See Table 12.
Note that frequency was assessed relative to speakers of northern Welsh. Thus, for example, Tremadog is a fairly small town but quite well known in the north.[12] The infrequent places are all small towns in central and southern Wales. All the names are morphologically well-formed and phonotactically unobjectionable. To make sure that subjects treated the names as Welsh, we told all subjects in advance that the experiment included the names of small towns in south Wales that they might not know.[13] Table 13 shows the rate of mutation for high- and low-frequency place names. It also shows how frequency interacts with the lexical category of the mutation trigger, which we already treated on its own in Subsection 3.1 above. This is plotted in Figure 3. High-frequency place names exhibit a higher rate of mutation than low-frequency place names, and prepositions appear to trigger less mutation than adjectives.
As Table 14 shows, the overall effect of frequency is significant (row 2). Neither the effect of the lexical category of the trigger (row 3) nor the interaction (row 4) is significant.[14] These null results are perhaps unsurprising, since we saw no main effect of the lexical category of the trigger either.

[12] Frequency in the north was assessed informally by those Welsh-speaking authors who live in north Wales or have spent time there.
[13] This leaves open the possibility that some subjects treated these names as nonce forms, i.e. that they assumed we were not telling the truth about these being actual places in south Wales. If so, subjects might treat nonce forms differently from low-frequency items, rather than as the extreme end of low frequency. A follow-up experiment that presented subjects with more degrees of frequency could test this. Thanks to an anonymous reviewer for drawing this possibility to our attention.
[14] Here, the reference level for frequency is high and the reference level for trigger part of speech is adjective. The R equation used is: mut ~ freq * trigger-pos + (1|items) + (1 + freq * trigger-pos|subjects)

Interestingly, the frequency effect for place names depends on how many times subjects hear an unfamiliar place name during the experiment. Several of the place names we used were repeated over the course of the experiment with different triggers: Cymru, Bangor, Caerdydd, Cilcennin, Penbryn, and Talsarn. We presented the whole experiment in a single pseudo-random order and also in that order reversed. This lets us examine whether repeating an item increases the likelihood of mutation. Figure 4 plots the mean mutation values for place names, separated by frequency and by repetition (first, second, or third). Solid lines show high-frequency items and dashed lines show low-frequency items. Color indicates the two orders: black lines give one order and red lines give the reversed order. The shape of each line is not meaningful in itself, since a different trigger was involved at each repetition. What is meaningful is how that shape changes between the two presentation orders. The lines shift position as a function of presentation order, which suggests that repetition does affect mutation. The size of that shift also differs by frequency, so repetition may affect high- and low-frequency place names differently.
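A small sketch shows why comparing the two presentation orders is informative. Given a trial list, the function below counts how many times each place name has been heard so far. Running it on a list and on its reversal shows that most place-name tokens receive a different repetition number under the two orders, even though the trigger paired with each token is fixed. The trial list and the function name `repetition_indices` are hypothetical. They do not reproduce the experiment's actual order.

```python
from collections import Counter

# Hypothetical trial list (NOT the experiment's actual order): each trial
# pairs a place name with a mutation trigger.
FORWARD = [
    ("Cilcennin", "i"),
    ("Bangor", "o"),
    ("Cilcennin", "hen"),
    ("Bangor", "i"),
    ("Cilcennin", "am"),
]


def repetition_indices(trials):
    """Return (place, trigger, k) for each trial, where k counts how many
    times that place name has been heard so far, this trial included."""
    seen = Counter()
    indexed = []
    for place, trigger in trials:
        seen[place] += 1
        indexed.append((place, trigger, seen[place]))
    return indexed


forward = repetition_indices(FORWARD)
reverse = repetition_indices(list(reversed(FORWARD)))

# Most trials get a different repetition number under the two orders, which is
# what lets the order comparison separate repetition from the trigger.
for place, trigger, k in forward:
    k_rev = next(r for p, t, r in reverse if (p, t) == (place, trigger))
    print(f"{place:10s} {trigger:4s} forward={k} reversed={k_rev}")
```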
Turning now to significance testing, Table 15 shows significant main effects of order (row 2) and frequency (row 4).[15] There is also a significant interaction between order and presentation (row 5), which confirms that the number of times subjects hear a place name affects the likelihood of mutation.

[15] Here, the reference level for frequency is low and the reference level for presentation is forward. The R equation used is: mut ~ order + pres + freq + order:pres + order:freq + pres:freq + (1 + pres|items) + (1 + order + freq + order:freq|subjects)

Summarizing, our behavioral study shows three main effects. First, there is an effect of lexical category: place names mutate the least, followed by nouns and then verbs. Second, frequent place names mutate more readily than infrequent place names. Third, the number of repetitions in the experiment also affects mutation: the more often a place name is repeated, the more likely it is to mutate.

Corpus analysis
We now turn to corpus data to see whether we can make sense of the patterns in our experimental data. Do we see the same effect of target part of speech in corpus data that we saw in the experimental data? Do we also see an effect of frequency? And the key question: are these two effects distinct? We will see that frequency effects in fact drive the apparent category effects in our behavioral data. Can corpus data dissociate frequency and part of speech? In the CEG corpus (Ellis et al. 2001), these two variables turn out to be strongly associated. Table 16 gives the mean counts for verbs, nouns, and place names in the CEG corpus, calculated as the average frequency of all words in each category. This is plotted in Figure 5.
The difference between nouns and verbs is not significant, t(3247.898) = 1.620, p = 0.105, but the difference between verbs and place names is: t(2963.295) = 6.618, p < .001. The difference between nouns and place names is also significant: t(10771.748) = 16.925, p < .001. The upshot is that frequency correlates with target part of speech. Place names, the category that mutates least, are also significantly less frequent than nouns and verbs. This is consistent with the general claim that more frequent items are more likely to undergo mutation, i.e. that frequency, not part of speech per se, drives the effect. But is frequency an effect independent of part of speech? To test this, we turn to data from Twitter. Twitter is a much less edited and filtered corpus, so we can expect more variation in the distribution of mutation than in the CEG corpus. The corpus we use contains over 7 million Welsh-language tweets collected over several years (Jones et al. 2015).
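Fractional degrees of freedom like t(3247.898) are characteristic of Welch's unequal-variance t-test, which is, for instance, the default in R's t.test. We assume that this is the test used here. The sketch below computes the Welch statistic and the Welch–Satterthwaite degrees of freedom using only the Python standard library. It stops short of a p-value, which would require the CDF of the t distribution (available in, e.g., SciPy). The counts in the usage lines are hypothetical, not the CEG data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    # Squared standard errors of each sample mean (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical per-word corpus counts for two categories (not the CEG data).
nouns = [120, 45, 300, 12, 87, 64]
place_names = [5, 30, 2, 11]
t, df = welch_t(nouns, place_names)
print(f"t({df:.3f}) = {t:.3f}")
```

With equal variances and equal sample sizes, the degrees of freedom reduce to the familiar 2(n - 1). They fall below that as the variances diverge.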
To give a sense of how the language of Twitter differs from other sources, here are a few tweets from the beginning of the corpus. Even without translations, several differences are obvious. First, there are bits of text typical of the internet and of Twitter: URLs, hashtags, and replies to other Twitter users (indicated with @). Second, there is a fair amount of code-switching, which is also typical of the spoken language. Finally, the tweets contain considerable slang, misspellings, and non-standard dialect forms.
If we run this as a regression, log total is significant, while part of speech and the interaction are not. The results are given in Table 17: R² = 0.37, F(5, 29) = 3.39, p = 0.016. This is consistent with the effect being driven by log count rather than by part of speech.
We can also test this with likelihood ratio tests. If we put both part of speech and log total into a model and then drop part of speech, there is no significant effect: χ²(3) = 4.18, p = 0.12. On the other hand, if we drop log total, the effect is significant: χ²(4) = 6.37, p = 0.01. This is consistent with our conclusion that frequency drives the effect. To show the relationship more fully, Figure 8 plots the regression line for log total against rate of mutation, with part of speech dropped from the regression model in Table 17. Each point represents an individual item and shows the effect of log total on the relative frequency of mutation.
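A likelihood ratio test compares nested models using the statistic 2(ℓ_full − ℓ_reduced). This statistic is referred to a χ² distribution whose degrees of freedom equal the number of parameters dropped. The sketch below implements the test with the standard library, using closed-form χ² tail probabilities for integer degrees of freedom. The log-likelihood values in the usage line are invented for illustration, and `likelihood_ratio_test` is a hypothetical helper name.

```python
from math import erfc, exp, factorial, pi, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with a
    positive integer number of degrees of freedom (closed-form series)."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))
    # Odd df: the df = 1 tail, erfc(sqrt(x/2)), plus (df - 1) / 2 extra terms.
    total = erfc(sqrt(half))
    term = sqrt(2 * x / pi) * exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def likelihood_ratio_test(loglik_full: float, loglik_reduced: float, df: int):
    """Return the likelihood ratio statistic and its chi-square p-value."""
    stat = 2 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


# Invented log-likelihoods: dropping 2 parameters costs 2.5 log-likelihood units.
stat, p = likelihood_ratio_test(-40.0, -42.5, df=2)
print(f"chi2(2) = {stat:.2f}, p = {p:.3f}")
```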
In summary, our corpus data also show category effects and frequency effects. Closer analysis shows that frequency is the driving factor and that part of speech does not contribute significantly to the model.

English
Since frequency seems to predict mutation better than part of speech does, it is worth looking back at Moreton et al.'s effects to see whether they also show a frequency effect. In other words, are their effects actually due to lexical category or to frequency?
Recall that Moreton et al. constructed blends that varied in how much of each word appears in the blend. They demonstrate a number of effects with this task, including the category effects we explore here. For example, as noted above, subjects were more inclined to accept sopraning over sopreening when soprano was interpreted as referring to the TV program The Sopranos than when it referred to a type of singer. Loosely, more of the word is preserved in blending if it is a proper noun than if it is a common noun. Blending is a different sort of process from consonant mutation, most obviously because its input consists of two words. What do we expect if frequency drives the category effects? The most natural expectation is that more frequent words should play a bigger role in the blend, i.e. that more of a frequent form should appear in the resulting blend. In the example above, we would expect the TV program interpretation to be more frequent than the singer interpretation. Moreton et al. report several studies. We set aside their studies of constituency, branching, and position of stress. Of the remaining studies, two compare nouns and verbs, and two compare common nouns and proper nouns.
We look first at the studies comparing nouns and verbs, specifically their Experiments 3a and 3b. Experiment 3b involved blends of either a verb or a noun with another noun. The dependent variable is how much of the first or second word is preserved in the blend. The relevant independent variable is whether the first word is interpreted as a verb or a noun. For example, subjects were asked to judge the acceptability of floatex vs. flatex as a blend of float and latex. Subjects were told that the blend meant either 'latex that is used to waterproof a parade float' (N + N) or 'latex that is light enough to float' (V + N). Moreton et al. found that the verbal interpretation biases subjects toward the blend form that preserves less of the verb, i.e. flatex in this case.
To check for a frequency effect, we examined all their experimental items in the first 100 million words of the WaCky corpus (Baroni et al. 2009). This corpus is useful here because it is extremely large and all its words are tagged for part of speech. We can therefore get fairly accurate relative counts for all the items Moreton et al. use. These are given in Table 18. Mean values are given in Table 19 and plotted in Figure 9. Nouns are more frequent than verbs. These are count data and therefore not normally distributed, so we log-transform them. The difference is significant in a paired t-test: t(16) = 2.266, p = 0.038.
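The comparison can be sketched as a paired t-test on log-transformed counts. Each pair is one experimental item's count as a noun and its count as a verb. The degrees of freedom are the number of pairs minus one, so t(16) implies 17 items. Adding 1 before taking logs guards against zero counts. This smoothing is our assumption, since the text does not say how zeros were handled. The counts in the usage lines are hypothetical, not those in Table 18.

```python
from math import log, sqrt
from statistics import mean, stdev


def paired_t_on_logs(counts_a, counts_b, add=1):
    """Paired t-test on log(count + add); returns (t, df).
    The +add smoothing is an assumption to avoid log(0)."""
    if len(counts_a) != len(counts_b):
        raise ValueError("paired samples must have the same length")
    diffs = [log(a + add) - log(b + add) for a, b in zip(counts_a, counts_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical counts for the same five words tagged as noun vs. as verb.
as_noun = [5400, 320, 1210, 88, 760]
as_verb = [2100, 410, 300, 25, 150]
t, df = paired_t_on_logs(as_noun, as_verb)
print(f"t({df}) = {t:.3f}")
```

Swapping the two arguments flips the sign of t. The sign of a paired statistic such as t(17) = -1.830 therefore reflects only which category is entered first.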
It is fair to conclude that Moreton et al.'s results on the distinction between nouns and verbs are consistent with the frequency-based story we have developed here. Their experimental items are more frequent when tagged as nouns than when tagged as verbs. Hence we expect the items to be preserved more in blending when they are nouns.
We can also look at their experiments that compare proper nouns and common nouns. Items and counts are given in Table 20. Note that their proper noun category includes what we would term place names. Mean values are given in Table 21 and plotted in Figure 10. Proper nouns are more frequent than common nouns. Again, the counts are not normally distributed, so we log-transform them. Here the difference only trends in a paired t-test: t(17) = -1.830, p = 0.085.[16] To conclude, the blending facts from Moreton et al. (2017) for nouns vs. proper nouns and for verbs vs. nouns are consistent with the frequency story developed here.

Formalizing the role of frequency
In this section, we propose an account of these facts within a version of Optimality Theory (McCarthy & Prince 1993; Prince & Smolensky 1993). This version uses lexically-conditioned constraints (Hammond 1999; Pater 2000) and weighted constraints as in Harmonic Grammar (Smolensky 2006; Pater 2009; Potts et al. 2010).[17] We need lexically-conditioned constraints to capture the fact that lexical items behave differently. We need weighted constraints to capture trade-offs and the gradient nature of the system.
We have seen that frequency plays a significant role in the distribution of mutation in Welsh and in the pattern of blending in English. In the case of Welsh, mutation is more likely with more frequent forms. In the case of English, retention of material in blends is greater with more frequent forms.
These effects seem to go in opposite directions. In Welsh, more frequent forms are less likely to be preserved, because they are more likely to be mutated. Thus Cymru [kʰəmrɨ] 'Wales' is more likely to mutate than Cribyn [kʰrɪbɨn] 'Cribyn'. In English, more frequent forms are more likely to be preserved in blends. Thus sopraning is more likely to be preferred over sopreening when soprano refers to the TV program, the more frequent interpretation, than when it refers to a type of singing voice.
This argues against a treatment in terms of lexical faithfulness in Optimality Theory. Under such a treatment, individual lexical items would have separate faithfulness constraints, ranked according to the frequency of the item. Relatively infrequent items would have high-ranked faithfulness constraints, while more frequent items would have low-ranked faithfulness constraints. We can schematize this as in Figure 11.

[16] We use the term "trend" to refer to a p-value less than .1 and greater than .05.
[17] The account could also be adapted for Maxent modeling (Hayes & Wilson 2008).
In Welsh, Cribyn resists mutation because its lexical faithfulness constraint outranks the pressure to mutate. Cymru mutates more readily because the pressure to mutate outranks its faithfulness constraint. So in Welsh, the faithfulness constraint of the less frequent form is ranked higher. In English, sopraning is preferred to sopreening for Sopranos because the corresponding faithfulness constraint outranks the pressure to blend. Here the faithfulness constraint of the more frequent form is ranked higher.
If the ranking is to follow consistently from lexical frequency, we must therefore rule out an account in terms of lexical faithfulness. Instead, we develop an account in terms of surface correspondence (McCarthy & Prince 1995). The basic idea is that there are correspondence constraints that refer to surface forms, and these constraints are weighted against the grammatical constraints of the two systems.
In English, the system is unchanged: more frequent forms have higher-weighted correspondence constraints, so these forms are better preserved in blends. In Welsh, though, both mutated and unmutated forms have correspondence constraints, and the more frequent a form is, the higher the weight of its correspondence constraint. For all Welsh words, the correspondence constraint for the unmutated form must have a greater weight than that for the mutated form. This captures the fact that, without pressure to mutate, a form surfaces unmutated. Forms like Cribyn and forms like Cymru differ in one respect: for Cymru, the correspondence constraint for Gymru is weighted highly enough to sometimes tip the balance.
For Cribyn in a non-mutation context, we would have something like Table 22. The weights in the table capture the intuitions expressed above. The winning pronunciation is the one with the lowest weighted sum of violations. The correspondence constraint for Cribyn is stronger than that for Gribyn, so in a non-mutation context it is better not to mutate.
For a low-frequency item like Cribyn, it is also better not to mutate in a mutation context, as in Table 23. For a high-frequency item like Cymru, the key difference is that the weight of the correspondence constraint for Gymru is higher. This has no effect in a non-mutation context, as in Table 24.
Finally, we see the need for finite constraint weights, as in Harmonic OT, when we consider Cymru in a mutation context, as in Table 25. Here the weight of the correspondence constraint for Gymru, added to the weight of the general mutation constraint, is sufficient to force mutation. Under strict ranking, two lower-ranked constraints could not jointly overcome a higher-ranked one; with numerical weights, they can.
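The evaluation walked through in Tables 22 to 25 can be sketched in a few lines of code. This is a minimal sketch, not the paper's implementation: the weights below (5 and 1 for Cribyn/Gribyn, 5 and 4 for Cymru/Gymru, 2 for the general mutation constraint) are hypothetical values chosen only to satisfy the conditions described in the text, and the names `LEXICON`, `penalty`, and `evaluate` are illustrative. We also assume each candidate violates the correspondence constraint of the other form once, and the unmutated candidate violates the mutation constraint once in a mutation context.

```python
from typing import Dict, Tuple

# Hypothetical weights chosen only to satisfy the conditions in the text;
# they are NOT the weights in Tables 22-25.
MUTATE = 2.0  # general pressure to mutate; active only in mutation contexts

# word -> (unmutated form, mutated form, correspondence-constraint weights)
LEXICON: Dict[str, Tuple[str, str, Dict[str, float]]] = {
    "Cribyn": ("Cribyn", "Gribyn", {"Cribyn": 5.0, "Gribyn": 1.0}),  # rare mutated form
    "Cymru": ("Cymru", "Gymru", {"Cymru": 5.0, "Gymru": 4.0}),       # frequent mutated form
}


def penalty(candidate: str, unmutated: str, corr: Dict[str, float],
            mutation_context: bool) -> float:
    """Weighted sum of violations for one candidate (lower is better)."""
    # A candidate violates the correspondence constraint of every form it does not match.
    total = sum(w for form, w in corr.items() if form != candidate)
    # In a mutation context, surfacing unmutated violates the general mutation constraint.
    if mutation_context and candidate == unmutated:
        total += MUTATE
    return total


def evaluate(word: str, mutation_context: bool) -> Tuple[str, Dict[str, float]]:
    """Return the winning candidate and the penalty of every candidate."""
    unmutated, mutated, corr = LEXICON[word]
    scores = {c: penalty(c, unmutated, corr, mutation_context)
              for c in (unmutated, mutated)}
    # min() returns the first minimal key, so the unmutated form wins an exact tie.
    winner = min(scores, key=scores.get)
    return winner, scores


if __name__ == "__main__":
    for word in LEXICON:
        for ctx in (False, True):
            winner, scores = evaluate(word, ctx)
            label = "mutation" if ctx else "non-mutation"
            print(f"{word:7} {label:13} {scores} -> {winner}")
```

With these illustrative weights, Cribyn surfaces unmutated in both contexts, while Cymru surfaces as Gymru only in a mutation context (penalty 5 for Gymru against 4 + 2 = 6 for Cymru). Lowering the hypothetical weight for Gymru below 3 makes Cymru pattern like Cribyn.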
We've used specific weights above to get the desired effects, but other weights are possible. For this story to go through, the weights must satisfy two conditions. First, the weight for the unmutated form must exceed the weight for the mutated form. This corresponds to the fact that unmutated forms are generally more frequent than mutated forms, and it guarantees that the unmutated form will show up in the absence of mutation. Second, for mutating forms like Cymru, the weight of the correspondence constraint for Gymru must be greater than the weight for the unmutated form minus the weight of the general mutation constraint.
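The two conditions can be derived directly by comparing candidate penalties. In our own notation (not the paper's), let $w_U$ and $w_M$ be the weights of the correspondence constraints for a word's unmutated and mutated forms, and $w_{\mathrm{MUT}}$ the weight of the general mutation constraint. Assuming, as in the tableaux described above, that each candidate violates only the other form's correspondence constraint, once, and that the unmutated candidate violates the mutation constraint once in a mutation context, the penalties $\mathcal{P}$ compare as follows (ignoring exact ties):

```latex
\begin{aligned}
\text{non-mutation context:}\quad & \mathcal{P}(\text{unmutated}) = w_M, && \mathcal{P}(\text{mutated}) = w_U
  && \Rightarrow\ \text{unmutated wins iff } w_U > w_M \\
\text{mutation context:}\quad & \mathcal{P}(\text{unmutated}) = w_M + w_{\mathrm{MUT}}, && \mathcal{P}(\text{mutated}) = w_U
  && \Rightarrow\ \text{mutated wins iff } w_M > w_U - w_{\mathrm{MUT}}
\end{aligned}
```

Putting these together, a non-mutating item like Cribyn has $w_M < w_U - w_{\mathrm{MUT}}$, while a mutating item like Cymru has $w_U - w_{\mathrm{MUT}} < w_M < w_U$. If instead $w_M > w_U$, the mutated form wins in both contexts, which is the reanalysis scenario.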
The account makes several interesting predictions. First, what happens if the weight for the mutated form exceeds the weight for the unmutated form? In such a case, the unmutated form will never show up. This is effectively reanalysis. 18 This, in fact, seems to be happening for some speakers with respect to the item tref [tʰrɛ(v)] 'town', whose apparently mutated form dre(f) [drɛ(v)] occurs in non-mutation contexts, e.g. after mewn 'in':
i. RT @BethanWalkling: #hoffdafarn @ClwbYBont clwb y bont-lle fach llawn pobol hyfryd, dihangfa fach cymreig mewn dre eitha seisnigedd! #j …
ii. @idrischarles theatr dda iawn mewn dref llawn siopau bach gwych.
iii. @gaitoms @dylmei fysa, wast fel just ty tafarn arall mewn dref hefo gormod o rhai gwag yn barod!
iv. Caernarfon Hanesyddol: y Rhufeiniaid: Yn amlwg mi ges i fy magu mewn dre llawn o hanes gyda raenau o wahanol g… https://t.co/3jQLCXtcJj
v. Parti/gig chweched wedi ei gyhoeddi am Hydref 30ain yn The Scene, mewn dre. Mynediad £3 ar y drws, croeso cynnes i bawb xxxxx
vi. @gaitoms @dylmei fysa, wast fel just ty tafarn arall mewn dref hefo gormod o rhai gwag yn barod!
vii. Dwi'n byw mewn dre bach del. http://t.co/pCjbKufjgX
18 This general phenomenon has been noted before. See, for example, Thomas (1984).
Notice that a side effect of this analysis is that the input form itself stays the same, so while the apparently mutated form occurs in non-mutation contexts, the same form also occurs in mutation contexts. In other words, this reweighting of constraints makes the correct prediction that we do not see superficially doubly mutated forms like ddre(f) [ðrɛ(v)]. Indeed, there are no such forms in our Twitter data. 19
Another prediction made by this account is that the frequency effect is driven not by the overall frequency of the word, but by the frequency of the mutated form. This is testable with our corpus data. If we go back to our Twitter data and run a simple regression from log total count to mutation rate, we get a significant effect, as in Table 26: R² = 0.26, F(1, 33) = 11.35, p = 0.002. 20 If we instead regress from the log count of mutated forms only, as in Table 27, the result is again significant, but with a much higher R²: R² = 0.47, F(1, 33) = 29.66, p < .001. The greater R² for the second analysis supports the formal analysis proposed above.
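The comparison between the two regressions can be sketched as follows. This is a minimal, standard-library-only sketch, not the authors' analysis script, and it contains no corpus data: it assumes a hypothetical per-item input of (mutated count, total count) pairs, takes mutation rate as mutated/total, and applies add-one smoothing to the predictor as described in footnote 20. The function names are illustrative. Since the reported F(1, 33) has 33 residual degrees of freedom, the regressions are over 35 items, and for a one-predictor regression F = R²(n − 2)/(1 − R²), which lets the reported R² and F values be cross-checked up to rounding.

```python
import math
from typing import Sequence, Tuple


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = a + b*x by ordinary least squares; return (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r2 = sxy ** 2 / (sxx * syy)  # with one predictor, R^2 is the squared correlation
    return a, b, r2


def f_from_r2(r2: float, n: int) -> float:
    """F statistic with (1, n - 2) degrees of freedom for a one-predictor regression."""
    return r2 * (n - 2) / (1 - r2)


def compare_predictors(items: Sequence[Tuple[int, int]]) -> Tuple[float, float]:
    """items: hypothetical (mutated_count, total_count) pairs, one per lexical item.

    Returns R^2 for mutation rate regressed on log(total + 1) and on log(mutated + 1).
    Assumes every total is positive and that the rates are not all identical.
    """
    rates = [m / t for m, t in items]
    log_total = [math.log(t + 1) for _, t in items]  # add-one smoothing on the predictor
    log_mutated = [math.log(m + 1) for m, _ in items]
    return simple_ols(log_total, rates)[2], simple_ols(log_mutated, rates)[2]
```

As a consistency check, `f_from_r2(0.26, 35)` gives about 11.59, close to the reported 11.35 once we allow for R² being rounded to two decimals. The base of the logarithm does not affect R².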
We've seen that we can implement the role of frequency using well-established grammatical tools: constraint weighting and lexical constraints. This analysis is not tied to the particular formal system we've used, however. The same ideas could be expressed in other constraint-based formalisms, e.g. MaxEnt modeling, Stochastic OT, or Noisy Harmonic Grammar.
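To illustrate one of these alternatives, in a MaxEnt grammar the same weighted penalties yield a probability for each candidate rather than a single winner, with P(candidate) proportional to exp(−penalty). This fits the observation that Cymru mutates "more readily" rather than categorically. The sketch below is an illustration under stated assumptions, not the paper's model: it reuses hypothetical weights (correspondence weights 5 and 1 for Cribyn/Gribyn, 5 and 4 for Cymru/Gymru, and 2 for the general mutation constraint), and `maxent_probs` is an illustrative name.

```python
import math
from typing import Dict


def maxent_probs(penalties: Dict[str, float]) -> Dict[str, float]:
    """MaxEnt: P(candidate) is proportional to exp(-penalty)."""
    # Shift by the minimum penalty before exponentiating, for numerical stability;
    # the shift cancels out in the normalization.
    lowest = min(penalties.values())
    scores = {c: math.exp(-(p - lowest)) for c, p in penalties.items()}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


# Hypothetical penalties in a mutation context (illustrative weights, not the paper's):
# the unmutated candidate violates the mutated form's correspondence constraint plus
# the general mutation constraint; the mutated candidate violates the unmutated form's.
cribyn = maxent_probs({"Cribyn": 1.0 + 2.0, "Gribyn": 5.0})
cymru = maxent_probs({"Cymru": 4.0 + 2.0, "Gymru": 5.0})
print(f"P(Gribyn) = {cribyn['Gribyn']:.3f}")
print(f"P(Gymru)  = {cymru['Gymru']:.3f}")
```

With these illustrative weights the sketch prints P(Gribyn) = 0.119 and P(Gymru) = 0.731: both items can in principle mutate, but the item whose mutated form carries the heavier correspondence constraint does so far more often.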
Setting aside the formal system, however, the analysis is quite intuitive: the likelihood of mutating a form depends on how often we've heard the mutated form itself.

Conclusion
To summarize, our behavioral study showed that Welsh mutation is indeed subject to lexical category effects. These effects extend beyond major categories like nouns and verbs, and beyond proper names, to include place names. The study also revealed frequency effects.
In our corpus study, we replicated these category and frequency effects. We also saw that the category effects do not contribute significantly beyond their role in frequency effects. We then turned to the noun vs. verb and proper noun vs. noun distinctions treated in Moreton et al. (2017) and saw that, given the specific items used in the relevant experiments, those results are also consistent with a frequency effect.
19 Thanks to an anonymous reviewer for very helpful discussion here.
20 In this and the following analysis we apply add-one smoothing to the independent measure to avoid taking the log of 0.
We can now hypothesize that similar lexical category effects reported by others also correlate with frequency. Smith (2011) observed that category effects are not always the same: in some languages nouns are more faithful than verbs, and in other languages the reverse is true. We hypothesize that such reversals occur when the relative lexical frequencies of the categories reverse as well. Our conclusion that lexical frequency drives the category effects of Welsh thus offers a potential explanation for this previously unexplained aspect of the category-based treatment.
Note that we are not claiming that all grammatical effects follow from frequency: our analysis deals only with part-of-speech effects on the targets of phonological processes. We are also not claiming that all category effects necessarily derive from frequency effects. The mutation and blending effects treated here both concern whether, or to what degree, some process applies, and it is straightforward to relate this to the frequency of the relevant targets. Other category effects, like nominal vs. verbal stress in English (Chomsky & Halle 1968; Hayes 1981; Hayes 1995), are difficult to see in these terms. We hypothesize that category effects like these are not due to frequency.
Finally, a treatment in terms of lexical frequency makes good sense theoretically: it accords with the general principle that high-frequency items participate in the grammar of a language more fully than low-frequency items. We've shown how the central intuition of the analysis can be implemented using constraint weighting and lexical constraints, and that the particular formal analysis we developed makes additional correct predictions.