Diachronic phonological asymmetries and the variable stability of synchronic contrast

This article aims to understand the development of diachronic asymmetries in phonological systems by evaluating the variable stability of synchronic contrasts. We focus on sonorant systems involving secondary palatalisation, grounded in the claim that palatalised laterals are more common than palatalised rhotics cross-linguistically. Our analysis reports acoustic and articulatory data on Scottish Gaelic, a Celtic language with a large sonorant inventory contrasting palatalised, plain and velarised phonemes across laterals, nasals and rhotics. We summarise high-dimensional dynamic characteristics of the acoustic spectrum and midsagittal tongue shape using a two-stage data reduction process and use these coefficients as inputs for training a Support Vector Machine. This trained model classifies unseen data in terms of its phonemic identity, which reveals that rhotics are classified best word-initially and worst word-finally, with nasals always classified better than laterals. We find that dynamic information substantially improves acoustic classification, but only improves articulatory classification for some sonorants. We propose that the variable synchronic stability of palatalisation contrasts complicates potential trajectories of diachronic change in Gaelic.


Introduction
In this article, we investigate whether diachronic and typological asymmetries in phonological systems are reflected in the variable stability of synchronic contrasts. It is widely predicted that the diachronic instability of some phonological contrasts is a consequence of a larger pool of synchronic variability. This is because such variability is hypothesised to facilitate misperception-based sound change (Ohala, 1981) and can also weaken the robustness of phonemic categories, leading to potential neutralisation over time (Bybee, 2015). But does the propensity of a phonological contrast towards diachronic neutralisation necessarily mean that it will be less robust at a given point in time? An assumption underpinning many theories of sound change is that we can observe the tendencies of diachronic change through examination of synchronic data, with the hypothesis that there is a tight link between the two at any point in time (Labov, 1994, 21). This suggests that a greater tendency towards diachronic neutralisation should also be evident in synchronic data. In this study, we examine claims about the diachronic trajectories of typologically unusual sound systems and whether the variable stability of synchronic contrasts is predictable from the attested sound changes. We also speculate on whether variable synchronic stability between phonological categories might be able to tell us something about future trajectories of sound change, especially in light of existing diachronic predictions.
A particularly good case study for examining variable diachronic and synchronic stability is the cross-linguistic system of contrasts that fall under the banner of secondary palatalisation. Previous research shows that some secondary palatalisation contrasts in consonants are more unstable than others (Kochetov, 2005; Iskarous & Kavitskaya, 2018). Palatalised rhotics, in particular, are cross-linguistically rare and prone to merger with non-palatalised rhotics (Hall, 2000), but laterals seem more robust to sound change (Iskarous & Kavitskaya, 2010). Word-final palatalisation contrasts are also more unstable than word-initial contrasts (Padgett & Ní Chiosáin, 2018). Importantly, previous work shows that the robustness of palatalisation contrasts may vary depending on the features analysed; for example, nasals may be more distinctive than laterals in formant transitions, but laterals have a more distinctive spectral shape than nasals, with rhotics being least distinct in both analyses (Iskarous & Kavitskaya, 2018). This suggests that palatalisation contrasts are multi-dimensional and temporally distributed, potentially as a consequence of a high number of phonological categories existing together in a relatively narrow phonetic space.
The fact that some sonorant contrasts are diachronically less stable than others cross-linguistically makes them an ideal candidate for assessing claims about variable diachronic trajectories using synchronic data. In this study we wish to further understand why some sonorants show greater stability than others and, in doing so, we focus on palatalisation contrasts in Scottish Gaelic (Celtic), which contrasts palatalised, velarised and plain sonorants across laterals, rhotics and nasals. Notably, Scottish Gaelic has retained a larger system of sonorants in comparison to closely-related Irish and Manx. In this study, we take seriously the dynamic nature of sonorant contrasts, building upon our previous work that has focused on selective sampling of a limited number of timepoints. We show that this previous research may underestimate the extent of contrast that is present in the Scottish Gaelic sonorant system; contrasts which we argue are fundamentally dynamic in nature. We further demonstrate this by comparison with analyses that focus only on a sonorant 'steady-state', which illustrates how some contrasts may be more dynamic in nature than others.

Dynamics of secondary palatalisation
Secondary palatalisation involves overlap between a palatal gesture and the consonant's primary place of articulation, which contrasts with 'full palatalisation', where the consonant's primary place of articulation is changed (Bateman, 2007, 2). Some languages, such as Russian and Scottish Gaelic, have extensive secondary palatalisation contrasts across the consonant system, such that almost every consonant has a palatalised and non-palatalised counterpart (see Yanushevskaya & Bunčić, 2015 for description of Russian, and Nance & Ó Maolalaigh, 2021 for description of Scottish Gaelic). For this reason, all consonant palatalisation pairs in Russian, Scottish Gaelic and other languages with this system are considered to contrast in secondary palatalisation even though the secondary palatalisation contrast may at times manifest as a change in primary place/manner. 1 In terms of articulation, the most widely reported articulatory correlate of secondarily palatalised consonants is tongue body fronting and raising towards the palate accompanying the primary consonantal gesture (Kochetov, 2002; Stoll, 2017; Bennett, Ní Chiosáin, Padgett, & McGuire, 2018; Malmi & Lippus, 2019; Spinu, Percival, & Kochetov, 2019). The fronting and raising gesture also frequently extends into the surrounding vowels (Malmi & Lippus, 2019). This tongue body fronting is often accompanied by tongue root advancement and pharyngeal expansion (Kavitskaya, Iskarous, Noiray, & Proctor, 2009; Bennett et al., 2018), while palatographic studies additionally demonstrate that the tongue blade is spread across the hard palate to a greater extent than in non-palatalised consonants (Farnetani et al., 1991; Meister & Werner, 2015).
Secondary palatalisation is a good case study for testing the relationship between diachronic neutralisation and synchronic stability, because of well-documented differences between sonorant types. Palatalised rhotics involve a retracted and stabilised tongue body for trill production (McGowan, 1992; Recasens, 2013), which comes into conflict with the tongue body advancement needed for palatalisation (Iskarous & Kavitskaya, 2018; Kochetov, 2005; Stoll, 2017). Such biomechanical constraints may lead to a larger pool of synchronic variability (Ohala, 1989), with the possibility that variants become phonologised or contrasts are neutralised over time (Beckman, De Jong, Jun, & Lee, 1992; Bybee, 2015). For example, articulatory variability may lead to ambiguity in perception, which could advance the spread of a change further when misperceived by the listener (Ohala, 1981). Such explanations are explicitly pursued in previous research on sonorant palatalisation in terms of acoustics (Iskarous & Kavitskaya, 2018) and articulation (Kochetov, 2005; Stoll, 2017), with the claim in both cases being that less robust phonemic categories are more susceptible to merger.

Palatalisation in Gaelic
Our study focuses on Scottish Gaelic, a Celtic language closely related to Irish and Manx. 2 The Scottish Gaelic language is usually referred to in English by its speakers simply as 'Gaelic' /galɪk/ and we refer to it as Gaelic henceforth. Together, the Celtic language sub-family consisting of Gaelic, Irish and Manx is known as 'Goidelic'. The most recently available data (Scottish Government, 2015) show that there are approximately 57,600 Gaelic speakers in Scotland. Traditionally, Gaelic is associated with the north-west Highlands and Islands of Scotland, and this is where the most densely concentrated populations of Gaelic speakers live. In particular, Gaelic is associated with the chain of islands off the north-west coast of Scotland known as the Outer Hebrides or Western Isles, where around 60% of the population reported the ability to speak Gaelic (Scottish Government, 2015). A map showing the concentration of Gaelic speakers in Scotland is in Fig. 1. The speakers in this study are from the Isle of Lewis, the most northerly island in the Outer Hebrides chain. The Goidelic languages are descended from Old Irish, which expanded from Ireland to Scotland and the Isle of Man in early medieval times (McLeod, 2020). It is generally thought that Gaelic in Scotland had sufficiently diverged from Irish to be considered a separate language in approximately 1100 CE (Ó Maolalaigh, 2008).
The Goidelic languages all have systems of contrastive secondary palatalisation across the entire consonant system (with a few exceptions in some consonants) (Broderick, 2009; Hickey, 2014; Bennett et al., 2018; Nance & Ó Maolalaigh, 2021). In Nance and Kirkham (2022), we provide a historical overview of the development of palatalisation in rhotics and comparison to different Goidelic dialects. In this paper, we focus on the contrasts across the whole sonorant system. To summarise: the most extensive Goidelic palatalisation contrasts were found in Old Irish, where the system developed by approximately 900 CE (Greene, 1973; Hickey, 1995). At this time, Old Irish sonorants contrasted in place of articulation as well as palatalisation, resulting in four different phonemes for laterals, nasals and rhotics (Thurneysen, 1946; Russell, 1995; Hickey, 1995). It is thought that a three-way contrast between palatalised, plain and velarised sonorants developed in Middle Irish (900-1200 CE) (Hickey, 1995). The Irish system has evolved since early medieval times in different ways in the modern Goidelic dialects. The most innovative dialect in this respect is Manx, where palatalisation contrasts were lost in rhotics, and reduced in laterals and nasals. At the other end of the scale are Hebridean dialects of Gaelic, including the dialect under investigation here, Lewis Gaelic. In Lewis and other Hebridean dialects, three lateral, three nasal and three rhotic phonemes are maintained.
In comparison to many of the previous studies of palatalisation, Lewis Gaelic is interesting in several respects. The majority of work carried out previously on palatalisation has examined contexts where palatalised consonants are contrasted with non-palatalised consonants, such as Russian. In Gaelic sonorants there is instead a three-way distinction between palatalised, plain and velarised. The rhotic inventory, however, has been particularly prone to reduction across Goidelic dialects, with laterals appearing most robust to sound change. This is in line with the findings discussed above for Slavic, which show that rhotics are more susceptible to change than laterals (Carlton, 1990; Iskarous & Kavitskaya, 2018).

Summary and predictions
In the current study, we investigate the extent to which palatalisation contrasts are maintained, combining dynamic phonetic evidence from acoustics and articulation in order to examine whether phonemic distinctiveness varies between laterals, nasals and rhotics. We specifically build upon previous work in the following ways. First, previous work on the asymmetry of sonorant palatalisation contrasts has focused on Russian as the language with the most extensive system of sonorant palatalisation in the Slavic family (Kochetov, 2005; Stoll, 2017; Iskarous & Kavitskaya, 2018). Here, we consider Lewis Gaelic, as the Goidelic dialect with the most extensive system of sonorant palatalisation in a completely different language family. Second, previous work in this area has considered articulation (Kochetov, 2005; Stoll, 2017) or acoustics (Iskarous & Kavitskaya, 2018) respectively, but we combine both perspectives and use a method that allows us to subject each modality to a comparable classification task. Third, much previous work has focused on static timepoints, either sonorant midpoints or specific locations of formant transitions. We take a broader approach by compressing all time-varying information that is available in the signal and using this to assess classification accuracy. This allows us to more comprehensively investigate the hypothesis that diachronically unstable contrasts are more vulnerable to synchronic neutralisation at a specific snapshot in time. Accordingly, we set out the following questions for the present study:
1. Which sonorant categories (laterals, nasals, rhotics) show the most robust phonemic contrasts?
2. Is contrast more robust in acoustic or articulatory data?
3. How do acoustic and articulatory dynamics contribute to phonological contrast?
4. What do these results tell us about the variable synchronic stability of categories and the potential diachrony of palatalisation contrasts?
We test the prediction that laterals will be best classified, followed by nasals and then rhotics, and anticipate that reduction will be more evident word-finally. In previous work on Gaelic, Nance and Kirkham (2020) show that laterals are more robust than nasals in formants at the sonorant steady-state, while Nance and Kirkham (2022) show that three initial rhotics are well-maintained in Gaelic, despite potential neutralisation of rhotics in word-final position. However, these studies used different methods and different features to establish contrast, as well as focusing on a small set of selective timepoints, so our present study uses a more holistic and comparable method for establishing the relative robustness of three-way contrasts across laterals, nasals and rhotics.

Speakers
We recorded data from twelve L1 speakers of Lewis Gaelic, all of whom were raised in Gaelic-speaking families on the Isle of Lewis (six female, six male). They acquired English either as simultaneous bilinguals or upon entering the school system. The speakers were aged 21-80 and either used Gaelic as part of their job, or had used Gaelic before retirement. All speakers reported using more Gaelic than English in their daily lives and can be considered Gaelic-dominant bilinguals. Due to the fragility of Gaelic language transmission, even in locations such as Lewis (Munro, Taylor, & Armstrong, 2011), it is difficult to obtain a large sample of data from Gaelic-dominant bilingual speakers. We recognise that the data here represent a large age range, but the speakers are socially consistent in using more Gaelic than English.

Data recording and stimuli
Simultaneous acoustic and ultrasound tongue imaging data were recorded in a community centre or at the speaker's workplace. The acoustic signal was recorded using a Beyerdynamic Opus 55 headset microphone, which was preamplified and digitized using a Sound Devices USBPre2 audio interface at 44.1 kHz with 16-bit quantization. Simultaneous ultrasound data were recorded using a Telemed MicrUs system, with a 64 element probe of 20 mm radius. We used a 2 MHz probe frequency, 80 mm depth, 90% field of view and 57 scan lines, which resulted in a frame rate of ~92 Hz. The probe was stabilised using an Articulate Instruments metal headset (Articulate Instruments, 2008). The occlusal plane for each speaker was imaged by them biting on a bite plate placed behind the upper incisors and pushing their tongue up against it. Synchronization between audio and ultrasound data was achieved using the frame-level TTL pulse emitted by the ultrasound scanner. Data presentation and recording was handled using the Articulate Assistant Advanced software (Articulate Instruments, 2018).
The stimuli used for this study are shown in the Appendix (Tables 8-10). We aimed to capture laterals, nasals and rhotics in word-initial and word-final position in three vowel contexts where possible: /i a u/. This was not always possible due to the historical development of palatalisation in high front vowels. For example, there are no velarised nasals in the context of /i/ in readily-known words. The plain sonorants developed from contexts of historical lenition, and in word-initial position they still occur in contemporary lenition contexts. For an overview of changes in lenition (contemporary morphophonological changes in Celtic language word-initial consonants known as 'mutation'), see Ball and Müller (2009) or Nance and Ó Maolalaigh (2021) for Gaelic specifically. For this reason we included the word-initial plain sonorants in short phrases that would trigger mutation, e.g. mo nathair 'my snake', where the possessive mo 'my' triggers mutation.

Data preparation
Acoustic landmarks were labelled manually in Praat using information from the waveform and spectrogram (Boersma & Weenink, 2020). We labelled the entire sonorant-vowel interval for all tokens, such as lateral-vowel for word-initial tokens and vowel-lateral for word-final tokens. This interval was used for all analyses reported in this paper. We carried out post hoc screening of the ultrasound data and found that only seven of the twelve speakers had consistently good images (three female, four male). As our analysis below is premised upon comparing acoustic and articulatory data, we only use these seven speakers for the analysis, resulting in 1165 tokens with parallel acoustic and ultrasound data.

Acoustic features
The acoustic features used in this analysis are Mel Frequency Cepstral Coefficients (MFCCs), which are highly effective at reducing the dimensionality of the spectrum while retaining linguistically-relevant features (Davis & Mermelstein, 1980). MFCCs are directly related to characteristics of the spectrum and, therefore, do have a physical interpretation, despite their complexity in the higher coefficients. For example, lower MFCCs describe global aspects of spectral shape, while increasingly higher coefficients describe increasingly finer details in the spectrum. MFCCs have previously been shown to capture phonemic palatalisation contrasts with a high degree of accuracy (Spinu et al., 2012; Spinu & Lilley, 2016; Spinu et al., 2018).
We use 6 MFCCs to summarise the acoustic spectrum, which has previously been shown to be sufficient for capturing palatalisation contrasts (Spinu et al., 2018). We ran sensitivity tests with between 4 and 13 MFCCs and found that 6 MFCCs resulted in the strongest overall classification accuracy, although some specific models showed a small (2-4%) improvement using 8 coefficients, after which no further improvement was evident. Accordingly, for each token, 6-element MFCC vectors were calculated across each sound file using a 25 ms window and 10 ms frame shift, with a preemphasis coefficient of 0.97 and a lifter exponent of 0.6.
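To make the framing parameters above concrete, the following minimal sketch converts the 25 ms window and 10 ms shift into samples at the 44.1 kHz recording rate and applies a first-order preemphasis filter with a coefficient of 0.97. The function names are hypothetical, and the filter form y[n] = x[n] - a·x[n-1] is the conventional one, assumed here rather than taken from the analysis code for this paper.

```python
def preemphasis(signal, a=0.97):
    """First-order pre-emphasis: y[n] = x[n] - a * x[n-1] (first sample unchanged)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - a * signal[n - 1] for n in range(1, len(signal))]


def frame_starts(n_samples, sample_rate=44100, window_ms=25, shift_ms=10):
    """Start indices of every full analysis window that fits in the signal."""
    window = sample_rate * window_ms // 1000  # 1102 samples at 44.1 kHz
    shift = sample_rate * shift_ms // 1000    # 441 samples at 44.1 kHz
    return list(range(0, n_samples - window + 1, shift))


# One second of audio yields 98 full 25 ms frames at a 10 ms shift.
starts = frame_starts(44100)
print(len(starts), starts[:3])
```

Integer millisecond arguments are used so that the window and shift lengths are computed without floating-point rounding surprises.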
MFCCs were subsequently extracted at 11 equally spaced points across the labelled sonorant-vowel interval and each MFCC was by-speaker normalized using z-scoring. At this stage, each token is represented by 6 MFCC trajectories, each of which is sampled over 11 points.
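The sampling and normalisation steps can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the interval boundaries are given in seconds, and the z-score uses the sample standard deviation, which the text does not specify.

```python
from statistics import mean, stdev


def sample_points(start, end, n_points=11):
    """Return n_points equally spaced times from start to end (inclusive)."""
    step = (end - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]


def zscore_by_speaker(values, speakers):
    """z-score each value against the mean and SD of its own speaker.

    Assumption: sample SD (n - 1 denominator); each speaker needs >= 2 values.
    """
    grouped = {}
    for value, speaker in zip(values, speakers):
        grouped.setdefault(speaker, []).append(value)
    stats = {s: (mean(vs), stdev(vs)) for s, vs in grouped.items()}
    return [(v - stats[s][0]) / stats[s][1] for v, s in zip(values, speakers)]


# Eleven sample times across a hypothetical interval from 0.10 s to 0.35 s
print(sample_points(0.10, 0.35))

# One MFCC measured for two hypothetical speakers with different ranges
print(zscore_by_speaker([1, 2, 3, 10, 20, 30], ["a", "a", "a", "b", "b", "b"]))
```

In practice the same normalisation would be applied separately to each of the 6 coefficients.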

Articulatory features
Splines were automatically fitted to the midsagittal ultrasound data using AAA's batch fit function. A paid research assistant manually checked and corrected any obvious errors in the splines, but we did not correct minor tracking errors.
All splines were then rotated and scaled to the occlusal plane. These data comprise 42 values in 2-dimensional x/y space. In order to reduce the dimensionality of the tongue splines, we fitted a Discrete Cosine Transform (DCT) to each token at 11 proportionally-spaced timepoints across the sonorant-vowel or vowel-sonorant interval. The DCT has been used for summarising whole acoustic spectra (Harrington, 2010; Nossair & Zahorian, 1991), formant trajectories (Watson & Harrington, 1999) and articulatory time series (Shaw & Kawahara, 2018) and is conceptually extendable to spatial representations, such as the ultrasound tongue spline. To this end, the ultrasound-DCT is conceptually comparable with MFCCs, as both sets of features fundamentally represent the amplitudes of cosine waves fitted to the respective signals after undergoing transformation. The DCT coefficients have a physical interpretation, with the lower coefficients being proportional to the mean (C_0), slope (C_1) and curvature (C_2) of the tongue shape, with higher coefficients representing increasingly finer detail in the shape. We fit a DCT of the form described in Harrington (2010) with m coefficients to a signal x(n) with length N, where the mth coefficient C_m is calculated using (1).
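As an illustration of the transform, the sketch below implements one common scaling of the DCT, which we assume matches the form referred to above: C_m = (2 k_m / N) Σ x(n) cos((2n + 1) m π / (2N)), summed over n = 0, ..., N - 1, with k_m = 1/√2 for m = 0 and k_m = 1 otherwise. Both the scaling and the function name are assumptions made for this example, not a reproduction of the analysis code.

```python
import math


def dct(signal, n_coefs):
    """Return C_0 .. C_{n_coefs - 1} for a 1-D signal.

    Assumed form: C_m = (2 * k_m / N) * sum_n x(n) * cos((2n + 1) * m * pi / (2N)),
    with k_0 = 1 / sqrt(2) and k_m = 1 for m > 0.
    """
    N = len(signal)
    coefs = []
    for m in range(n_coefs):
        k = 1 / math.sqrt(2) if m == 0 else 1.0
        total = sum(x * math.cos((2 * n + 1) * m * math.pi / (2 * N))
                    for n, x in enumerate(signal))
        coefs.append(2 * k / N * total)
    return coefs


# A flat signal has only a C_0 term, equal to sqrt(2) times its mean.
print(dct([3.0, 3.0, 3.0, 3.0], 3))

# A rising line additionally loads on C_1 (with a negative sign under this form).
print(dct([0.0, 1.0, 2.0, 3.0], 3))
```

Under this scaling C_0 is proportional to the signal mean, while C_1 and C_2 track its overall slope and curvature, which is the interpretation given in the text.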
We illustrate DCT compression of ultrasound tongue shapes in Fig. 2, which shows smoothing using different numbers of DCT coefficients (between 2 and 10 coefficients) on a single token. We obtained the smoothed tongue shapes using an inverse DCT, which reconstructs the input signal by summing half-cycle cosine waves with the amplitudes of the corresponding DCT coefficients. The figure shows us that two coefficients {C_0, C_1} approximate the slope of the tongue, while using between three {C_0, C_1, C_2} and five {C_0, C_1, ..., C_4} produces similar tongue shapes. At 6 DCT coefficients {C_0, C_1, ..., C_5} the slight dip between the tongue tip and dorsum starts to appear, which is present in the original signal. After this, we see an increasing level of detail, but not necessarily any strikingly new information in the signal.
In order to empirically evaluate the number of DCT coefficients needed to summarise each tongue shape, we fitted DCTs to all tongue splines (11 per token, representing 11 time-points) with different numbers of coefficients, ranging from 2 coefficients to 10 coefficients, which gives us 9 different options to evaluate. We then conducted an inverse DCT in order to reconstruct the original signal from these coefficients, which essentially gives us a DCT-smoothed version of the original signal. Following Shaw and Kawahara (2018), we then calculate Pearson's correlation between the original signal and the DCT-reconstructed signal and plot these correlation values for different numbers of DCT coefficients. Fig. 3 shows that 3 coefficients yield correlations with the original signal of r > .95 for all speakers. As shown above, however, there are some advantages to the higher DCT coefficients, particularly for more complex tongue tip shapes. To this end, we ran testing using the same classification analysis that we report later in this paper, examining the effects of between 4 and 8 DCT coefficients on classification accuracy for each combination of sonorant and position. Laterals and nasals did not benefit from more than 5 coefficients, but the inclusion of a 6th DCT coefficient improved word-initial rhotic classification by almost 10%. We anticipate that this is because it captures the subtle tongue tip shaping depicted in Fig. 2. After settling on 6 DCT coefficients, we normalized each coefficient by z-scoring each speaker's data across all productions.
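The evaluation procedure described above (truncate, reconstruct, correlate) can be sketched as follows. The DCT scaling is the same assumed form as in the text (k_0 = 1/√2, factor 2/N), the function names are hypothetical, and the toy "tongue shape" is invented purely for illustration.

```python
import math


def dct(signal, n_coefs):
    """Forward DCT under the assumed scaling (2 * k_m / N)."""
    N = len(signal)
    coefs = []
    for m in range(n_coefs):
        k = 1 / math.sqrt(2) if m == 0 else 1.0
        total = sum(x * math.cos((2 * n + 1) * m * math.pi / (2 * N))
                    for n, x in enumerate(signal))
        coefs.append(2 * k / N * total)
    return coefs


def inverse_dct(coefs, length):
    """Rebuild a signal of the given length by summing weighted cosine waves."""
    signal = []
    for n in range(length):
        value = 0.0
        for m, c in enumerate(coefs):
            k = 1 / math.sqrt(2) if m == 0 else 1.0
            value += k * c * math.cos((2 * n + 1) * m * math.pi / (2 * length))
        signal.append(value)
    return signal


def pearson(a, b):
    """Pearson's correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def reconstruction_correlations(signal, coef_counts=range(2, 11)):
    """Correlation between a signal and its DCT-smoothed version, per coefficient count."""
    return {count: pearson(signal, inverse_dct(dct(signal, count), len(signal)))
            for count in coef_counts}


# Toy spline: 42 heights along an asymmetric arch (not real ultrasound data)
toy_spline = [n / 41 + 0.3 * math.sin(math.pi * n / 41) for n in range(42)]
for count, r in reconstruction_correlations(toy_spline).items():
    print(count, round(r, 4))
```

Because the cosine basis is orthogonal, adding coefficients can only keep or raise the correlation, so the useful question is where the curve flattens out rather than where it peaks.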

Summarising high-dimensional dynamic information
At this point, the acoustic data are represented by 6 MFCC trajectories sampled at 11 points in time (= 66 points), and the ultrasound spline data are represented by 6 DCT trajectories sampled at 11 points in time (= 66 points). This already represents considerable dimensionality reduction from a time-varying power spectrum or time-varying ultrasound spline, but we conducted further dimensionality reduction of the dynamic data using an approach inspired by Nossair and Zahorian (1991). This involves fitting a Discrete Cosine Transform (DCT) to each of the time-varying MFCC (acoustics) and DCT (ultrasound) coefficients discussed above, which allows us to summarise the shape of each of those coefficient trajectories over time (see Marin, Pouplier, & Harrington, 2010 for a similar approach to spectral data). This provides a higher-level set of coefficients that encode the shape of each time-varying MFCC or DCT coefficient, each of which summarises some dynamic aspect of spectral shape or tongue shape.
We empirically evaluated the number of DCT coefficients needed to summarise each trajectory in the same way as for the ultrasound spline fitting, which is plotted in Fig. 4. We find that 3 DCT coefficients return correlations of r > .9 for all acoustic-MFCC trajectories and r > .95 for ultrasound-DCT trajectories, except for the 6th coefficient in both sets (MFCC6 and DCT5), which are slightly below these values. However, the MFCC/DCT trajectories are not always smooth functions of time and we avoid seeking higher correlations as we wish to avoid overfitting to the signal. Accordingly, we choose 3 DCT coefficients to represent both sets of trajectories, which captures the mean, slope and curvature of each coefficient trajectory over time. This means that each of the 6 acoustic-MFCC and 6 ultrasound-DCT dynamic trajectories is summarised by 3 DCT coefficients. As a result, each token's time-varying acoustic spectrum or ultrasound tongue spline across the sonorant-vowel interval is represented by 18 (6 × 3) values.
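The second reduction stage can be sketched as below: each of a token's 6 trajectories (11 time points each) is summarised by 3 DCT coefficients, and the results are concatenated into an 18-value feature vector. The DCT scaling is the assumed form with k_0 = 1/√2 and factor 2/N, the function names are hypothetical, and the toy token is invented for illustration.

```python
import math


def dct(signal, n_coefs):
    """Forward DCT under the assumed scaling (2 * k_m / N)."""
    N = len(signal)
    coefs = []
    for m in range(n_coefs):
        k = 1 / math.sqrt(2) if m == 0 else 1.0
        total = sum(x * math.cos((2 * n + 1) * m * math.pi / (2 * N))
                    for n, x in enumerate(signal))
        coefs.append(2 * k / N * total)
    return coefs


def summarise_token(trajectories, n_coefs=3):
    """Concatenate the first n_coefs DCT coefficients of every trajectory."""
    features = []
    for trajectory in trajectories:
        features.extend(dct(trajectory, n_coefs))
    return features


# Toy token: 6 flat trajectories of 11 points, with constant values 0..5
toy_token = [[float(level)] * 11 for level in range(6)]
features = summarise_token(toy_token)
print(len(features))   # 6 trajectories x 3 coefficients
print(features[3:6])   # second trajectory: mean term only, no slope or curvature
```

The same function applies unchanged to the acoustic (MFCC) and articulatory (ultrasound-DCT) trajectories, which is what makes the two modalities directly comparable as classifier inputs.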
In summary, our final inputs to our model are as follows. We have compressed a complex power spectrum sampled at 11 points in time for each token to 18 values. These values are a compressed representation of how the spectrum changes over the sonorant-vowel interval. We have also compressed time-varying ultrasound tongue splines sampled at 11 points in time for each token to 18 values, which represents how midsagittal tongue shape changes over the sonorant-vowel interval. These compressed representations correlate well with the original signals and should, therefore, capture important information in the original signals. We now turn to the details of the classification analysis.

Classification analysis
We use support vector machines (SVMs) in order to establish how robustly the three-way phonemic contrast can be classified for each sonorant, based on an initial training phase mapping phonological categories to acoustic and articulatory feature sets. SVMs are a class of supervised statistical learning models that aim to find the hyperplane that maximally separates two classes in N-dimensional space (Boser, Guyon, & Vapnik, 1992; James, Witten, Hastie, & Tibshirani, 2013). The hyperplane is located at the maximum margin, which is the largest distance between the hyperplane and the nearest data points of the two classes. Non-linear separation between classes is typically achieved via a kernel, whereby the data are transformed into a higher-dimensional space and linear classification is then performed in this high-dimensional space. SVMs are a binary classification method but multi-class classification can be achieved in various ways. The method we use is the one-against-one technique, in which each category is compared against one other category. This process is repeated for all combinations of categories, with each classifier voting for one category and the category with the highest number of votes being classified accordingly. SVMs have been widely applied to speech data (Clarkson & Moreno, 1999; Wang, Green, Samal, & Yunusova, 2013; Yu, 2017) and are typically reported to show good phoneme classification accuracy on acoustic and articulatory signals. One reason for this is that SVMs are concerned with the margins between classes, rather than the mean and variance of each class, meaning that a larger data set is better only insofar as the additional data better represents the boundaries between classes.
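The one-against-one voting scheme described above can be sketched independently of any particular SVM library. In this illustration the pairwise classifiers are simple stand-ins (each picks the nearer of two invented class centres on a single dimension), and all names are hypothetical; only the voting logic is the point.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, pairwise):
    """Classify x by majority vote over all pairwise (binary) classifiers.

    `pairwise` maps each (class_a, class_b) pair to a function that takes x
    and returns whichever of the two classes it prefers.
    """
    votes = Counter()
    for pair in combinations(classes, 2):
        votes[pairwise[pair](x)] += 1
    # Ties go to the class that was voted for first (a choice made for this sketch only).
    return votes.most_common(1)[0][0]


# Toy one-dimensional example with invented class centres
CENTRES = {"palatalised": -1.0, "plain": 0.0, "velarised": 1.0}
CLASSES = tuple(CENTRES)


def nearer(a, b):
    """Stand-in binary classifier: choose the class whose centre is closer to x."""
    return lambda x: a if abs(x - CENTRES[a]) <= abs(x - CENTRES[b]) else b


PAIRWISE = {(a, b): nearer(a, b) for a, b in combinations(CLASSES, 2)}
print(one_vs_one_predict(0.9, CLASSES, PAIRWISE))  # velarised
```

With three phoneme categories there are three pairwise classifiers, so a category needs two votes to win outright.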
Models were fitted using the e1071 package in R (Meyer, Dimitriadou, Hornik, Weingessel, & Leisch, 2021). We fitted separate models to each combination of sonorant type and position, such as word-initial laterals, word-final laterals, word-initial nasals, etc. Each model had phoneme as the outcome variable and the 18 dynamic acoustic features or the 18 dynamic ultrasound features as the predictor variables. Each feature set was randomly split into 80% training and 20% testing subsets. All models were fitted using a radial basis function kernel, and parameter tuning for each model was conducted on the training data only using a grid search over a range of values for γ = {10^-6, 10^-5, ..., 10^-1} and C = {0.1, 1, 10}, with model performance evaluated using 10-fold cross-validation. The model with the optimal parameters was used to predict the phonemic identity of the 20% test data set based only on the input measurements (with separate models for acoustic and ultrasound data). In order to mitigate against splitting a small data set, we used Monte Carlo cross-validation (Picard & Cook, 1984; Kuhn & Johnson, 2013), which involved running 100 iterations of the train-test procedure for each model, using a different random train-test split each time. We then averaged over the 100 iterations to produce a final classification matrix and overall classification rates. 3 All code and data used for analyses in this paper are available at: https://osf.io/dfe7g/. 4
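The resampling scheme above can be sketched as follows. This is not the authors' R code: to keep the example dependency-free, a nearest-centroid classifier stands in for the tuned SVM, and in a full implementation the grid search with 10-fold cross-validation would run on the training portion inside `fit_predict`. All function names and the toy data are hypothetical.

```python
import random
from itertools import product

GAMMAS = [10.0 ** e for e in range(-6, 0)]   # 1e-6 ... 1e-1
COSTS = [0.1, 1.0, 10.0]
PARAM_GRID = list(product(GAMMAS, COSTS))    # 18 candidate (gamma, C) pairs


def train_test_split(n_items, train_fraction=0.8, rng=random):
    """Shuffle item indices and split them into training and test sets."""
    indices = list(range(n_items))
    rng.shuffle(indices)
    cut = round(train_fraction * n_items)
    return indices[:cut], indices[cut:]


def nearest_centroid_fit_predict(train_x, train_y, test_x):
    """Stand-in classifier (NOT an SVM): assign each test item to the closest class mean."""
    groups = {}
    for x, y in zip(train_x, train_y):
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}

    def closest(x):
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return [closest(x) for x in test_x]


def monte_carlo_cv(features, labels, fit_predict, n_iterations=100, seed=1):
    """Average test accuracy over repeated random 80/20 train-test splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_iterations):
        train, test = train_test_split(len(labels), rng=rng)
        predicted = fit_predict([features[i] for i in train],
                                [labels[i] for i in train],
                                [features[i] for i in test])
        correct = sum(p == labels[i] for p, i in zip(predicted, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Toy data: two well-separated classes, 20 items each
toy_x = [[0.0, i * 0.01] for i in range(20)] + [[10.0, i * 0.01] for i in range(20)]
toy_y = ["a"] * 20 + ["b"] * 20
print(monte_carlo_cv(toy_x, toy_y, nearest_centroid_fit_predict, n_iterations=10))
```

Averaging over many random splits reduces the dependence of the reported accuracy on any single partition, which matters when each sonorant-by-position subset is small.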
3 In order to empirically determine the chance classification rate for a data set comparable in size and dimensionality to the models used here, we generated simulated data with 18 numerical variables corresponding to the 18 MFCC/DCT coefficients, each of which was populated with random values from a normal distribution N(0, 1) and then each observation was randomly assigned one of three phoneme labels (plain, palatalised, velarised). We then ran the same procedure described above for the real models and found an average overall classification rate of 31.5-36.94% on random data, depending on the sample size, which is close to the theoretical chance level of 33.3% for three-way classification.
4 Sensitivity testing and initial modelling were carried out using Lancaster University's High End Computing facility, after which final models were fitted locally for the publicly available documentation.
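The chance-level simulation described in footnote 3 can be sketched along the following lines. As a loudly labelled assumption, a nearest-centroid classifier again stands in for the SVM so that the example needs only the standard library; the sample size, iteration count and function names are likewise hypothetical.

```python
import random


def simulate_random_tokens(n_tokens, n_features=18, seed=1):
    """Random N(0, 1) features with randomly assigned three-way phoneme labels."""
    rng = random.Random(seed)
    categories = ["plain", "palatalised", "velarised"]
    features = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_tokens)]
    labels = [rng.choice(categories) for _ in range(n_tokens)]
    return features, labels


def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier (NOT an SVM): choose the class with the closest mean."""
    groups = {}
    for x, y in zip(train_x, train_y):
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}
    return [min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
            for x in test_x]


def chance_accuracy(n_tokens=600, n_iterations=20, seed=1):
    """Mean test accuracy on random data over repeated 80/20 splits."""
    features, labels = simulate_random_tokens(n_tokens, seed=seed)
    rng = random.Random(seed + 1)
    indices = list(range(n_tokens))
    cut = round(0.8 * n_tokens)
    scores = []
    for _ in range(n_iterations):
        rng.shuffle(indices)
        train, test = indices[:cut], indices[cut:]
        predicted = nearest_centroid_predict([features[i] for i in train],
                                             [labels[i] for i in train],
                                             [features[i] for i in test])
        scores.append(sum(p == labels[i] for p, i in zip(predicted, test)) / len(test))
    return sum(scores) / len(scores)


print(round(chance_accuracy(), 3))  # expected to fall near 1/3
```

An empirical baseline of this kind is useful because, with small samples, accuracy on pure noise can drift a few points either side of the theoretical one-in-three rate.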

Laterals
The lateral acoustic model in Table 1 shows overall classification rates of 74.46% (initial) and 81.27% (final), which represents well above chance classification. The classification matrix for initial laterals shows that /l̪ʲ/ is the most accurately classified at 78.59%, while /l/ is the worst at 63.54%. Note that the majority of inaccurate classifications for /l̪ʲ/ in initial and final context are as /l̪ˠ/, suggesting some overlap in the correlates of velarised and palatalised lateral phonemes. Classification for word-final laterals is better than initial laterals, and word-final /l̪ˠ/ is the most accurately classified phoneme at 91.77%. Overall, this suggests that initial and final laterals have broadly similar classification rates, with the palatalised and velarised phoneme being most distinct initially and the velarised phoneme being most distinct finally.
The lateral ultrasound model in Table 2 shows overall classification rates of 73.37% (initial) and 83.04% (final), but these statistics particularly obscure considerable between-phoneme differences in classification, suggesting slightly more robust lateral contrasts in the ultrasound data. In word-initial context, /l̪ˠ/ shows rather poor classification of 59.03%, with 31.79% of productions being misclassified as /l̪ʲ/. Outside of this phoneme, the other phonemes are classified better than the acoustic MFCC data. This is also true for word-final laterals, except for /l̪ˠ/ being slightly better classified in the acoustic data (91.77% vs 89.77%).
In summary, the laterals data show variability in classification, but with slightly better classification in word-final context and substantially above-chance classification in all cases. The models show that /l̪ˠ/ and /l̪ʲ/ are most often misclassified as each other and only very rarely as /l/. This suggests that while velarised and palatalised laterals do have some distinctive acoustic and articulatory correlates, there is a reasonable amount of overlap in these categories, which leads to occasional misclassification. The acoustic and articulatory data show relatively similar findings, except for substantially poorer classification for initial /l̪ˠ/ in the ultrasound data.
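To make the classification matrices easier to read, the sketch below shows one way such a matrix could be computed from true and predicted labels. It assumes that each row corresponds to the true phoneme and is normalised to percentages, which matches statements such as "31.79% of productions being misclassified"; the function name and the toy labels are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def classification_matrix(true_labels, predicted_labels, labels):
    """Row-normalised confusion matrix: % of each true class per prediction."""
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = {}
    for true in labels:
        row_total = sum(counts[(true, pred)] for pred in labels)
        matrix[true] = {
            pred: round(100.0 * counts[(true, pred)] / row_total, 2)
            if row_total else 0.0
            for pred in labels
        }
    return matrix


# Toy example (invented labels, for illustration only).
labels = ["plain", "palatalised", "velarised"]
true = ["palatalised"] * 4 + ["velarised"] * 4 + ["plain"] * 2
pred = (["palatalised"] * 3 + ["velarised"]      # one palatalised -> velarised
        + ["palatalised"] + ["velarised"] * 3    # one velarised -> palatalised
        + ["plain"] * 2)                         # plain never confused

matrix = classification_matrix(true, pred, labels)
for row_label in labels:
    print(row_label, matrix[row_label])
```

In this toy case the palatalised and velarised categories are confused only with each other and never with the plain category, which mirrors the qualitative pattern described for the laterals.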

Nasals
The nasal acoustic model in Table 3 shows overall classification of 86.67% (initial) and 85.53% (final), which is higher than for laterals. Our previous work has reported less robust distinctions between nasal phonemes in Gaelic (Nance & Kirkham, 2020), but that analysis did not take formant transitions or acoustic dynamics into account. Indeed, our present analysis suggests that such dynamics are crucial to this contrast, and fitting comparable SVMs to a single time-point at the nasal steady-state reduces classification accuracy substantially (see Section 3.4).
We find that classification is relatively similar between positions. For example, /n/ is the worst classified phoneme in initial (81.22%) and final (82.32%) position, although both remain well classified. The velarised and palatalised phonemes are classified very similarly across both positions, suggesting a relatively high degree of distinctiveness between the acoustic correlates of all three phonemes.
The nasal ultrasound model in Table 4 is very similar to the acoustics model, with overall classification of 84.70% (initial) and 89.81% (final). /n/ is classified better in final position (94.68%) than in initial position (84.10%), but classification remains high in all cases.
Overall, nasals show better classification than laterals in acoustics and articulation. Word-final phonemes are slightly better classified than word-initial phonemes in articulation, but this is only a small difference. This stands in contrast to our previous research, where we found weak distinctions between nasal phonemes. We propose that our current model classifies nasals very effectively due to the incorporation of dynamic information across the nasal and adjacent vowel, suggesting that cues to the three-way contrast in nasals are highly dynamic. We pursue this idea further in Section 3.4.

Rhotics
The rhotic acoustics model in Table 5 shows overall classification of 91.14% (initial) and 73.19% (final). This means that rhotics show the best average classification accuracy in initial position but the worst in final position across all sonorant types in acoustics. We find very robust maintenance of initial rhotic contrasts, with /rˠ/ at 92.99%, /r/ at 90.16% and /rʲ/ at 89.20%. In particular, /r/ is hardly ever misclassified as /rʲ/ (0.08%), which is impressive given that these results represent the average of 100 model runs, meaning that there was near-zero confusion between /r/ and /rʲ/. In contrast, word-final rhotics show the poorest classification of any models, with classifications of /rˠ/ = 75.14%, /r/ = 63.28% and /rʲ/ = 78.41%. These classification rates are still substantially above chance, but they suggest that the word-final categories have less robust phonetic correlates than word-initial categories, which leads to poorer classification accuracies.
The rhotic ultrasound model in Table 6 shows overall classification of 85.07% (initial) and 65.65% (final), showing the same patterning between initial and final context but with slightly poorer performance than in acoustics. Accordingly, every phoneme is classified slightly worse than in the acoustics model in both positions, except for word-final /r/, which is near identical between the two modalities. Interestingly, the robustness of word-initial classification is evidenced in the fact that /rʲ/ is never misclassified as /r/ and /r/ is never misclassified as /rʲ/, suggesting a categorical distinction between these phonemes in articulatory dynamics. This suggests that the palatalisation gesture in initial rhotics is highly distinct from the articulation of the plain rhotic. In contrast, there are varying degrees of confusion between palatalised and velarised rhotics, although these categories are still fairly well classified.
Overall, the most striking result for the rhotics is that while classification is the best of all models for initial rhotics, it is the lowest for final rhotics. The acoustic data for initial rhotics also outperform the ultrasound data in classification accuracy. This suggests that there exist clear correlates of the three-way contrast for initial rhotics, especially in acoustics, but much weaker phonetic correlates for the contrast in final rhotics.

Comparison between dynamic models and sonorant steady-state
Finally, we compare the models in the above sections with models fitted to the midpoint of the sonorant steady-state, which was defined in Nance and Kirkham (2020) as a labelled interval that captures relatively static formant values during an unambiguously lateral, nasal or rhotic phase. The steady-state model structure was the same as for the dynamic models, but as there is only one time-point, there are only 6 MFCCs for the acoustics and 6 DCTs summarising the ultrasound tongue shape, with no additional dynamic information. Table 7 shows the average classification accuracy for each model, with comparison between steady-state and dynamic models. To recap, these values represent the average classifications over 100 Monte Carlo cross-validation train-test iterations.
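Monte Carlo cross-validation, as referred to above, repeatedly draws a random train-test split, fits a model on the training portion, scores it on the held-out portion, and averages the scores. The sketch below illustrates that loop under several assumptions that are not taken from the article: the 80/20 split, the function names, the toy data, and a 1-nearest-neighbour classifier standing in for the SVM so that the example runs with the standard library alone.

```python
import random


def monte_carlo_cv(X, y, fit, predict, n_iter=100, test_frac=0.2, seed=0):
    """Average % accuracy over repeated random train-test splits."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, int(round(test_frac * len(X))))
    accuracies = []
    for _ in range(n_iter):
        rng.shuffle(indices)  # a fresh random split on every iteration
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(100.0 * hits / n_test)
    return sum(accuracies) / n_iter


# Stand-in classifier (1-nearest-neighbour), used only to keep the sketch
# dependency-free; the article itself uses a Support Vector Machine.
def fit_1nn(X_train, y_train):
    return list(zip(X_train, y_train))


def predict_1nn(model, row):
    def sqdist(pair):
        return sum((a - b) ** 2 for a, b in zip(row, pair[0]))
    return min(model, key=sqdist)[1]


# Toy, well-separated data: three hypothetical classes, ten tokens each.
LABELS = ["plain", "palatalised", "velarised"]
X = [[10.0 * k + 0.1 * i, 0.0] for k in range(3) for i in range(10)]
y = [LABELS[k] for k in range(3) for _ in range(10)]

mean_accuracy = monte_carlo_cv(X, y, fit_1nn, predict_1nn)
print(mean_accuracy)
```

Passing `fit` and `predict` as arguments keeps the resampling logic separate from the classifier, so the same loop could in principle wrap an SVM from any library.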
Table 7 shows that the dynamic models produce higher average classification accuracies in all cases, with the exception of the initial laterals acoustics model, where the dynamic model is 2.53% worse. However, the magnitude of the difference between steady-state and dynamic models is highly variable between sonorants. In acoustics, the impact of dynamics on classification is largest for nasals (24.81% higher in initial, 34.26% higher in final) and is higher than 10% for all models except initial laterals. In the ultrasound data, the differences are generally smaller, with negligible differences for laterals, final nasals and initial rhotics, but with substantial improvement for initial nasals (12.67%) and final rhotics (24.69%) when dynamic information is included.
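As a worked check on the arithmetic, the 'difference' in Table 7 is defined as dynamic accuracy minus steady-state accuracy, so a steady-state value can be recovered as dynamic minus difference. The sketch below does this for the cases quoted in the text. It assumes that the overall rates reported in the preceding sections are the dynamic-model values in Table 7; the implied steady-state figures are derived here for illustration and are not read from the table itself.

```python
# (dynamic accuracy in %, reported dynamic-minus-steady-state difference)
# Values are those quoted in the running text; the pairing of the two is an
# assumption made for this sketch.
reported = {
    ("acoustic", "lateral", "initial"): (74.46, -2.53),
    ("acoustic", "nasal", "initial"): (86.67, 24.81),
    ("acoustic", "nasal", "final"): (85.53, 34.26),
    ("ultrasound", "nasal", "initial"): (84.70, 12.67),
    ("ultrasound", "rhotic", "final"): (65.65, 24.69),
}


def implied_steady_state(dynamic, difference):
    """Invert difference = dynamic - steady_state."""
    return round(dynamic - difference, 2)


implied = {
    key: implied_steady_state(dyn, diff) for key, (dyn, diff) in reported.items()
}
for key, value in implied.items():
    print(key, value)
```

Under these assumptions, initial laterals would be the only case in which the implied steady-state accuracy exceeds the dynamic accuracy, consistent with the negative difference reported for that model.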
Overall, this comparative analysis suggests that the contrastive correlates of phonological palatalisation take on a particularly dynamic quality for all sonorants in acoustics, except for initial laterals, and also take on a dynamic quality for initial nasals and final rhotics in the articulatory data. There are fewer dynamic cues to contrast in the ultrasound data, compared with acoustics, with many sonorants not benefitting from the addition of dynamic articulatory information beyond a single theoretically-informed time-point at the sonorant steady-state.

Summary of results
We conducted classification analyses on the three-way contrast in laterals, rhotics and nasals in Scottish Gaelic, with separate models for word position and acoustic/articulatory data. We use classification accuracy as a proxy for the relative stability of each three-way contrast. In word-initial position, we find that rhotics are best classified, followed by nasals, and then laterals. This overall pattern is observed in both the acoustic and articulatory data, with the acoustic data always showing better overall classification rates. In word-final position, nasals are classified best, followed by laterals, and then rhotics. This overall pattern is observed in both the acoustic and articulatory data, with the articulatory data showing slightly better classification for final laterals and nasals, but not for rhotics. Finally, we show that incorporating dynamic information about the entire sonorant-vowel sequence improves classification accuracy by between 12.30% and 34.26% in the acoustic data, except for initial laterals, which are slightly worse when dynamics are included. However, the articulatory data show less overall improvement, with only initial nasals and final rhotics showing improvement of over 10% when dynamics are included. In the following section, we discuss the implications of these results for the role of dynamics in contrast maintenance and the stability of palatalisation contrasts.
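The rankings summarised above can be checked directly against the overall classification rates quoted in the results sections. The short sketch below collects those rates and sorts them; only the dictionary layout and the function name are introduced here for illustration.

```python
# Overall classification rates (%) as quoted in the Laterals, Nasals and
# Rhotics sections of the text.
overall = {
    "acoustic": {
        "initial": {"lateral": 74.46, "nasal": 86.67, "rhotic": 91.14},
        "final": {"lateral": 81.27, "nasal": 85.53, "rhotic": 73.19},
    },
    "ultrasound": {
        "initial": {"lateral": 73.37, "nasal": 84.70, "rhotic": 85.07},
        "final": {"lateral": 83.04, "nasal": 89.81, "rhotic": 65.65},
    },
}


def ranking(rates):
    """Sonorant classes ordered from best to worst classified."""
    return sorted(rates, key=rates.get, reverse=True)


for modality, by_position in overall.items():
    for position, rates in by_position.items():
        print(modality, position, ranking(rates))

# Modality comparison per position: True where acoustics outperform ultrasound.
acoustic_better = {
    position: {
        sonorant: overall["acoustic"][position][sonorant]
        > overall["ultrasound"][position][sonorant]
        for sonorant in overall["acoustic"][position]
    }
    for position in ("initial", "final")
}
print(acoustic_better)
```

Both modalities give the same ordering within each position, while the comparison between modalities flips for final laterals and final nasals, as described in the summary.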

Variable stability of synchronic contrasts
A consistent finding in this study is that nasals have higher classification accuracy than laterals. We did not predict this based on the previous Gaelic research, but there are good reasons to believe this result, the most obvious of which is the inclusion of dynamic information in our models. Formant transitions are well known to be a strong cue to place of articulation, particularly for nasals (Malécot, 1956; Wright, 2004), because nasal anti-formants weaken the upper formants in the spectrum. Indeed, Iskarous and Kavitskaya (2018) find nasals to be more distinctive than laterals in formant transitions. The inclusion of dynamic information for nasals is, therefore, a plausible reason why we find better acoustic contrast in nasals than laterals, in contrast to Nance and Kirkham (2020), where we only analysed formants at the sonorant steady-state. This is supported by our finding that laterals are classified better than nasals in our steady-state models, but that nasal classification drastically improves when we incorporate dynamic information across the sonorant-vowel interval. From this, we can conclude that the three-way nasal contrast in Gaelic is fundamentally dynamic in nature and likely more so than for laterals or rhotics, due to the relevant cues to contrast being more temporally distributed for nasals.
We predicted that rhotics would show the weakest classifications, based on previous research (Kochetov, 2005; Stoll, 2017; Iskarous & Kavitskaya, 2018). This is true word-finally, but certainly not word-initially, which is in line with our previous work on Gaelic. In Nance and Kirkham (2022) we report strong evidence of contrast in initial rhotics based on low-dimensional phonetic information, such as formant frequencies, so it is unsurprising that we also find good classification for rhotics when we take even more information into account. We do find, however, that final rhotics are classified comparatively worse than any other sonorant, which supports the tendency towards contrast neutralisation in final rhotics. It is well known that codas contain weaker acoustic cues for place of articulation than onsets (Ohala, 1990; Wright, 2004). Gaelic is unusual in having an overall VC structure, similar to Irish (Hammond et al., 2014; Ní Chiosáin, Welby, & Espesser, 2012), but, despite this, the proposal that acoustic cues are weaker in syllable-final position remains and is backed up by perceptual research. For example, Kochetov (2002) and Ní Chiosáin et al. (2012) both find that listeners are less likely to distinguish palatalised and non-palatalised pairs in VC contexts compared with CV contexts. This factor may explain the tendency for initial rhotics to show more robust distinctions than final rhotics, but this logic does not appear to extend to laterals or nasals, which show similar classification between positions and sometimes slightly better classification in final position.
We now briefly comment on how our model compares with human listeners; in other words, can Gaelic speakers accurately perceive phonemic identity from similar acoustic information to what we analyse here? Listeners can distinguish palatalised and non-palatalised consonants with high accuracy (Kochetov, 2002; Ní Chiosáin et al., 2012; Spinu et al., 2012), even when they do not speak a language with palatalisation contrasts. Babel and Johnson (2010) found that American English listeners performed no differently from Russian listeners at a fast-paced AX discrimination task comparing word-initial Russian palatalised and non-palatalised consonants, although Hacking, Smith, Nissen, and Allen (2016) show that English-speaking L2 learners have greater difficulty producing the Russian contrast word-finally. Our rhotics results are in line with the above research showing better perceptual discrimination between palatalised and non-palatalised consonants in CV contexts compared with VC contexts. In summary, we consider our machine classification to be comparable to the discrimination capabilities of a human listener.

The dynamic nature of palatalisation contrasts
A major finding of this study is the extent to which the incorporation of dynamic information improves acoustic classification. This was particularly true of nasals, but, surprisingly, we find little difference between the steady-state and dynamic models for initial laterals. It could be the case that the sonorant steady-state is where the primary cues for such contrasts exist in laterals. However, we also find other insensitivities to model adjustments in the initial laterals data. For example, during sensitivity testing we found that increasing or decreasing the number of coefficients had the least effect on initial laterals. It may be that the acoustic and articulatory data used here provide an adequate representation for this context, with reasonable accuracies of 73-75%, but that the highly audible contrast we perceive for initial laterals has other acoustic and articulatory correlates that are not well captured in this study.
Despite the strong contribution of dynamics to acoustic classification, we find this to a much lesser degree with the articulatory data. This may be a consequence of dynamic non-linearity in acoustic-articulatory relations (Stevens, 1989; Strycharczuk & Scobbie, 2017; Gorman & Kirkham, 2020), whereby articulatory variation in some parts of the vocal tract does not produce proportionate change in the acoustic output, at least in terms of the parameters measured here. Another explanation could be the nature of the acoustic and articulatory representations used in this study. For instance, MFCCs capture rich details of the acoustic spectrum, whereas the midsagittal tongue shape obtained by ultrasound imaging is already a very sparse representation of the three-dimensional oral tract. Furthermore, it is possible that the lesser contribution of dynamics to articulatory classification may be a consequence of our focus on global change in midsagittal tongue shape. It may be the case that other aspects of articulatory timing, such as the relative timing of coronal, palatalisation and velarisation gestures, represent stronger articulatory cues to contrast than overall change in tongue shape. We plan to explore this further in future research, with the aim of better understanding the articulatory dynamics of palatalisation contrasts.
Finally, we must highlight some caveats for interpreting the comparison between steady-state and dynamic models. First, the inputs to each model necessarily differ in dimensionality (6 for steady-state, 18 for dynamic). While this is an obvious consequence of incorporating time-varying information into the dynamic model, a larger number of parameters increases the possibility of overfitting and producing overly optimistic classification rates, so it would be valuable to further evaluate the effects of parameter space size on a much larger data set. We also cannot discount the possibility that the dynamic model is picking up on vowel cues that correspond to lexical items, rather than the phonetic correlates of deep phonological structure. In other words, by incorporating information from the sonorant and the adjacent vowel, we could be identifying mostly word-specific information. In part, this is unavoidable, as Gaelic has relatively few true minimal triplets for these contrasts, but it would be worthwhile testing on languages where such contrasts have a higher functional load, such as Russian. Finally, our analysis demonstrates the extent to which dynamic information contributes towards classification accuracy, but does not tell us the precise nature of this dynamic information. In future research, we plan to examine the temporal dynamics of the lingual gestures involved in Gaelic palatalisation contrasts.
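The difference in dimensionality noted above (6 versus 18 inputs) follows from summarising each time-varying coefficient with a small number of DCT terms. The sketch below illustrates the idea with an unnormalised DCT-II written in plain Python. It assumes three DCT terms for each of six trajectories, which is inferred from 18 = 6 × 3 rather than stated in this section, and the trajectories themselves are invented for illustration.

```python
import math


def dct2(signal, n_coef):
    """First n_coef terms of an unnormalised DCT-II of a 1-D trajectory."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, x in enumerate(signal))
        for k in range(n_coef)
    ]


def summarise(trajectories, n_coef=3):
    """Concatenate the leading DCT terms of every trajectory into one vector."""
    features = []
    for trajectory in trajectories:
        features.extend(dct2(trajectory, n_coef))
    return features


# Six hypothetical coefficient trajectories sampled at 11 time points each.
n_points = 11
trajectories = [
    [math.sin(0.3 * t + j) + 0.1 * j for t in range(n_points)]
    for j in range(6)
]

dynamic_features = summarise(trajectories)       # 6 trajectories x 3 terms
steady_state_features = [traj[n_points // 2] for traj in trajectories]
print(len(steady_state_features), len(dynamic_features))
```

In this unnormalised form the zeroth term is proportional to the trajectory mean, while higher terms capture progressively finer aspects of the trajectory's shape over time, which is the kind of information a single time-point cannot carry.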

The diachronic typology of palatalisation contrasts
We made the prediction that sonorants with a greater propensity towards diachronic phonological loss across a language family would show synchronically weaker contrasts. This was grounded in the principle that processes of diachronic change can be inferred from synchronic snapshots (Labov, 1994). In our case, the diachronic predictions suggested that laterals should have the highest classification rates and rhotics the lowest classification rates, given that lateral contrasts are best-maintained across the Goidelic language family and rhotics the least well-maintained. Our results only support the diachronic predictions when we focus solely on the sonorant steady-state, which is a partial and insufficient representation of palatalisation contrasts. When we take into account the dynamics of how the palatalisation gesture unfolds over time, we instead find a different set of results that interact strongly with word position. To recap, rhotics are best classified in initial position and worst in word-final position, with nasals being relatively well classified in all contexts, and laterals always being classified less accurately than nasals. The word-final rhotic synchronic data, however, do pattern with diachronic trends towards neutralisation across Goidelic. Cross-linguistically, it has been shown that large rhotic inventories are subject to simplification, with palatalised rhotics particularly susceptible to loss (Hall, 2000). We anticipate that competing biomechanical demands on palatalised rhotics can lead to partial masking of the palatalisation gesture, especially in word-final position. For instance, Stoll (2017) reports more variable gestural timing in palatalised rhotics compared with laterals, which may also lead to greater overlap between rhotic categories. Given sufficient exposure, this increased overlap is likely to cause instances of misperception and subsequent recategorisation of a listener's phonological system, leading them to produce smaller distinctions between rhotic phonemes (Ohala, 1981; Ohala, 1989). Moreover, if the reduced variants become recognised as acceptable by other community members, possibly due to the low functional load of the contrast, this is likely to accelerate the long-term progression of contrast neutralisation (Beckman et al., 1992; Bybee, 2015).
Nasals are especially interesting in this case as Goidelic diachronic data suggest they are retained more frequently than large rhotic systems, but less frequently than large lateral systems. In Slavic, on the other hand, palatalised nasals are very frequently maintained cross-linguistically, more so than laterals and rhotics (Carlton, 1990; Iskarous & Kavitskaya, 2010). Our data pattern more closely with the reported typology of Slavic sonorant development, with nasal phonemes produced more distinctively than laterals and final rhotics. This is surprising in light of previous research, some of which has suggested only a two-way contrast in Gaelic nasals (Ladefoged, Ladefoged, Turk, Hind, & Skilton, 1998; Nance & Kirkham, 2020), but it may be the case that the Gaelic contrast has been maintained by temporally distributing the phonetic cues to contrast across the sonorant-vowel interval, which has not previously been investigated as thoroughly. We are unable to claim whether this is a novel development in Gaelic, but previous research on Slavic has also shown that nasals may sometimes show more robust contrasts than laterals in formant transitions (Iskarous & Kavitskaya, 2018), so it is likely that a similar pattern recurs in our data.
In summary, we find a more complex relationship between diachronic predictions and the variable stability of synchronic contrasts than we initially predicted. We believe, however, that the sociolinguistic context of Gaelic is highly informative in understanding these results. Gaelic is a minoritised language that is currently undergoing intense revitalisation. Minority languages often experience structural simplification (Dorian, 1981; Jones, 1998), but we note that speakers of Gaelic often have high levels of metalinguistic awareness about the language's phonology (Nance, McLeod, O'Rourke, & Dunmore, 2016). All of the speakers in our study worked in Gaelic-essential jobs and, therefore, represent highly professional speakers of the language. The strong investment of such speakers in maintaining Scottish Gaelic also increases the likelihood of them learning to produce traditionally-reported contrasts in the language, which are often acquired through education. This sociolinguistic context, therefore, may represent one of the contributing mechanisms for the preservation of structures that would otherwise be likely to undergo loss in more typical cases of community transmission (Nance & Kirkham, 2022). It is clear from this that identifying potential future paths of sound change in the Gaelic sonorant system will also require detailed attention to the changing sociolinguistic dynamics of Gaelic.

Conclusion
This study has examined the variable synchronic stability of palatalisation contrasts in light of claims that such contrasts are prone to diachronic simplification, reduction or loss. The cross-linguistic diachronic evidence suggested that laterals would show the most robust contrasts and rhotics the least robust contrasts. We do indeed find that rhotics are most poorly classified word-finally, which may reflect the diachronic trend towards contrast reduction, but we find the opposite pattern word-initially, where rhotic contrasts are highly robust. This demonstrates that some contrasts in Gaelic are robustly maintained despite intense pressures towards diachronic reduction. We do not find evidence to support the claim that laterals show more robust contrast than nasals, with both sonorants being well-classified, but with nasals showing better classification once dynamic information is taken into account. Accordingly, we find that synchronic speech production data bear a complex relationship with long-term patterns of diachronic change reported across the Goidelic languages, and it is likely that a fuller consideration of how phonological dynamics interact with changing sociolinguistic contexts will further illuminate the potential paths of sound change in Gaelic. Overall, we find evidence of weaker contrast in predictably unstable sonorants, but elsewhere we find that contrast is often more robust than previously anticipated, with the phonetic correlates of phonological structure located firmly in the temporal dynamics of the speech signal.

Fig. 1. Map showing the concentration of Gaelic speakers in Scotland according to the most recently available figures from the 2011 National Census. Attribution: By SkateTier - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=31996352. Original figure in colour, converted to greyscale here.

Fig. 2. Original midsagittal tongue shape for one token plus DCT reconstructions of the same data using varying numbers of DCT coefficients. The tongue tip is on the right of the image and the tongue root is on the left. The token represents a single spline taken from a word-initial rhotic.

Fig. 3. Pearson's correlation between the original ultrasound tongue splines and DCT-smoothed versions using varying numbers of DCT coefficients. The solid vertical line represents the final number of DCT coefficients used for the classification analysis.

Fig. 4. Pearson's correlation between dynamic acoustic-MFCC trajectories and DCT-smoothed versions using varying numbers of DCT coefficients (top) and the dynamic ultrasound-DCT trajectories and DCT-smoothed versions using varying numbers of DCT coefficients (bottom). The solid vertical line represents the final number of DCT coefficients used for the classification analysis. The dashed horizontal line represents the correlation coefficient cut-off used for selecting the number of DCT coefficients for each measure, which was based on the first 5 dynamic MFCC/DCT trajectories.

Table 1
SVM classification matrix for lateral acoustic data. Values represent percentage correct classification (rounded to 2 decimal places).

Table 3
SVM classification matrix for nasal acoustics data. Values represent percentage correct classification (rounded to 2 decimal places).

Table 4
SVM classification matrix for nasal ultrasound data. Values represent percentage correct classification (rounded to 2 decimal places).

Table 5
SVM classification matrix for rhotic acoustic data. Values represent percentage correct classification (rounded to 2 decimal places).

Table 6
SVM classification matrix for rhotic ultrasound data. Values represent percentage correct classification (rounded to 2 decimal places).

Table 7
SVM average classification accuracies (%) for models fitted to the sonorant steady-state (steady-state) and the whole sonorant-vowel interval (dynamic). The 'difference' column represents the dynamic model accuracy minus the steady-state model accuracy, with positive values indicating % improvement for the dynamic model over the steady-state model and negative values indicating better relative performance on the steady-state model.

Table 8
Lateral word list used in this study.

Table 9
Nasal word list used in this study.

Table 10
Rhotic word list used in this study.