Phonological and phonetic properties of nasal substitution in Sasak and Javanese



Introduction
There is increasing evidence that language sounds which appear to be 'the same sound' are in some cases actually distinct from each other, with consequences for phonological analysis. The classification of sound categories is of great importance to understanding the phonological organization in a language, and close attention to small phonetic differences has contributed to the development of both Lexical Phonology (Mohanan, 1982; Kiparsky, 1982) and Articulatory Phonology (Browman & Goldstein, 1986, 1992). The effect of sameness may arise from categorical perception (Liberman et al., 1957; Harnad, 2003); the fact remains that native speakers do not necessarily attend to all distinctions in speech sounds. For example, sometimes segments with different phonetic properties are classed together as a single sound, such as the different articulations of American English /ɹ/ (Delattre & Freeman, 1968; Mielke et al., 2010; Archangeli et al., 2011; Mielke et al., 2016) and the different acoustics of American English /s/. Another class of relevant examples is neutralization, both incomplete and complete. Most striking are instances of incomplete neutralization, where two sounds from different sources were considered to be the same but were later revealed to be slightly different under close phonetic examination (see Port, 1996 and Yu, 2007 for summaries of the issues; final devoicing examples are found in Port & O'Dell, 1985, Warner et al., 2004, and Winter & Röttger, 2011, among others; English [l] in Lee-Kim et al., 2013;
1 In the orthography of both languages, 'ng' is used for [ŋ], 'ny' for [ɲ], 'c' for [ʨ], and 'e' for both [e/ε] and [ə]. We use IPA symbols for both languages. For more on nasal substitution, see De Guzman (1978); Archangeli et al. (1998); Pater (1999, 2001) and the excellent summary in Blust (2004) for phonological analysis of Austronesian nasal substitution, Reid (2000) on the historical development of nasal substitution, and Wouk (1999); Austin (2010, 2013) on the syntactic distribution of 'nasal verbs' vs. 'oral verbs' in Sasak. Here and elsewhere, Sasak items are from fieldnotes by Archangeli and Yip; Javanese items are from Robson & Wibisono (2002) and were verified by a native Javanese-speaking linguist.
[...] it pairs with orthographic 'ny' instead of with 'n,' illustrated in Table 2 where the nasal correspondent of [s] is represented as [N_s].² The contrast between the behavior of [s] and the behavior of other voiceless obstruents appears to break the 'homorganic nasal' pattern. Our question is whether this is indeed the case, which would argue in favor of an abstract pattern, i.e., a relation between two sounds which is not grounded completely in their phonetic properties. The alternative is that the homorganic nasal pattern is concrete, i.e., the relation between the two sounds is fully grounded in their phonetic properties. In an abstract relation, the places of articulation of [s] and of [N_s] would be quite different from each other, whereas in a concrete relation the articulations of [s] and [N_s] would be relatively similar to each other. In either case, there remains the question of what the place of articulation is for each of these sounds; possibilities are shown in Table 3.
Under the abstract hypothesis, the two sounds are related morphologically, but the homorganic relation found with other segments does not hold with [s] and its corresponding nasal [N_s]. We expect that the tongue positions of [s] and [N_s] would be quite different from each other, and comparable to the way that heterorganic tongue positions differ from each other. The place of the two sounds might correspond to the perceived place, with [s] homorganic to [t, n] (a dental/alveolar [s]; Dart, 1991) and [N_s] homorganic to [ʨ, ɲ] (Table 3a). Alternatively, [s] might be distinct from [t, n] (e.g., [s] is postalveolar in the phonological analysis of Mester, 1986) [...] [t, n], with [ʨ, ɲ], or with a third place of articulation (Table 3d, e, f respectively). The abstract and concrete hypotheses are summarized in Table 4. We set out to answer the "Abstract or concrete?" question and to address the related place of articulation issues based on ultrasound data of speakers of both Sasak and Javanese.
[Table caption: Sasak (left) and Javanese (right). Non-nasal forms begin with an oral consonant; nasal forms show the corresponding nasal-initial form. (a) Voiceless obstruents pair with single nasal consonants while (b) voiced obstruents pair with nasal-obstruent sequences.]

Language background
Sasak is the primary local language of Lombok, Indonesia, with the number of speakers estimated at 2 million (Clynes, 1995) and 2.5 million (Marli, 2015). Sasak, along with Balinese, Sumbawa, Malayic, and Chamic, is within the Malayo-Polynesian sub-group Malayo-Sumbawan (Adelaar, 2005). Sasak is described as having four (Jacq, 1998) or five (Austin, 2003) major dialects. (Austin, 2003, reports that the informal names for the dialects relate to how each group pronounces the deictic words for 'like this' and 'like that': Ngenó-Ngené (central northeast, central east, and central west coasts of Lombok), Menó-Mené (central Lombok), Ngotó-Ngeté (northeastern Lombok), Ngenó-Mené, also known as Kutó-Kuté (north Lombok), and Meriaq-Meriku (south central Lombok). Jacq, 1998, does not include Ngenó-Mené as a dialect.) The dialects with the widest geographical distribution are Ngenó-Ngené and Menó-Mené, which are the only varieties used by Sasak speakers in this study.
Javanese is the most-spoken regional language of Indonesia and the most-spoken language of the Austronesian language family, with approximately 75 million speakers. It is found along the northwest coast of Java (Banten, Krawang, Cirebon) and in the central and eastern areas of this island. Outside of Java, it is used in diasporic communities in neighboring provinces of Sumatra, Kalimantan, and Sulawesi, as well as in Suriname and New Caledonia. Three dialects are usually distinguished (western, central, and eastern) (Ras, 1985), and the western dialect is further divided into seven subdialects (Nothofer, 1980). Our participants are all from the central and eastern dialects, which have not yet been studied in much detail (Nothofer, 2006).
Nasal substitution in Sasak appears on verbs "used when the Patient-like argument is nonreferential" according to Austin (2013, p. 41). There are many other factors determining the distribution of oral-or nasal-initial verb forms; see Austin (2013). Similarly, in Javanese, nasal substitution appears on verbs and relates to argument structure: Sato (2008, p. 53) calls nasal substitution the 'active voice morphology' and shows that it is necessary in basic transitive clauses, but does not occur in Wh-questions or passives. See also Sato (2015); see Herawati et al. (2016) for discussion of nasal substitution and denominal verbs.
As for the sounds, nasal substitution refers to related pairs of words, typically one which begins with a voiceless oral obstruent and the other with a nasal homorganic to that obstruent. (Voiced obstruents, sonorants, and vowel-initial words have their own patterns; see Clynes, 1995 and Austin, 2013 for Sasak; Dudas, 1976; Robson, 1992; and Lee, 2001 for Javanese; and Pater, 1999, 2001 for the Austronesian pattern in general.)
To put nasal substitution in context, the Sasak and Javanese consonant inventories are shown in Table 5, based on Clynes (1995); Archangeli et al. (2016) for Sasak³ and Dudas [...] as 'postalveolar/palatal' due to the lack of agreement in the literature as to the precise location of constriction described for these consonantal sounds. Both languages distinguish bilabial, postalveolar, and velar consonants. Sasak has a single dental/alveolar category. Javanese has both dental and alveolar consonants (Dudas, 1976 [citing Horne, 1961]; Hayward & Mulijono, 1991), also described as a dental/retroflex contrast (Suharno, 1982; Robson, 1992; Adisasmito-Smith, 2004; Graff & Jaeger, 2009); 's' is classed with dentals by all sources. Javanese nasal substitution is described as resulting in a dental nasal [n̪] regardless of whether the corresponding sound is a dental [t̪] or an alveolar (or retroflex) [ṯ/ʈ]. Because of the challenges of imaging the tip of the tongue with ultrasound, we did not use stimuli with initial alveolar/retroflex stops in the Javanese study.⁴
The languages are similar in that each has only one sibilant, which is one of the sounds targeted in this study. It is possible that there is more variety of articulation for the sibilant because there is no sibilant contrast to be maintained (e.g., there is no [ʃ] alongside the [s]). Clynes (1995) classes Sasak [s] together with [ʨ, ʥ]. Mester (1986) views Javanese /s/ as a postalveolar consonant, and Robson (1992) states that the Javanese 's' is similar to that of English, but "sometimes is heard as approaching sh" (p. 12).⁵ We did not perceive this fluctuation ourselves, in either language. These different classifications, along with nasal substitution apparently relating 's' and a palatal nasal, raise questions about the phonetic nature of the sibilant in both languages: Could it be articulated somewhere between a dental/alveolar sound and a postalveolar/palatal sound, or is it indeed postalveolar/palatal?
On the other hand, because Sasak has only one contrast in the dental/alveolar region while Javanese has two contrasts for the stops, we might expect concomitant differences between the two languages in the nasal counterparts to [s].

Methods
In order to carry out this study, we collected and analyzed ultrasound tongue imaging data. Sasak data were collected at the Mataram Lingua Franca Institute in Lombok, Indonesia, and Javanese data were collected at the University of Hong Kong.
The procedure for collecting and processing data in the two languages is largely the same. Differences in the methodologies arose because the two studies were carried out independently; we saw the value of putting the two together only after the data were collected. Analysis methods are as similar as possible given differences in the number of stimuli per language and the number of repetitions per stimulus.⁶ We present the basic methodology here, along with ways in which the procedures for the two data sets differed.

Participants
For the Sasak part of the study, there were 11 participants who all reported speaking Sasak exclusively until elementary school; all continued to use Sasak on a daily basis throughout their lives. All participants also reported fluency in Bahasa Indonesia and have learned English as a third language. Ages ranged from 19 to 37 (average 24.4); 4 were female and 7 were male. Of these, 8 self-identified as speakers of the Menó-Mené (M-M) dialect while 3 self-identified as Ngenó-Ngené (Ng-Ng) speakers. Data from 2 additional speakers were omitted due to poor quality of the ultrasound images.
4 While there is a somewhat robust literature on the laryngeal contrasts in Javanese, including Brunelle (2010); G. Poedjosoedarmo (1986); G. R. Poedjosoedarmo (1993); Thurgood (2004); Matthews (2015), our focus is on the alternations with voiceless consonants.
5 Rehg and Sohl (1981) make a similar observation about 's' in Pohnpeian, another Austronesian language, where the 's' sounds somewhere between English 's' and 'sh,' with speaker variation about the degree of palatalization.
6 Archangeli and Yip carried out the Sasak data collection; Lee and Qin collected the Javanese data.
For Javanese, 8 female native speakers (and no male speakers) were recorded.⁷ All participants reported speaking Javanese (either the eastern or central dialect) on a daily basis until moving to Hong Kong, and all also reported fluency in Bahasa Indonesia. Some learned English as a third language, while others instead learned Cantonese. Ages ranged between 23 and 39 (average 31). The experiment was conducted either in English or in Cantonese, with occasional explanation in Javanese or Bahasa Indonesia by a bilingual Javanese- and Bahasa Indonesia-speaking assistant. (Data from two additional Javanese speakers were omitted due to poor imaging quality in the ultrasound signal.)
Information about each Sasak- and Javanese-speaking participant's gender, age, and native dialect appears in Table 6.

Stimuli
To examine the relationship between the place of oral and nasal sounds, we identified items with initial coronal voiceless consonants along with their morphologically-related nasal-initial forms. These initial consonants either had a known place of articulation, that is, either dental/alveolar or postalveolar, or they were ambiguous in place ([s] and [N_s]), as shown in Table 7, with examples in the rightmost column.
In selecting stimuli, only morphologically-related forms were included, with either [a] or [ə] in the first syllable. The vowels [ə] and [a] were chosen in order to minimize coarticulatory effects of the following vowel on the consonant. Non-high, central vowel contexts were chosen because high vowels with front or back tongue position (e.g., [i] or [u]) typically show stronger influences on tongue shape and position during consonantal constriction than do other types of vowels (Öhman, 1966; Zharkova & Hewlett, 2009). [...]
For Javanese, stimuli were selected in a similar way, with either [a] or [ə] present in the first syllable of each target item. For this language, 10 items were identified for each of the 6 target consonants, resulting in a total of 60 Javanese target stimuli. A full list of target Sasak and Javanese stimulus items is presented in the appendix.

Procedure
Each data collection session began with an explanation of the study and the data collection methods. The participants were seated in front of a display laptop, which was used to present visual prompts. Two posable camera arms (Manfrotto 143 Magic Arm) were adjusted to provide a stable headrest for the participants and to minimize head movement throughout the entire duration of the collection session. A third arm fixed the position of the ultrasound sensor along the centerline of the lower jaw at a location where the full midsagittal contour of the tongue imaged most clearly. This setup is shown in Figure 1, where a close-up of the head-to-probe stabilization method is shown in the image to the right.
When stabilization arm adjustments were complete, participants were asked whether they were willing to continue with the study. On agreement, each was asked to sip water slowly through a straw, in order to create an image of the palate. Once a good palate image was obtained, participants were instructed to produce each target stimulus from the dedicated display laptop's screen, and prompts were advanced for each participant by an experimenter. Throughout the task, another experimenter monitored the imaging quality during collection to ensure that good ultrasound images were obtained.
Participants were asked to read each of the target word items, which appeared on the screen of the display laptop, one at a time. In the case of Sasak words, target items were produced in isolation, whereas for Javanese, target words were presented in the carrier phrase Kata ____ '(the) word (is)___,' in order to make speakers produce a preceding [a] vowel immediately before the target initial-consonant sounds.
Stimuli were presented in a randomized order that was unique to each participant. For Sasak speakers, randomization was performed within each of six stimulus blocks, with each block containing one iteration of each target word. This resulted in six productions of each item and a total of up to 84 token productions per session for Sasak. For Javanese speakers, randomization was performed within one large block, in which each target item appeared two times, resulting in two productions of each target item and a total of up to 120 token productions per session for Javanese. The target number of prompts and repetitions for each sound in each language is summarized in Table 8.
Because the collection procedures for Sasak and Javanese were designed independently of each other and occurred separately, the number of target items for Javanese was much larger than that for Sasak in this study. On the other hand, the number of iterations of each item in Sasak was larger than that for each item in Javanese. The consequence is that there are roughly 40% more token productions per session in the Javanese data (up to 120) than in the Sasak data (up to 84).
Figure 1: Sasak data collection set up in Lombok. The posable camera arms are cloth-covered to make them more friendly to participants. Two arms stabilize the forehead while a third holds the probe in a fixed position. One laptop is used for presenting stimuli while the other is used to monitor and collect data. The ultrasonic scan unit, which was placed underneath the desk in order to mitigate fan noise, is not shown here.
For all recording sessions, the ultrasound images were collected using a 2-4 MHz convex ultrasound sensor (Telemed MC4-2R20N) coupled with a Telemed ClarUs-EXT portable, ultrasonic beam-former connected to a high-performance laptop that functioned as a machine dedicated to audio- and video-data collection. The ultrasound images were constructed and displayed using Echo Wave II ultrasound imaging software (Telemed 2015) at approximately 60 frames per second. On-screen renderings of these images were collected using a combination of desktop-display software (XSplit Broadcaster: SplitmediaLabs, 2015) and real-time on-screen video-capture software (Fraps: Beepa, 2015) at a stable rate of 60 frames per second.⁸ Each session involved under 30 minutes of recording. Audio was captured with an over-the-ear condenser microphone. In order to synchronize audio and ultrasound video, for Sasak, one experimenter produced a series of 6-14 tokens of the voiceless post-alveolar click [k͡ǃ] at the end of each speaker's video and audio recordings immediately before stopping those recordings. The Javanese recordings lacked these click productions, and thus, in order to synchronize each ultrasound video to its corresponding audio signal, 10 instances of the release of the unaspirated voiceless velar stop [k], present at the beginning of each item's carrier phrase Kata _______, were selected at random throughout each recording and analyzed instead of clicks.
Video-to-audio synchronization was achieved by determining the mean temporal lag between the onset time of the acoustic release burst of each stop (voiceless post-alveolar click [k͡ǃ] for Sasak; velar plosive [k] for Javanese) and the time of the ultrasound frame immediately prior to visible articulatory release.⁹ In most cases, the ultrasound video signal had a consistent lag of between 1 and 3 seconds after the audio, and for each ultrasound video, frame times were subsequently readjusted in order to align them temporally with corresponding acoustic events in the audio signal. Post-collection and post-alignment, single ultrasound image frames were extracted from the video recordings as PNG-format image file sequences using digital video playback software (QuickTime Pro: Apple 2010). The recordings for 4 Javanese speakers (J2, J5, J6, and J7) were inadvertently halted mid-collection and thus their productions were recorded in multiple video files. However, these speakers did not move out of position when the recordings were halted, and in these cases, video-to-audio synchronization simply required the calculation of lag for each individual video file relative to its corresponding audio signal. (See manual alignment techniques in Miller & Finch, 2011.)
For the Sasak recordings, collected in Lombok, there were environmental factors that increased the noise level in each audio recording, such as roosters crowing at random times, mosque calls for prayer, a pre-school promotion ceremony, motor-scooters passing by, and reverberant recording rooms. The condenser microphone, positioned close to participants' mouths and set to a low gain level during recording, served to improve the signal-to-noise ratio in the audio signal. These disruptions have had at most a minor impact on this study since the primary focus is the articulatory gestures associated with the sounds of interest, not their acoustic properties, and they were not severe enough to prevent aural identification of items. The Javanese recordings were collected in a quiet room at the University of Hong Kong and did not have such issues.
8 There is a slight mismatch, with a lag of at most 16.7 ms, between the frame rate of the Echo Wave software's ultrasonic image construction and that of the video-capture software. This is not considered problematic for temporal synchronization: For nasals and fricatives, whose duration is much longer than 16.7 ms, the medial frame is selected so the selected frame is always within the segment; for stops, we took the frame preceding the release burst (see section 3.4) so if there is an effect of the latency, the frame selected is slightly earlier than desired but certainly during the stop-constriction interval, and the frame does not come after the release of the stop, when the tongue has moved away from occlusion. This was verified during visual inspection of each extracted frame.
9 Multiple samples of click/stop release bursts for synchronization were measured in order to improve calculations of lag between audio and video within a given recording. On average, the standard deviation between multiple lag measures in the same recording was 11.3 ± 4.3 ms across participants.
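The lag-based synchronization described above can be illustrated with a short Python sketch. It is not the procedure used in the study: the function names and the example timestamps are hypothetical, and the sketch assumes that lag is defined as video time minus audio time for each paired click or stop release.

```python
from statistics import mean, stdev

def estimate_lag(audio_burst_times, video_release_times):
    """Return (mean, SD) of the video-minus-audio lag, in seconds.

    Each audio burst onset is paired with the time of the video frame
    judged to immediately precede the visible articulatory release.
    """
    if len(audio_burst_times) != len(video_release_times):
        raise ValueError("each audio burst needs one matching video frame time")
    lags = [v - a for a, v in zip(audio_burst_times, video_release_times)]
    spread = stdev(lags) if len(lags) > 1 else 0.0
    return mean(lags), spread

def align_frame_times(frame_times, lag):
    """Shift video frame times onto the audio timeline."""
    return [t - lag for t in frame_times]

# Hypothetical measurements (seconds) for three synchronization events.
audio = [10.000, 12.500, 15.250]
video = [12.010, 14.495, 17.255]

lag, lag_sd = estimate_lag(audio, video)
aligned = align_frame_times(video, lag)
```

Averaging over several events, as in the study, reduces the influence of any single mismeasured release; the standard deviation gives a rough check on how consistent the lag is within one recording.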

Analysis
Analysis involved four steps: (i) identifying frames to analyze, (ii) assigning coordinates to tongue contours, (iii) quantifying the distance between contours, and (iv) statistically modeling the distribution of distance values across conditions.
Frames were identified through the audio recording, using Praat (Boersma & Weenink, 2015). Details follow about how frames were identified for different sound types.
a. Oral stops and affricates [t, ʨ]. For oral stops and affricates, the frame of interest was defined as the last frame before the release of the oral stop constriction, which was identified from the corresponding waveform and spectrogram. The extracted ultrasound frame was assumed to represent a full stop constriction (i.e., the achievement of a full postalveolar constriction gesture) because the temporal distance between ultrasound frames (16.67 ms) was shorter than the time for which the tongue maintains a stop constriction prior to release. The frames of interest for affricates were determined in the same manner because the oral-stop portion of such sounds contained a complete oral constriction at the location for the affricates.
b. Nasal stops [n, ɲ, N_s]. For nasal stops, the extracted frame was the frame closest to the acoustic midpoint of the interval of nasal-stop constriction, as determined from the waveform and spectrogram. In Sasak, where words were collected in isolation, the nasal stop was preceded by silence and the onset of nasalization was identified as the onset of vocal fold vibration in the acoustic waveform. For the end of the nasal, and in Javanese where the nasal stop was preceded by a vowel due to the carrier phrase, the nasal stop boundaries were identified by the loss of vowel formant structure and a significant decrease in acoustic intensity in the acoustic waveform and spectrogram.
c. Fricative [s]. For fricatives, the frame of interest was identified as the frame closest to the acoustic midpoint of the frication associated with the [s] articulation. The onset and offset of frication were determined by the presence of aperiodic noise in the waveform. The frame closest to the midpoint of the frication interval was assumed to best represent the [s] articulation because the corresponding acoustic pattern in the spectrogram was most characteristic of [s] at the center of the fricative.
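The two frame-selection rules above (last frame before release for stops and affricates; frame nearest the acoustic midpoint for nasals and fricatives) can be sketched in Python as follows. The function names and the example times are hypothetical, and the sketch assumes that frame times are sorted and already aligned to the audio timeline; in the study the acoustic landmarks themselves were identified by inspection in Praat.

```python
def last_frame_before(frame_times, release_time):
    """Index of the last frame strictly before an acoustic release
    (rule for oral stops and affricates). Assumes sorted frame_times."""
    candidates = [i for i, t in enumerate(frame_times) if t < release_time]
    if not candidates:
        raise ValueError("no frame precedes the release")
    return candidates[-1]

def frame_nearest_midpoint(frame_times, onset, offset):
    """Index of the frame closest to the midpoint of [onset, offset]
    (rule for nasal stops and fricatives)."""
    midpoint = (onset + offset) / 2.0
    return min(range(len(frame_times)),
               key=lambda i: abs(frame_times[i] - midpoint))

# One second of video at 60 frames per second (about 16.67 ms per frame).
FRAME_PERIOD = 1.0 / 60.0
frame_times = [i * FRAME_PERIOD for i in range(60)]

# Hypothetical landmarks, in seconds.
stop_idx = last_frame_before(frame_times, release_time=0.510)
fric_idx = frame_nearest_midpoint(frame_times, onset=0.200, offset=0.340)
```

With these illustrative values the stop rule selects the frame at 0.500 s and the midpoint rule selects the frame nearest 0.270 s.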
The next step was to convert the images into coordinates corresponding to the tongue surface as shown in the image. EdgeTrak software (Li et al., 2005) was used to determine the boundary edges (corresponding to the surface of the tongue) and fit smoothed spline curves onto the boundaries; edges were hand-corrected as needed. The data for each spline were exported as a set of 100 equidistant coordinate points.
Although continuous attempts were made to ensure that the collected ultrasonic data did not contain any head movement relative to the transducer, comparisons of 9-10 successive traces of each speaker's palate collected throughout the ultrasound recordings indicated that head movement occurred during the scanning for 5 talkers (S9, S10, J6, J7, and J10). Based on the palate data, the approximate moment of each significant shift in head position during production was identified, and palate contours affected by the movement were adjusted via spatial translation until the anterior portion of each affected palate contour was situated in the same region as those of the other palate traces. Then all sound contours of interest for these participants that were also affected by the head movement were adjusted in the same manner as the adjusted palate traces in order to correct for head movement in the spline data. This adjustment resulted in an overall reduction in variation in spatial position of the tongue splines in the data for the 5 talkers with head movement. An example of all palate and tongue contour data for participant S9 before adjustment and after adjustment is given in Figure 2.
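One way to picture the translation-based correction is the Python sketch below. It is an illustration under assumptions, not the adjustment actually performed: the text does not say how the translation was determined, so the sketch simply matches the mean position of the anterior portion of a shifted palate trace to that of a reference trace, and it assumes (hypothetically) that the anterior end corresponds to the last points of each trace.

```python
def mean_point(points):
    """Centroid of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def translation_offset(reference_palate, shifted_palate, n_anterior=20):
    """(dx, dy) that moves the anterior part of the shifted palate trace
    onto the reference trace. Assumption: the anterior end of each trace
    is its LAST n_anterior points."""
    ref_x, ref_y = mean_point(reference_palate[-n_anterior:])
    cur_x, cur_y = mean_point(shifted_palate[-n_anterior:])
    return ref_x - cur_x, ref_y - cur_y

def translate(contour, offset):
    """Apply the same rigid translation to every point of a contour."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in contour]

# Hypothetical data: a palate trace, and the same trace after a head shift.
reference_palate = [(float(i), 50.0 - 0.1 * i) for i in range(100)]
shifted_palate = [(x + 1.5, y - 2.0) for x, y in reference_palate]

offset = translation_offset(reference_palate, shifted_palate)

# Tongue contours recorded after the shift receive the same correction.
shifted_tongue = [(float(i) + 1.5, 20.0) for i in range(100)]
corrected_tongue = translate(shifted_tongue, offset)
```

Because the same offset is applied to the palate and to the tongue contours recorded after the shift, the relative geometry between palate and tongue is preserved.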
In order to derive measures of similarity/dissimilarity between production tokens, root mean squared distances (RMSDs) between tongue contour pairs were calculated. Mean squared distance and RMSD values are used for various purposes in ultrasound tongue position research, e.g., comparing native and non-native speakers' articulations (Li et al., 2005; Davidson, 2005; Berry et al., 2012), understanding coarticulation (Irfana & Sreedevi, 2016), and evaluating the accuracy of edge-detection algorithms (Roussos et al., 2009; Fasel & Berry, 2010; Csapó & Lulich, 2015). For our purposes, low RMSD values indicate that two sound tokens were articulated with high similarity, whereas high RMSDs indicate that the tokens were articulated with quite distinct lingual contours. RMSDs were calculated from the distances between the two contours of each token pair, measured along each integer-valued angle (relative to the origin) at which both contours were present. The origin was defined as the point of intersection between the lines representing the leftmost and rightmost boundaries of the ultrasonic image for each talker, and the origin's location depended solely on the scan settings (scan frequency, scan depth, and field of view) used in the Echo Wave software. The RMSD calculation procedure is depicted in Figure 3. Distances at each angle were squared individually, then summed together and divided by the total number of angles, and the square root of this value was taken as the RMSD measure for the token pairing.
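The RMSD procedure just described can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, contours are assumed to be lists of (x, y) points expressed relative to a known origin, the radius at each integer angle is obtained by linear interpolation between neighboring contour points (an assumption; the text does not specify how radii at integer angles were derived), and contours are assumed not to cross the ±180° boundary.

```python
import math

def to_polar(contour, origin):
    """Convert (x, y) points to (angle in degrees, radius), sorted by angle."""
    ox, oy = origin
    return sorted((math.degrees(math.atan2(y - oy, x - ox)),
                   math.hypot(x - ox, y - oy)) for x, y in contour)

def radius_by_integer_angle(contour, origin):
    """Radius of the contour at every integer angle it spans,
    linearly interpolated between neighboring points."""
    polar = to_polar(contour, origin)
    radii = {}
    for (a0, r0), (a1, r1) in zip(polar, polar[1:]):
        if a1 == a0:
            continue
        for angle in range(math.ceil(a0), math.floor(a1) + 1):
            w = (angle - a0) / (a1 - a0)
            radii[angle] = r0 + w * (r1 - r0)
    return radii

def rmsd(contour_a, contour_b, origin=(0.0, 0.0)):
    """Root mean squared radial distance over angles shared by both contours."""
    ra = radius_by_integer_angle(contour_a, origin)
    rb = radius_by_integer_angle(contour_b, origin)
    shared = sorted(set(ra) & set(rb))
    if not shared:
        raise ValueError("contours share no integer angle")
    return math.sqrt(sum((ra[t] - rb[t]) ** 2 for t in shared) / len(shared))

def arc(radius, start_deg, stop_deg, step=2.5):
    """Hypothetical test contour: a circular arc around the origin."""
    n = int((stop_deg - start_deg) / step) + 1
    return [(radius * math.cos(math.radians(start_deg + k * step)),
             radius * math.sin(math.radians(start_deg + k * step)))
            for k in range(n)]

tongue_a = arc(50.0, 40, 140)   # spans 40 to 140 degrees
tongue_b = arc(53.0, 60, 150)   # 3 mm farther from the origin, 60 to 150 degrees
distance = rmsd(tongue_a, tongue_b)
```

In this toy example the two arcs differ by a constant 3 mm along every shared angle, so the RMSD is approximately 3 mm even though the arcs differ in angular extent; angles covered by only one arc do not enter the calculation.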
For each spline pair, no measures were taken at angles (θ) that intersected with only a single trace, as shown at both ends of the images in Figure 3. This method contrasts with the mean Euclidean distance algorithm used by Zharkova and Hewlett (2009), which calculates the arithmetic mean of the shortest distances of all points along each spline to all points along the paired spline. Where there was a mismatch in spline length along the x-dimension (the θ-dimension in our method), as seen in Figure 3, the Zharkova and Hewlett approach could overestimate mean distance between splines because points at the extreme ends of each spline would be measured as having longer distances to their nearest point along the paired spline. Our method, on the other hand, ignores angles (θ) at which only one of the two contours was present in the spline data, so overestimation of the distance between spline pairs at the extreme ends is less likely here than in Zharkova and Hewlett. Since our aim is to determine the degree of similarity/difference (or degree of homorganicity/heterorganicity) between contours, we chose a method that would not necessarily overestimate mean distances between contours, i.e., not exaggerate their differences, in cases where contours simply differed in length along the θ-dimension rather than in articulatory place of constriction.
RMSD data for each language were submitted to linear mixed-effects regression (LMER) models using the lmer() function in the lmerTest package (Kuznetsova et al., 2012) in R statistical software (R Core Team, 2016). Both languages' LMER models contained fixed effects of Place (homorganic, heterorganic, ambiguous) and Nasality (shared, contrastive) and random slopes and intercepts for factor levels within Subject.
Word was omitted as a random effect because the number of tokens per item in the Javanese data set was small (2 iterations per word), and the addition of this factor did not improve the fit of the model for either language. Estimates of RMSD values from these LMER models allowed for comparisons of magnitude of difference between sound pairs according to Place and Nasality conditions. The number of RMSD values per speaker and per language and condition are reported in Tables 9 and 10.
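The contrast drawn above between the shared-angle RMSD and a nearest-point mean distance can be made concrete with a toy example. The Python sketch below is a simplified reading of both measures, not a reimplementation of either study's code: contours are represented as hypothetical dictionaries from integer angle (degrees) to radius (mm), and the nearest-point measure is taken over the exported points only.

```python
import math

def polar_to_xy(angle_deg, radius):
    a = math.radians(angle_deg)
    return radius * math.cos(a), radius * math.sin(a)

def mean_nearest_point_distance(a, b):
    """Mean, over all points of both contours, of the distance from each
    point to the nearest point on the other contour."""
    pa = [polar_to_xy(t, r) for t, r in a.items()]
    pb = [polar_to_xy(t, r) for t, r in b.items()]

    def one_way(src, dst):
        return [min(math.dist(p, q) for q in dst) for p in src]

    distances = one_way(pa, pb) + one_way(pb, pa)
    return sum(distances) / len(distances)

def shared_angle_rmsd(a, b):
    """RMSD of radii over the angles present in both contours."""
    shared = set(a) & set(b)
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in shared) / len(shared))

# Two contours lying on the same arc but traced over different extents.
short_trace = {t: 50.0 for t in range(60, 121)}
long_trace = {t: 50.0 for t in range(40, 141)}

nearest = mean_nearest_point_distance(short_trace, long_trace)
angle_matched = shared_angle_rmsd(short_trace, long_trace)
```

Here the two traces coincide wherever both are present, so the shared-angle RMSD is zero, while the nearest-point mean is well above zero because the unmatched ends of the longer trace are far from any point on the shorter one.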

Results
In order to determine the significance of the above observations, we carried out LMERs on RMSD differences for various classes of sounds: The RMSD between two tongue splines serves as a measure of similarity of articulatory tongue position. We divided the relevant sound pairs into six categories, shown in Table 11. 'Homorganic' refers to sounds with the same place of articulation, dental/alveolar or postalveolar, while 'heterorganic' refers to pairs that cross these two places. These comparisons give a measure of RMSD for homorganic and heterorganic sounds. The 'ambiguous' category contains the sounds tested by this study, [s] and [N_s]. 'Nasality-same' (nasality_s) means either both sounds are oral or both sounds are nasal; 'nasality-contrastive' (nasality_c) means that the two sounds in a pair disagree for nasal/oral articulation.¹⁰
Using the categories from Table 11, we are able to make our predictions explicit. Our first focus is on the categories homorganic and heterorganic, whether oral or nasal. We use the homorganic class to establish reasonable RMSDs for sounds made with the same place of articulation (same or similar articulatory position of the tongue). RMSDs are predicted to be small in this case because homorganic sounds are made with the same place of articulation by definition. In contrast, we expect large RMSDs when the two sounds have different places of articulation, the heterorganic condition. Putting these together, we expect that RMSD_homorganic is smaller than RMSD_heterorganic.
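The six Place by Nasality categories can be expressed as a small classification function. The Python sketch below is an assumption-laden illustration: Table 11 is not reproduced here, so the sketch assumes that the 'ambiguous' category covers pairs drawn only from [s] and [N_s] and that pairs mixing an ambiguous and an unambiguous sound fall outside the six categories; the names `PLACE`, `NASAL`, and `classify_pair` are hypothetical.

```python
# Place labels for the six target sounds, following the definitions above.
PLACE = {
    "t": "dental/alveolar", "n": "dental/alveolar",
    "ʨ": "postalveolar", "ɲ": "postalveolar",
    "s": "ambiguous", "N_s": "ambiguous",
}
NASAL = {"n", "ɲ", "N_s"}

def classify_pair(a, b):
    """Return (place, nasality) for a pair of sounds, or None when the pair
    mixes an ambiguous and an unambiguous sound (assumed to be excluded)."""
    pa, pb = PLACE[a], PLACE[b]
    if pa == "ambiguous" and pb == "ambiguous":
        place = "ambiguous"
    elif "ambiguous" in (pa, pb):
        return None
    elif pa == pb:
        place = "homorganic"
    else:
        place = "heterorganic"
    nasality = "same" if (a in NASAL) == (b in NASAL) else "contrastive"
    return place, nasality

examples = {pair: classify_pair(*pair)
            for pair in [("t", "t"), ("t", "n"), ("t", "ɲ"), ("s", "N_s")]}
```

Under these assumptions, [t] paired with [n] is homorganic with contrastive nasality, [t] paired with [ɲ] is heterorganic with contrastive nasality, and [s] paired with [N_s] is the ambiguous, nasality-contrastive comparison at the center of the study.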
When nasality is added in, we expect to find that homorganic sounds with contrastive nasality still have a small RMSD, but one slightly larger than when the two sounds are truly identical, because the nasal/oral contrast introduces differences in manner of articulation. Our four expectations are summarized in Table 12.¹¹
The box plot in Figure 4 shows the distribution of RMSDs for both the unambiguous sounds and the ambiguous sounds. Focusing on the unambiguous sounds, we see that the RMSDs for homorganic sounds are small, well below 5 mm, with a lower mean when sounds are identical (i.e., the nasality_s case; Sasak: 2.5 mm, Javanese: 1.6 mm) than when nasality differs (nasality_c; Sasak: 3.4 mm, Javanese: 3.9 mm), exactly as expected: RMSD_Homorganic-s < RMSD_Homorganic-c. (This difference is significant, as seen in Table 14.) In contrast, the means for heterorganic sounds are above 5 mm for both Sasak and Javanese, whether there is a nasality contrast or not, again as expected: RMSD_Homorganic-s,c << RMSD_Heterorganic. RMSDs pattern as expected; we now have a measure to use in quantifying comparisons involving ambiguous sounds.
In particular, we can now understand the predictions of the abstract and concrete hypotheses in terms of RMSD. Under the abstract hypothesis, [s] is heterorganic to [Ns], shown by a relatively large RMSD. Under the concrete hypothesis, [s] is homorganic with [Ns], shown by an RMSD similar to that of homorganic-contrastive pairs since [s] and [Ns] have contrastive nasality.
10 A caveat is in order: Ultrasound images capture only part of the tongue, yet there are multiple other dimensions for comparing articulation that do not appear in these images (such as the larynx, lips, and posterior pharyngeal wall), and the images are two-dimensional while the vocal tract is three-dimensional.
11 Differences between homorganic oral and nasal consonants would be consistent with results in Gibbon et al. (2007) and Shosted et al. (2012). Gibbon et al. (2007) show that [t] and [d] have more contact than [n] in normal adult speakers of English, using electropalatography, while Shosted et al. (2012) show that degree of contact also varies between different languages: [ɲ] in Peninsular Spanish involves a fair degree of occlusion in the alveopalatal region, whereas in Brazilian Portuguese the [ɲ] has a degree of closure more like an approximant, with an articulatory target that is neither occluded nor anterior in the oral cavity. Thus we do not have clear predictions about whether same or contrastive nasality affects the RMSD of heterorganic sounds.
[...] Table 13 in terms of estimated RMSD values for different comparisons, using LMERs to make those comparisons. The Place effects show that homorganic sounds have a significantly smaller estimated RMSD than do heterorganic sounds (Sasak same nasality: 2.5 mm < 6.4 mm, p < 0.0001; Sasak contrastive nasality: 3.4 mm < 6.5 mm, p < 0.0001; Javanese same nasality: 1.6 mm < 4.3 mm, p < 0.0001; Javanese contrastive nasality: 3.9 mm < 4.5 mm, p < 0.0001), with a general difference of approximately 3 mm. This is consistent with the prediction for place differences.
In contrast, according to the LMER results for both languages, estimated RMSD values for homorganic sounds with same nasality (i.e., [t] [...] 2.2 mm, p < 0.0001; Javanese: homorganic-s 1.6 mm = ambiguous-s 1.6 mm, p = 0.354), consistent with the predictions pertaining to ambiguous same sounds.
[Table caption, beginning missing] (top) and Javanese (bottom). "Est. 1" shows RMSD estimates for the left-hand side of each prediction; "Est. 2" shows RMSD estimates for the right-hand side of each comparison. The box highlights the p-value that is not significant. Sounds are categorized by place and nasality as in Table 11.
Turning to the effect of nasality, we see that the RMSD estimates for homorganic comparisons with the same nasality are smaller than those for homorganic comparisons with contrastive nasality (Sasak: 2.5 mm < 3.4 mm, p < 0.0001; Javanese: 1.6 mm < 3.9 mm, p < 0.0001), although the RMSD value for homorganic contrastive pairings is still relatively small. Heterorganic same and heterorganic contrastive pairings were similar to each other in RMSD estimates but still differed significantly (Sasak: 6.4 mm < 6.5 mm, p = 0.0082; Javanese: 4.3 mm < 4.5 mm, p < 0.0001), and corresponding RMSD values in these conditions were relatively large. Importantly, RMSD estimates for ambiguous sounds with contrastive nasality (ambiguous-c) were large (Sasak: 9.5 mm; Javanese: 6.9 mm) and differed drastically from RMSD estimates for ambiguous sounds with matching nasality (Sasak: 9.5 mm >> 2.2 mm, p < 0.0001; Javanese: 6.9 mm >> 1.6 mm, p < 0.0001). We conclude that [s] and [Ns] exhibit the greatest distinctness in lingual contour shape and place of articulation in both languages, exactly as predicted under the abstract hypothesis.
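The ordering of the reported estimates can be checked mechanically. The sketch below records the LMER-estimated RMSDs quoted in the text and tests the predicted inequalities; it also reports which contrastive reference category the ambiguous-c estimate lies nearer to. The names `ESTIMATES` and `closer_reference` are hypothetical, and comparing point estimates in this way is only an illustration, not a substitute for the significance tests reported in the tables.

```python
# LMER-estimated RMSDs (mm) as quoted in the surrounding text.
ESTIMATES = {
    "Sasak": {
        "homorganic-s": 2.5, "homorganic-c": 3.4,
        "heterorganic-s": 6.4, "heterorganic-c": 6.5,
        "ambiguous-s": 2.2, "ambiguous-c": 9.5,
    },
    "Javanese": {
        "homorganic-s": 1.6, "homorganic-c": 3.9,
        "heterorganic-s": 4.3, "heterorganic-c": 4.5,
        "ambiguous-s": 1.6, "ambiguous-c": 6.9,
    },
}


def closer_reference(est):
    """Return the contrastive reference category nearest to ambiguous-c."""
    target = est["ambiguous-c"]
    refs = ("homorganic-c", "heterorganic-c")
    return min(refs, key=lambda name: abs(est[name] - target))


for language, est in ESTIMATES.items():
    # Place prediction: homorganic estimates are below heterorganic ones.
    place_ok = (est["homorganic-s"] < est["heterorganic-s"]
                and est["homorganic-c"] < est["heterorganic-c"])
    # Nasality prediction: same-nasality is below contrastive-nasality.
    nasality_ok = est["homorganic-s"] < est["homorganic-c"]
    print(language, place_ok, nasality_ok, closer_reference(est))
```

In both languages the ambiguous-c estimate exceeds even the heterorganic-c estimate, so on this simple distance criterion it falls on the heterorganic side.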
The answer to the abstract/concrete question raises the issue of whether [s] is a dental/alveolar sound and [Ns] is postalveolar, or whether one or the other is something else. In order to test whether [s] and [Ns] are homorganic with other dental/alveolar and postalveolar sounds respectively, we revised our earlier LMER models to target these two ambiguous sounds, including pairwise comparisons between each of the ambiguous sounds [s, Ns] and each of the unambiguous sounds [t, n, ʨ, ɲ] (a separate model was generated for each ambiguous sound, with only relevant sound comparisons included in the data set). The revised models provide RMSD estimates for each sound-sound pair as well as p-values for comparisons with the corresponding unambiguous homorganic and unambiguous heterorganic categories given in Table 11 (homorganic-s, homorganic-c, heterorganic-s, and heterorganic-c). RMSD estimates and p-values from the revised LMER models are reported in Table 15. The closer the RMSD estimate for a given sound pair is to the RMSD of one of the unambiguous categories, the more likely it is that the pair is a member of that category. If a p-value indicates that the RMSD comparison for the ambiguous sound does not differ from the RMSD for the corresponding unambiguous homorganic comparison, then the place-ambiguous sound in the pair is homorganic with the other sound in the pair.
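The set of pairwise comparisons described above can be enumerated as follows. This is a minimal sketch: the place and nasality labels follow the descriptions in the text, while the names `UNAMBIGUOUS`, `AMBIGUOUS`, and `comparisons` are hypothetical and do not come from the authors' analysis scripts.

```python
from itertools import product

# Place and nasality of the unambiguous sounds, as described in the text.
UNAMBIGUOUS = {
    "t": ("dental/alveolar", "oral"),
    "n": ("dental/alveolar", "nasal"),
    "ʨ": ("postalveolar", "oral"),
    "ɲ": ("postalveolar", "nasal"),
}
# For the place-ambiguous sounds only nasality is known in advance.
AMBIGUOUS = {"s": "oral", "Ns": "nasal"}


def comparisons():
    """List every ambiguous/unambiguous pair with its nasality relation."""
    rows = []
    for (amb, amb_nas), (ref, (place, ref_nas)) in product(
            AMBIGUOUS.items(), UNAMBIGUOUS.items()):
        relation = "same" if amb_nas == ref_nas else "contrastive"
        rows.append((amb, ref, place, relation))
    return rows


for row in comparisons():
    print(row)
```

Each of the eight rows pairs one ambiguous sound with one reference sound. The nasality relation determines which reference categories (the same-nasality or contrastive-nasality homorganic and heterorganic categories) the pair's RMSD estimate would be compared against.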
In all comparisons shown in [...] Table 15) and much smaller than the estimates for corresponding heterorganic comparisons (columns 7-9, Table 15).
[Figure caption, beginning missing] [...] Table 15) are highlighted in orange, whereas homorganic pairs (column 5, Table 15) are indicated in gray and heterorganic pairs (column 8, Table 15) are shown in green. In each plot, probability density of a given RMSD is indicated by the cross-sectional width of the outer shape at that value, interquartile ranges are indicated with box plots within each shape, and median values are indicated by the horizontal line in each box plot.

Discussion
Comparison of lingual contours in ultrasound images for the relevant sounds shows that in these languages, the nasal substitution pattern relating a voiceless obstruent with its [...]
12 SS-ANOVA plots were generated in polar-coordinate parameters r (radial distance) and θ (angle), using the same origin as that used in the RMSD analysis and the ssanova() function in the gss package (Gu, 2014) in R. Rather than creating SS-ANOVAs by token, we created them by sound. For example, all [t]s were grouped together regardless of the word each came from.
Further examination of the splines and SS-ANOVAs provides a better understanding of the articulation of these consonants in the two languages. Reviewing these images shows that nearly all talkers (10 out of 11 Sasak, 8 out of 8 Javanese) articulated postalveolar sounds [ʨ] and [ɲ] (as seen in Figure 6) with somewhat dissimilar articulatory positions: [ʨ] was articulated with a tongue-blade constriction posterior to dental/alveolar [t, s, n] but without full raising of the tongue body to the hard palate, whereas [ɲ], whether derived from [ʨ] or [s], was articulated similarly with the tongue blade but with a complete palatal occlusion between the tongue body and hard palate, at or slightly anterior to the constriction locations for palatal sounds [j] and [i] in the two languages. Thus, [ɲ] in Sasak and in Javanese is characterized as having a longer region of palatal constriction than [ʨ], and this articulation is consistent with that observed in electropalatographic data for Peninsular Spanish alveolo-palatal ñ [ɲ] (Martínez Celdrán & Fernández Planas, 2007; Fernández Planas, 2009; Shosted et al., 2012).
These articulatory relations align most closely with the first abstract, heterorganic hypothesis in Table 3a, and are summarized in Table 16. Furthermore, we see that not only is the relation between [s] and [Ns] abstract, so too is the relation between [ʨ] and [ɲ]: They are not entirely homorganic sounds.
That these articulatory patterns obtain in a pair of related languages suggests that the anomalous pairing of [s] and [ɲ] with respect to nasal substitution is stable, despite being an abstract linguistic relation. The pairing appears to have resisted pressure to move towards a concrete relation that would, over time, serve to regularize the pattern. Archangeli et al. (2012) find the same sort of stability in Bantu vowel harmony: Bantu height harmony occurs primarily in verbs, where sequences of mid vowels are preferred to mid-high sequences, with the exception of the permitted sequence [e...u]. The same pattern is found in nouns to a lesser degree, with the exception that even [e...u] occurs less often than expected. The difference is that in verbs, the [e...u] sequence arises across a morpheme boundary, and so any changes in this sequence disrupt an entire morphological paradigm, while with the nouns, gradual item-by-item change is possible. To recast Archangeli et al. (2012) in terms of the current discussion, every nasal substitution form relating [s] and [ɲ] constitutes pressure against reanalyzing any single instance relating [s] and [ɲ]. Such pressure predicts that anomalous patterns that enter a morphophonological relation (through whatever means) are likely to remain in a language.
In addition to finding a lack of within-paradigm regularity in our study (the concrete-homorganic hypothesis in Table 3d-f), we also find little evidence of even partial assimilation. Although [s] is clearly articulated in the dental/alveolar region with the tongue tip/blade, we do not observe articulations of its nasal counterpart ([Ns]) that are more similar to that of [s] than would be expected from a palatal nasal sound.
This work uses RMSD calculations to determine whether two sounds are homorganic or heterorganic in articulatory place, giving a quantified comparison of tongue-contour data from ultrasound images. RMSDs provide individual numerical values to represent the magnitudes of spatial difference within each contour comparison, and these values can be used for further comparison. Moreover, when coupled with a mixed-effects linear regression model, RMSD values from multiple talkers can be compared within a single analysis by treating Speaker as a random effect. This kind of comparison enables us to identify patterns that hold generally across speakers of a given language. Statistical comparisons of RMSD values across test conditions of interest allow for an evaluation of similarity or difference between those conditions, adding to the similarity tools proposed in Mielke (2012).
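The pooled analysis can be written schematically as a linear mixed-effects model. The exact model specification is not given in this section, so the form below is only an assumed, generic sketch with a fixed effect for the comparison category and a random intercept for Speaker; all symbols are introduced here for illustration.

```latex
\mathrm{RMSD}_{ij} = \beta_0 + \beta_{c(i)} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here RMSD_{ij} is the value for the i-th contour comparison from speaker j, c(i) is the category of that comparison (for example homorganic-s or heterorganic-c), u_j is the speaker-specific intercept, and ε_{ij} is the residual. Under this assumed form, differences between the category coefficients correspond to the estimated RMSD differences between conditions, with between-speaker variation absorbed by u_j.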

Conclusion
To conclude, this study has shown that there is an abstract morphophonological relation in the nasal substitution paradigm in both Sasak and Javanese. Our analysis of ultrasound images supports [s/ɲ] nasal substitution as an abstract relation within what appeared to be an otherwise general and concrete homorganic pattern: We show that in both languages, [s] is articulated quite similarly to the dental/alveolar sounds and that the morphologically-related [Ns] is indistinguishable from [ɲ]. Further examination of the data shows that the voiceless affricate is alveolo-palatal [ʨ], which corresponds to palatal [ɲ] in the same paradigm, again an abstract relation. Thus, the pattern is not as homorganic as impressionistic analysis suggests. We suggest that the pattern has resisted regularization because it is a robust morphophonological relation, held in place by morphological paradigm pressure.
Finally, we introduce the RMSD/LMER method for comparing tongue contours, pooling comparisons from multiple subjects in order to better understand general patterns within each language.

Additional File
The additional file for this article can be found as follows: • Appendix. The appendix contains a complete list of the stimuli for both languages. DOI: https://doi.org/10.5334/labphon.46.s1