Open Access under a CC BY 4.0 license. Published by De Gruyter Mouton, May 27, 2022

Bengali nasal vowels: lexical representation and listener perception

Sandra Kotzor, Allison Wetterlin, Adam Charles Roberts, Henning Reetz and Aditi Lahiri
From the journal Phonetica

Abstract

This paper focuses on the question of the representation of nasality as well as speakers’ awareness and perceptual use of phonetic nasalisation by examining surface nasalisation in two types of vowels in Bengali: underlying nasal vowels (CṼC) and nasalised vowels before a nasal consonant (CVN). A series of three cross-modal forced-choice experiments was used to investigate the hypothesis that only unpredictable nasalisation is stored and that this sparse representation governs how listeners interpret vowel nasality. Visual full-word targets were preceded by auditory primes consisting of CV segments of CVC words with nasal vowels ([tʃɑ̃] for [tʃɑ̃d] ‘moon’), oral vowels ([tʃɑ] for [tʃɑl] ‘unboiled rice’) or nasalised oral vowels ([tʃɑ̃(n)] for [tʃɑ̃n] ‘bath’) and reaction times and errors were measured. Some targets fully matched the prime while some matched surface or underlying representation only. Faster reaction times and fewer errors were observed after CṼC primes compared to both CVC and CVN primes. Furthermore, any surface nasality was most frequently matched to a CṼC target unless no such target was available. Both reaction times and error data indicate that nasal vowels are specified for nasality leading to faster recognition compared to underspecified oral vowels, which cannot be perfectly matched with incoming signals.

1 Introduction

1.1 Background

The phonetics and phonology of nasal vowels have been studied extensively and, while an accurate objective measurement of nasality is difficult to obtain, several theoretical approaches have been put forward as to the phonetic characteristics of nasality, the cues in the speech signal marking nasality used by listeners, and their lexical representations (see Krämer 2017: 408ff for a review). Theories of how listeners abstract information from the speech signal and recognise speech sounds and words can be said to differ in two fundamental ways. First, they differ in which information is stored for any given word in the mental lexicon. The theoretical proposals range from models that assume listeners store very rich episodic information for every word every time it is encountered (Johnson 1997; Pierrehumbert 2002) to models that are more parsimonious and store only information that is contrastive and where each word or morpheme only has one underlying representation (e.g. Norris et al. 2000; Lahiri and Reetz 2002, 2010). Second, especially in the latter group, theories differ in terms of which information is extracted from the speech signal to access the mental representations, for example articulatory events (e.g. Liberman and Mattingly 1985; Fowler 1986), phonemes (e.g. Marslen-Wilson 1987, 1989; Norris 1994; Norris and McQueen 2008), or features (e.g. Gaskell and Marslen-Wilson 1997; Lahiri and Reetz 2002, 2010). One of the focal debates amongst the latter models, and one of the questions investigated in this paper, centres on the issue of the specification of nasality in the lexicon and whether the feature [nasal] is privative or equipollent.

Chomsky and Halle (1968: 316) treat [nasal] as a binary feature, proposing nasal [+nasal] and nonnasal [−nasal]. More complex representations have been proposed in recent years where only marked, prominent or active features are represented (e.g. Clements 2001), suggesting that languages differ in their representation. For instance, Cohn (1993) suggested that, since French contrasts nasal and oral vowels, these ought to be represented as [+nasal] and [−nasal] respectively, while this is irrelevant for English vowels. Clements, however, would distinguish between lexical representation and phonologically active features, such that a lexical representation may lack a specification to begin with, but a feature may become active later in the phonology. Similarly, Trigo (1993: 393) proposes, in an investigation of Lusitanian Portuguese, that the feature nasal is equipollent ([+nasal] vs. [−nasal]) in languages where it ‘crucially distinguishes among two or more elements’ (e.g. in Bengali) while it is privative ([nasal] vs. no value) in languages where this is not the case (e.g. in English).[1] In addition, in equipollent systems, there have also been conflicting proposals regarding the representation of contextually nasalised vowels, with some suggesting that those vowels are represented as [+nasal] (cf. Ohala and Ohala 1995 for English CVN words) or [−nasal] (cf. Chomsky and Halle 1968).

From a theoretical standpoint, an equipollent or binary system which refers to nasal sounds as [+nasal] and consequently assumes that oral vowels and consonants are specified as [−nasal] would presuppose that all items which are specified for either [+nasal] or [−nasal] form a coherent class with certain properties common to them all. It would also follow that both features could be active in phonological processes such as spreading. The assumptions for a privative or monovalent system are rather different: for instance, the opposite of the feature [nasal], inherent to all oral vowels and consonants, does not form a coherent set since it is not specified (sometimes indicated by empty brackets [ ]; Lahiri 2018). There is no one feature which groups together all oral vowels and oral consonants and thus there is no featural information which is extracted from the signal to match with a mental representation of oral sounds or to be active in phonological processes. One example of a theoretical lexical access account which proposes a monovalent system and operates on a feature level rather than using phonemes (unlike, for example the Cohort model; cf. Marslen-Wilson 1987, 1989) is the Featurally Underspecified Lexicon (FUL; Lahiri and Reetz 2002, 2010).

In addition to the question regarding the representation of nasality, this paper also addresses whether listeners distinguish between underlying and contextual nasality, and if so, which cues they use to discriminate between the two. The present study is not primarily concerned with the question of the extent to which the contextually nasalised vowel in a CVN word (e.g. English ban) differs phonetically from an underlying nasal vowel, but with the question of what native listeners of a language with both underlyingly nasal and contextually nasalised vowels do when they perceive any nasality in a vowel. If one assumes that there is a phonetic difference between contextual and underlying nasals (e.g. Cohn 1993; Beddor et al. 2013), hypotheses concerning the interpretation of nasality from the signal would differ for the two types of vowels. If phonetic cues are unambiguous as to the source of the nasality, then listeners would surely make use of these cues and thus would be able to accurately identify the source of the nasality in the signal (i.e., an underlying nasal vs. a contextually nasalised segment). If, on the other hand, representation governs perception and recognition, any nasality in the signal would in the first instance lead listeners to an underlying, fully represented nasal vowel.

There has been strong evidence that listeners are affected by top-down effects when processing the acoustic signal. Both the likelihood and the lexical status of a perceived signal (e.g. Ganong 1980) as well as the phonological system of a listener’s native language(s) have been found to guide perception and can lead to listeners seemingly disregarding certain acoustic cues (Jongman et al. 1992). Ganong’s seminal paper (1980) showed that the categorisation of stimuli with ambiguous VOT could depend on the existence of a word: the same VOT duration would be categorised as [t] if the decision was between tack/*dack but as [d] if the choice was between *tesk/desk. Jongman et al.’s experiment likewise revealed that identical vowel durations led to significant variation in the choice between words and nonwords based on their phonological representation. They appended different word-initial consonants to a continuum from [at] to [aːt] to create two pairs of real words in Dutch: /zat/ ∼ /zaːd/ ‘drunk’ ∼ ‘seed’ and /stad/ ∼ /staːt/ ‘city’ ∼ ‘state’. Since Dutch voiced stops are devoiced word-finally, the surface pairs would be [zat] ∼ [zaːt] and [stat] ∼ [staːt]. The results showed that the categorisation of the vowel as long or short depended on the underlying phonological representation of the voicing of the final consonant: the phoneme boundary for /zat/ ∼ /zaːd/ occurred at a significantly shorter vowel length than that for /stad/ ∼ /staːt/. Thus, the categorisation of ambiguous vowel length was affected by the underlying voicing of the following word-final consonant.

The fact that phonological representations guide perception has been demonstrated in studies across different languages as well as dialects (e.g. Pallier et al. 1997, 2001; Scharinger and Lahiri 2010), which show that phonetically identical contrasts are interpreted differently by speakers of different languages/dialects due to differences in the phonological features that are represented in their lexicon. Scharinger and Lahiri (2010), for example, showed how the different specification of short front vowels for [low] in two dialects of English (American and New Zealand) affected listeners’ processing in a semantic priming study.

In the present study we are not, however, dealing with variability across dialects or languages but with the inherently variable phonetic property of nasality. Thus, our central question is whether listeners ascribe surface nasality to regressive assimilation from a following nasal consonant or whether an interpretation as an underlying nasal vowel would be equally, or indeed more, likely in languages such as Bengali, which have both contextually nasalised and underlyingly nasal vowels. Thus, in addition to the question of how nasality is represented, we are also concerned with the question whether listeners’ choices are indeed governed by this representational information. While there has been some previous work in this area (cf. Section 1.3 below), we are using different methods to elicit quicker decisions from listeners without any context in order to shed light on automatic access to stored representations.

1.2 Acoustic properties of vowel nasality

1.2.1 Measures of nasality

There is a substantial literature on the production, perception, and acoustics of nasal and contextually nasalised vowels and the phonetic differences which may present listeners with cues to distinguish between oral and nasal vowels. Although measuring acoustic cues of nasality is not the central focus of our study, we briefly provide an overview of the relevant acoustic properties that have been suggested in the literature to gauge nasality. The complex structure of the nasal tract (Bjuggren and Fant 1964; Dang et al. 1994; Pruthi et al. 2007) with asymmetric passages and additional side-cavities leads to complex acoustic effects (Dang and Honda 1996; Lindqvist-Gauffin and Sundberg 1976). The main effects for nasality found in the speech signal are flattening and widening of the oral first formant F1 (Fant 1960; House and Stevens 1956; Stevens 1998: 193), additional nasal poles and zeros which are above F1 for high vowels and below F1 for low vowels (Hattori et al. 1958; Hawkins and Stevens 1985; Maeda 1982, 1983; see also Carignan 2018, for an extended summary) and increased low-frequency energy (House and Stevens 1956; but see Vampola et al. 2020) alongside other effects (see e.g. Styler 2017). Although the nasal tract is essentially a rigid structure, even the location of the poles and zeros of the nasal tract, as manifested in the acoustic signal, depends on the amount of opening of the velopharyngeal port (Pruthi 2007: 76) (in contradiction to Stevens’ description (1998: 306)). Usual measures of nasality are the bandwidth of F1 (B1), amplitudes of the nasal poles (P0, P1) and the relation of these amplitudes to the amplitude of the first formant (A1) (Chen 1996, 1997) or relations of these amplitudes to those of the first two harmonics (H0, H1) (Huffman 1990). These measures are generally taken from the amplitude of harmonics closest to F1 (A1) and the frequencies of the nasal poles (FP0, FP1), respectively. Since the formant frequencies of vowels are close to those of the nasal poles, Chen (1996: 129) proposed an adjustment formula for the related amplitudes to account for differences in the effect of vowel types on nasal peak amplitude. Nevertheless, the closeness of F0, F1, FP0 and FP1 (and F2 for back vowels) can make it impossible to determine formant frequencies independently, especially for high-pitched voices where the wide spacing of the harmonics does not provide the granularity to separate these frequencies (see e.g. Chen 1996: 130; Berger 2007: 16). Styler (2017) proposed a procedure for measuring several parameters automatically, rather than hand-locating the respective resonances manually.

The efficiency of the above measures, however, varies considerably across speakers. For example, Berger (2007: 50) found overall best discrimination results between nasal and oral vowels for the difference A1–H1, followed by A1–P0 and B1 for Bengali, English and Spanish. Note that Bengali vowels were underlyingly nasal while English and Spanish had nasalised vowels in a nasal context. But A1–P0 was the second worst out of the six parameters he investigated for the female Bengali speaker in his study. Pruthi (2007) investigated 37 different measures on three databases (TIMIT,[2] StoryDB,[3] ICSI[4]) and found that overall in an ANOVA comparing oral and nasalised vowels, B1 yielded the highest F-ratios, followed by the A1–H1 measure and one A1–P0 measure.[5], [6] It is worth noting that in all such investigations the widening of the bandwidth of F1 (B1) was one of the best-scoring parameters related to nasality, a fact already observed by House and Stevens (1956). According to the Source-Filter Theory (Fant 1960), we can assume that the quality of the vocal tract filter (Quality or Q-factor: formant centre frequency divided by its bandwidth) is reasonably constant across a wide frequency range, as long as the velum is raised. Lowering the velum will change this quality by introducing the resonating and shunting cavities of the nasal tract, which leads to a decrease in the Q-factor. We therefore suggest Q1 (F1/B1) as our preferred measure of nasality, which can be compared across vowels independent of vowel-specific variation of F1. The lower the Q1 measure of the vowel, the more nasality is present.
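To make this measure concrete, the following minimal sketch (our own illustration; the formant and bandwidth values are invented, not taken from the study) computes Q1 = F1/B1 for a pair of hypothetical measurements:

```python
def q1(f1_hz: float, b1_hz: float) -> float:
    """Quality factor of the first formant: centre frequency divided by bandwidth.
    A lower Q1 indicates a broader, more damped F1, i.e. more nasality."""
    return f1_hz / b1_hz

# Hypothetical measurements for one low vowel (illustrative values only):
q1_oral = q1(f1_hz=700.0, b1_hz=120.0)       # ~5.8: narrow F1 bandwidth, little nasality
q1_nasalised = q1(f1_hz=680.0, b1_hz=260.0)  # ~2.6: widened F1 bandwidth, more nasality
print(f"Q1 oral = {q1_oral:.2f}, Q1 nasalised = {q1_nasalised:.2f}")
```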

1.2.2 Phonetic differences between contextual and underlying nasals

Many studies have tackled the question of phonetic differences between contextually nasalised and underlying nasal vowels. Krämer (2017) reviews the literature on the phonetics of English nasalised vowels and the comparisons with languages with both underlying nasal vowels (e.g. French (Chen 1997)) and contextualised nasal vowels (e.g. Spanish (Solé 1992, 1995) or Thai (Beddor 2007)) and concludes that most production studies do not find less robust nasalisation for English contextually nasal vowels compared to underlying nasal vowels in other languages (Cohn (1993) is the notable exception here). However, much of this literature deals with English and a comparison of the contextually nasalised vowels in English to underlying nasals in other languages, rather than a comparison of how listeners in a language which has both contextually nasal and underlying nasal vowels interpret nasality (for a review of both production and perception studies on English nasalised vowels see Krämer (2017)).

Although it is difficult to find consistent spectral properties that can be attributed to nasality in all vowels (vowel-independent properties (Beddor 1993: 173)), we know that listeners perceive and use nasality in vowels in languages both with and without an oral-nasal vowel contrast (Butcher 1976; Wright 1986) to assign vowels to two distinct categories (Beddor and Strange 1982). Furthermore, for more than eight decades it has been consistently shown that listeners use vowel nasalisation as a cue for an upcoming nasal consonant (cf. Malécot 1960). However, it has also been demonstrated that, phonetic cues aside, a listener’s phonology plays an important role in how nasality is used. Lahiri and Marslen-Wilson’s (1991) study of Bengali and English listeners showed that nasalisation in vowels is only used as a cue for an upcoming nasal consonant in speakers of a language where nasality is predictable (English) and not for speakers where vowel nasality is contrastive (Bengali).

1.3 Lexical representation of nasality

Lexical access models predominantly focus on access routes to the stored information and often do not provide much detail about the representation of features, such as nasality, or phonemes but only present a brief overview of the phonological information contained in the mental representations (see McQueen 2005: 268 for an overview of proposed representations in common speech perception models). Models such as TRACE (Elman and McClelland 1986; McClelland and Elman 1986) or Cohort (Marslen-Wilson and Welsh 1978; Marslen-Wilson 1993) and the Distributed Cohort Model (DCM, Gaskell and Marslen-Wilson 1997) as well as Stevens’ (2002) Acoustic Landmarks Model assume a featural representation either at the pre-lexical stage or as underlying representations. An account which assumes featural underspecification would be compatible with models proposing featural representations, while models such as Shortlist A and B (Norris 1994; Norris and McQueen 2008) and exemplar-based models (Johnson 1997; Pierrehumbert 2002), which propose phonemic representations, would result in different processing predictions. The present study is primarily concerned with how contrastive features are stored and we provide predictions based on underspecified representations, phonemic representations, and exemplar-based representations in detail below.

As far as the question of the representation of vowel nasality is concerned, earlier research on Bengali shows support for [nasal] as a privative feature, claiming that predictable nasality is not represented (Lahiri and Marslen-Wilson 1991, 1992; henceforth L&M-W). Contextually nasalised vowels in CVN sequences would thus be represented without nasality, e.g. [bɑ̃n] ‘flood’ from underlying /bɑn/. This results in ambiguous surface forms in languages like Bengali where there are also underlying nasal vowels. These cases can only be disambiguated by the following consonant. In the case of an oral vowel in the sensory input (e.g. [bɑd̪] ‘omit’), an underlyingly nasal vowel would no longer be a candidate since this combination results in a no-mismatch between the sensory input (underspecified) and the representation ([nasal]). Nasal or contextually nasalised vowels in cases like Bengali (e.g. [bɑ̃d̪ʰ] ‘dam’, [bɑ̃n] ‘flood’), however, should not mismatch with the underlyingly oral vowel since oral vowels have no featural specification for nasality.

L&M-W (1991) used a gating task (‘t Hart and Cohen 1964; Grosjean 1980) with incremental presentation of the CV segment of CVC words where listeners were asked to respond with a full word which forms a continuation of the phonetic CV sequence they were perceiving. Their stimuli were sets of triplets in Bengali (CVC [bɑd̪], CVN [bɑ̃n], and CṼC [bɑ̃d̪ʰ]) as well as doublets where no word with a nasal vowel exists in the same consonantal environment (CVC [lobʰ] ‘greed’ and CVN [lõm] ‘body hair’).

L&M-W found that in the triplet condition both stimuli with contextualised nasal vowels and those with underlying nasal vowels (i.e. CVN and CṼC stimuli, respectively) elicited a large percentage of CṼC responses at the offset of the vowel.[7] This shows that listeners take nasalisation as a cue that they have heard a CṼC rather than a CVN word. CVN responses for both categories were very low (below 8%) but this changes rapidly as soon as consonantal information becomes available since this disambiguates the surface nasality. The best match for the nasality in the auditory signal was a word with a vowel underlyingly specified for nasality, i.e. a CṼC word. CVN words are still an option since there is no mismatch between the signal and the underspecified underlying representation of the oral vowel. This data shows support for underspecification since a surface hypothesis would have predicted an equal distribution of CṼC and CVN responses here as both vowels would have been specified for nasality.

CVC stimuli resulted in a large proportion of CVC responses (>80%) even at very early gates.[8] CṼC responses were almost non-existent (0.7%) but there was a relatively large number of CVN responses (13.4%) considering the lack of nasality in the signal (L&M-W 1991). This phenomenon is difficult to explain with a surface representation account since, with nasality present in both representations, there should be no difference between CVN and CṼC responses to CVC stimuli. This pattern is more easily explained by proposing an underlyingly oral vowel in CVN words. CVN responses to CVC stimuli are thus not only permitted but, based on the underlying representation alone, should be as likely as CVC responses since they have the same representation (L&M-W 1991). The fact that there are more CVC responses than CVN responses corresponds to the proportion of CVC and CVN patterns in the language, with CVC sequences being considerably more frequent than CVNs.

Data from L&M-W’s doublet sets exhibited similar patterns for the CVC stimuli, while the CVN stimuli resulted in a different pattern: listeners produced 65% CVC responses, 16% CVN responses and 17% CṼC responses even though no CṼC item in this consonantal frame is available in the lexicon of the language. In these cases, participants often responded with CṼC words from Hindi, another language they were familiar with. The increase in CVC and CVN responses compared to the triplet set shows that the information in the signal matches both representations and there is no other competitor containing nasality (unlike in the triplet case). This makes the distribution of responses more similar to the CVC stimuli responses above, which are in part determined by pattern frequency in the language.

In a subsequent study using Hindi and English, Ohala and Ohala (1995) attempted to replicate L&M-W’s findings with methodological modifications since they had several criticisms of the original study, including the fact that in L&M-W’s study no statistical analysis was possible since participants’ responses were not constrained, and that previously heard versions at earlier gates could have influenced participants’ responses. Ohala & Ohala restricted the participants’ responses and, in order to prevent previous stimuli affecting subsequent responses, presented only one gated version of each stimulus per participant (with the exception of the most severely gated stimulus, which was followed by the full word after four trials). While they found results similar to those of L&M-W (1991) in their Hindi triplet condition as well as with CVN stimuli in the Hindi doublet condition, thereby lending support to the underspecification hypothesis, they do not see either gating study as providing convincing evidence for underspecification of nasality but state that ‘at best [these findings] are compatible with both the UR [underlying representation] and SR [surface representation] hypotheses’ (Ohala and Ohala 1995: 57).

Both studies (Lahiri and Marslen-Wilson 1991, 1992; Ohala and Ohala 1995) thus showed that in a gating paradigm listeners displayed an overwhelming preference for contrastively nasal vowels in response to any nasality in the signal (CVN or CṼC), to the extent that, in cases where there was a lexical gap (e.g. in Lahiri & Marslen-Wilson’s doublet conditions), listeners provided words with nasal vowels in response to CVN fragments from another language they were familiar with (Hindi).

1.4 Present study and predictions

To address the central questions of this study – whether nasality is represented as an equipollent or a privative feature, and how listeners use phonetic and phonological nasality in processing – we conducted a series of experiments using a cross-modal (audio-visual) form priming task with a forced-choice response paradigm (Cooper et al. 2002). All stimuli (described in greater detail in Section 2.1) were monosyllabic Bengali words belonging to one of three types: CṼC, CVN and CVC. The CV/Ṽ segments of these words were used as primes, with the full words as targets.

Before setting out the predictions resulting from different types of representational hypotheses, we briefly discuss the distributional patterns of nasal and non-nasal vowels in Bengali.

The Bengali vowel system consists of seven oral vowel phonemes and seven nasal counterparts (cf. Chatterji 1926; Ferguson and Chowdhury 1960). All oral vowels can also become nasalised before the three nasal consonants /m, n, ŋ/. Thus, oral vowels are, evidently, the most frequent as they can (theoretically) occur with any of the language’s 30 oral consonants. While nasal vowels can occur preceding all consonants apart from the nasals, they are most frequent before coronal consonants. As contextually nasalised vowels can only occur in certain environments (preceding one of the three nasals in the language), these are the most restricted and therefore least likely. However, these frequency assessments, based on the combinatorial possibilities afforded by the phonological inventory of the language, only tell part of the story, since it also matters how frequent the words are in which these patterns occur. Many of the words with the CVN pattern are highly frequent even though the combinatorial possibilities are the most severely restricted.

Depending on the representational specifications (i.e. privative vs. equipollent) as well as the precise specification of vowels in CVN contexts and the surface cues in the acoustic signal, several hypotheses are possible. We outline below one set of surface (phonetic) predictions (cf. Figure 1) based on the acoustic cues (assuming no difference between contextually nasalised and underlyingly nasal vowels) and one based on a monovalent system where CVN is unspecified for nasality (cf. Figure 2). In a surface account, where listeners are able to directly match primes and targets based on purely phonetic cues, we would expect faster latencies only for identity pairs since this approach would result in only one possible match per prime (cf. Figure 1).

Figure 1: Surface phonetic predictions for CṼ(C), CV(N) and CV(C) primes.

Figure 2: Phonological predictions for match and no-mismatch relationships for CṼ(C), CV(N) and CV(C) primes in an underspecification account.

In a monovalent system, however, where nasal vowels (CṼC) are specified as [nasal] while oral vowels (CVC) and vowels nasalised due to a following nasal (CVN) are not specified for nasality, shorter response latencies as well as lower error rates should be observed after CṼC primes (compared to CVC and CVN primes) since their representation better matches the surface form. In Figure 2, we lay out in greater detail the predictions made by an underspecification account assuming a monovalent feature [nasal]. As far as this approach is concerned, the only genuine match is that between a surface nasal vowel (from either CṼC or CVN) and an underlyingly specified nasal vowel /CṼC/ (cf. Lahiri and Reetz, 2002, 2010). All other conditions should result in a no-mismatch scenario between the input and the underlying representation since no nasal or oral feature is specified in either CVC or CVN words.

This results in predictions of only two real match conditions across the possible prime-target combinations. We expect an identity match between the underlying nasal vowel prime (CṼC) and its identity target and another match between a surface nasal vowel from a CVN prime and the specified nasal vowel target (CṼC). Faster response latencies and lower error rates are predicted for match conditions than for no-mismatch conditions (Lahiri and Reetz 2002). In terms of the error data, however, it is unclear whether the distinction between match and no-mismatch will surface clearly. Surface nasalisation may well play a more crucial role in the error data than in the reaction times, which are a more gradient measure. Since nasality is highly salient, this may lead to more accurate matching in cases where the prime contains a nasal or contextually nasalised vowel and a word with either option (CṼC or CVN) is present as a target.
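The matching logic assumed in Figure 2 can be summarised as a small sketch. The snippet below is our own illustrative rendering of the ternary evaluation for the single privative feature [nasal]; it is not the FUL model’s published implementation:

```python
def evaluate_nasal(surface_nasal: bool, lexically_nasal: bool) -> str:
    """Ternary evaluation of the privative feature [nasal] (illustrative sketch only).

    lexically_nasal is True only for underlying nasal vowels (CṼC); oral vowels,
    including those that merely surface nasalised before a nasal consonant (CVN),
    carry no specification for nasality."""
    if surface_nasal and lexically_nasal:
        # Extracted surface nasality confirms the specified feature [nasal].
        return "match"
    # Nothing specified conflicts with what was extracted, so the candidate is
    # tolerated rather than ruled out (cf. Figure 2).
    return "no-mismatch"

# Surface nasality, as in the primes of [tʃɑ̃d̪] 'moon' or [tʃɑ̃n] 'bath':
print(evaluate_nasal(True, True))    # match       -> CṼC target
print(evaluate_nasal(True, False))   # no-mismatch -> CVN or CVC target
# An oral vowel, as in the prime of [tʃɑl] 'unboiled rice':
print(evaluate_nasal(False, False))  # no-mismatch -> CVC or CVN target
print(evaluate_nasal(False, True))   # no-mismatch -> CṼC target (not a positive match)
```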

In approaches which are fundamentally guided by frequency and experience (such as an exemplar model), it is never precisely articulated how one particular exemplar is chosen out of a number of possible options. Frequency will contribute to this selection as will, presumably, surface cues, such as the degree of nasalisation of a vowel. Since the degree of nasalisation can vary greatly in both underlying nasals and contextual nasals and exact matches are not required in an exemplar model, there must be a gradient approach and nasalisation in the signal could lead to either an underlying nasal or a contextually nasalised vowel, depending on other contributing factors such as, for instance, frequency. It is thus difficult to provide clear-cut predictions for such an exemplar model: these predictions would fall somewhere in between those made by a fine-grained phonetic account and an underspecification approach, but they would be clearly distinct from both. Phonetic accounts would only lead to an identity match, and an underspecification account makes the clear prediction that, given any nasality in the signal, the first choice would be an underlying nasal, as this is the only option specified for nasality.

In terms of frequency, two different aspects need to be considered. Firstly, if nasal vowels are heard more frequently in a CṼC rather than a CVN frame (as there are only three nasal consonants in Bengali which would trigger regressive nasalisation), listeners should be faster to match nasal vowel primes (be it from a CṼC or CVN prime) to CṼC targets. However, since the oral vowel only occurs in CVC contexts, matching these to CVC targets should be fastest overall since this is the most frequent pattern. Secondly, if the number of neighbours is a determining factor here, a faster response might be expected for CṼC prime-target combinations as these have fewer neighbours. CVN words have even fewer neighbours and should thus result in even faster reaction times than CṼC words, with CVC words eliciting the slowest responses.

There is no published corpus for Bengali that would allow us to compute neighbourhood density and vowel frequency statistics. Basic lexical frequency counts were taken from a Bengali corpus of three million words (Dash and Chaudhuri 2001) and mean frequencies for each condition can be found in Table 1. However, these can be misleading as the lexical distribution is not the same. The number of words with nasalised vowels in the context of nasal consonants /m n ŋ/ is, by definition, smaller than that of words with oral vowels, which can be followed by any of the 30 oral consonants. However, in contrast to, for instance, French and other languages where nasality has been studied, Bengali is, to our knowledge, one of the few languages with a full set of contrastive nasal and oral vowels (seven each) (Ferguson and Chowdhury 1960: 32; Hajek 2013; Klaiman and Lahiri 2018: 423). Consequently, Bengali provides the best possible foundation for an investigation of the phonological representation of nasality.

Table 1: Prime-target examples of doublet and triplet sets by block (with neither cases indicated in bold print), including frequency counts taken from a corpus of three million words (Dash and Chaudhuri 2001).

Triplets
| Prime            | CṼC [tʃɑ̃(d)] | CVN [tʃɑ̃(n)] | CVC [tʃɑ(l)] |
| Target (Block 1) | tʃɑ̃n - tʃɑ̃d  | tʃɑ̃n - tʃɑ̃d  | tʃɑ̃n - tʃɑ̃d  |
|                  | tʃɑl - tʃɑ̃n  | tʃɑl - tʃɑ̃n  | tʃɑl - tʃɑ̃n  |
|                  | tʃɑ̃d - tʃɑl  | tʃɑ̃d - tʃɑl  | tʃɑ̃d - tʃɑl  |
| Target (Block 2) | tʃɑ̃d - tʃɑ̃n  | tʃɑ̃d - tʃɑ̃n  | tʃɑ̃d - tʃɑ̃n  |
|                  | tʃɑ̃n - tʃɑl  | tʃɑ̃n - tʃɑl  | tʃɑ̃n - tʃɑl  |
|                  | tʃɑl - tʃɑ̃d  | tʃɑl - tʃɑ̃d  | tʃɑl - tʃɑ̃d  |
| Mean freq        | 235.08       | 108.23       | 246.23       |

Doublets
| Set        | NoCṼC       |             | NoCVN         |               |
| Prime      | CVN [t̪ĩ(n)]  | CVC [t̪i(l)]  | CṼC [dʒhɑ̃(p)]  | CVC [dʒhɑ(l)]  |
| Target (1) | t̪ĩn - t̪il    | t̪il - t̪ĩn    | dʒhɑl - dʒhɑ̃p  | dʒhɑ̃p - dʒhɑl  |
| Target (2) | t̪il - t̪ĩn    | t̪ĩn - t̪il    | dʒhɑ̃p - dʒhɑl  | dʒhɑl - dʒhɑ̃p  |
| Mean freq  | 506.17      | 193.07      | 33.21         | 300.21        |

2 Materials and design

To test the two sets of predictions laid out in Figures 1 and 2, we investigated the processing of nasality on vowels (contextually nasalised as well as underlyingly nasal) with one priming and two word-recognition tasks. The materials were the same in all tasks and are described first with reference to the priming task.

We compared the processing of the vowels with different competitor sets in a cross-modal (audio-visual) form priming task with a forced-choice response paradigm where participants had to choose one of two forms presented in Bengali script. We chose three sets of stimuli: two doublet sets where either the CVN or CṼC pattern is not attested in the language and one triplet set where all three options exist. Thus, in the two doublet sets, there are either no CṼC competitors for CVN words or no CVN competitors for CṼC words. In the case where all three patterns exist in the language (i.e. the triplet set), the target pair contains either one target matching the prime plus one other competitor (a match condition) or no item matching the prime (a neither condition).

2.1 Stimuli

2.1.1 Primes

CV fragments of 42 common monosyllabic CVC words of Bengali were used as primes and they were divided into three sets of near-minimal pairs: one set of 14 triplets and two sets of 14 doublets each (for examples see Table 1). The triplet set contained CVC strings in which all possible variations of vocalic nasality led to real words in the language: oral vowels (e.g. [tʃɑl] চাল ‘unboiled rice’), nasal vowels (e.g. [tʃɑ̃d̪] চাঁদ ‘moon’) and nasalised oral vowels due to a following nasal (e.g. [tʃɑ̃n] চান ‘bath’). In the first doublet set (NoCṼC), words with the same consonant sequence containing a nasal vowel do not exist in Bengali and thus the two primes in this set contained oral vowels (e.g. [t̪il] ‘sesame seeds’) and nasalised oral vowels due to a following nasal (e.g. [t̪ĩn] ‘three’). The second doublet set (NoCVN) contained only oral vowels (e.g. [dʒhɑl] ঝাল ‘spicy hot’) and underlying nasal vowels (e.g. [dʒhɑ̃p] ঝাঁপ ‘jump’), and in these cases the corresponding minimally different CV pairs with nasalised oral vowels are not attested in Bengali.

All primes were recorded in full and then truncated using PRAAT (Boersma and Weenink 2011) to include the complete duration of the vowel while ensuring no consonantal information is available. This was done based on the spectral properties; obstruents were cut at the beginning of the closure and for sonorants, decisions were based on close examination of the spectrogram as well as our own perception (cf. Figure 3 for an example of full words and truncated primes). There were no significant length differences of the individual vowels between the three conditions (oral, nasalised, underlyingly nasal).

Figure 3: Example of full CṼC word /ʃõk/ (left panel) and CVN word /ʃon/ (right panel). The lines mark the ends of the prime fragments /ʃõ/ and /ʃo/ respectively.
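As an illustration of this truncation step, the sketch below uses the praat-parselmouth Python interface rather than the PRAAT editor actually used for stimulus preparation; the file name and cut point are hypothetical:

```python
import parselmouth  # praat-parselmouth: a Python interface to PRAAT

def truncate_prime(word_wav: str, vowel_offset_s: float, out_wav: str) -> None:
    """Cut a recorded CVC word at the (hand-determined) end of the vowel,
    keeping the initial CV/CṼ portion as the auditory prime."""
    sound = parselmouth.Sound(word_wav)
    prime = sound.extract_part(from_time=0.0, to_time=vowel_offset_s)
    prime.save(out_wav, "WAV")

# Hypothetical usage: cut [tʃɑ̃d̪] at the beginning of the stop closure.
# truncate_prime("chand_full.wav", vowel_offset_s=0.31, out_wav="chand_prime.wav")
```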

2.1.2 Targets

Targets were the complete words matched to the fragment primes. In the doublet conditions both possible targets were presented, while in the triplet condition only two of the possible three targets were presented (cf. Table 1). Thus, in one third of the cases neither of the two targets matched the prime, resulting in two conditions for the triplet set: one where an identity match was possible (match) and one where neither target was an identity match (neither). Targets were presented in pairs in two blocks with words in alternate positions (see Section 2.2 below).

2.1.3 Stimulus recording

All primes were recorded by a female native speaker of Bengali in a sound-attenuated room with a Roland R-26 WAV recorder at a sampling rate of 44.1 kHz using a high-quality microphone (Shure SM27). The words were extracted and digitised, and the volume equalised using PRAAT (Boersma and Weenink 2011).

2.1.4 Acoustic measurements of nasality

As discussed in Section 1.2.1, we believe that Q1 is the most reliable vowel- and speaker-independent acoustic correlate of nasalisation. However, since other measures have been preferred in the literature, we report the A1–P0 compensated and A1–P1 compensated measures provided by Styler (2017) as well as the Q1 measure. We computed these values for each vowel at five points throughout the vowel (at 20, 40, 60, 80 and 100% of the vowel duration) using a modified PRAAT script (Boersma and Weenink 2011) from Styler (2017).[9]
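For readers wishing to reproduce this type of measurement, the sketch below shows how F1 and its bandwidth B1 could be read off at the five relative time points and converted to Q1, assuming the praat-parselmouth package; it is our own illustration, not the modified Styler (2017) script used in the study, and the A1–P0/A1–P1 measures, which require locating the nasal peaks, are not reproduced here:

```python
import parselmouth                   # praat-parselmouth
from parselmouth.praat import call   # generic access to PRAAT commands

def q1_track(wav_path: str, vowel_start: float, vowel_end: float) -> list:
    """Return Q1 = F1/B1 at 20, 40, 60, 80 and 100% of the vowel duration.
    Illustrative sketch; formant settings are PRAAT defaults, not the study's."""
    sound = parselmouth.Sound(wav_path)
    formants = sound.to_formant_burg()   # Burg formant analysis, default settings
    duration = vowel_end - vowel_start
    q1_values = []
    for fraction in (0.2, 0.4, 0.6, 0.8, 1.0):
        t = vowel_start + fraction * duration
        f1 = call(formants, "Get value at time", 1, t, "Hertz", "Linear")
        b1 = call(formants, "Get bandwidth at time", 1, t, "Hertz", "Linear")
        q1_values.append(f1 / b1)
    return q1_values

# Hypothetical usage (file name and vowel boundaries are placeholders):
# print(q1_track("chand_prime.wav", vowel_start=0.08, vowel_end=0.25))
```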

The measure A1–P0 compensated grouped the nasal and oral vowels together as oral, while the nasalised vowels were calculated as being the ‘most nasal’, and A1–P1 compensated could not differentiate between the three different vowel types at all. Table 2 shows the results of these three measures at five positions in the vowel for all three different stimulus types (CVN, CṼC and CVC) as well as the results of t-tests to compare all possible combinations within a measure. For the same stimuli, the Q1 measure groups the nasalised and the underlying nasal vowels together, differentiating them from the oral vowels (Figure 4).

Table 2: Average Q1, A1–P0 compensated, and A1–P1 compensated measures at five timing points for all vowels in CṼC, CVN, and CVC stimuli and results of t-tests (Tukey HSD) comparing those measures (* indicates significance).

| Measure        | 20%         | 40%         | 60%         | 80%         | 100%        |
| CṼC Q1         | 3.42        | 2.86        | 2.59        | 2.71        | 3.07        |
| CVN Q1         | 3.02        | 2.51        | 1.76        | 1.90        | 2.66        |
| CVC Q1         | 5.98        | 4.30        | 4.13        | 4.47        | 4.52        |
| CVN/CṼC t      | p = 0.9031  | p = 0.8803  | p = 0.3798  | p = 0.3276  | p = 0.8207  |
| CVN/CVC t      | p = 0.0018* | p = 0.0251* | p = 0.0002* | p < 0.0001* | p = 0.0090* |
| CṼC/CVC t      | p = 0.0079* | p = 0.0090* | p = 0.0021* | p = 0.0029* | p = 0.0521  |
| CṼC A1–P0 comp | 0.34        | 1.24        | 0.26        | 0.10        | −1.21       |
| CVN A1–P0 comp | −0.15       | −0.76       | −1.34       | −3.51       | −2.97       |
| CVC A1–P0 comp | 4.13        | 1.40        | 0.61        | 0.72        | 0.58        |
| CVN/CṼC t      | p = 0.9587  | p = 0.1917  | p = 0.3098  | p = 0.0219* | p = 0.3987  |
| CVN/CVC t      | p = 0.0251* | p = 0.1001  | p = 0.1282  | p = 0.0022* | p = 0.0140* |
| CṼC/CVC t      | p = 0.0538  | p = 0.9863  | p = 0.9355  | p = 0.8685  | p = 0.3235  |
| CṼC A1–P1 comp | 15.85       | 17.85       | 18.49       | 18.82       | 17.34       |
| CVN A1–P1 comp | 15.80       | 17.78       | 20.45       | 19.11       | 23.63       |
| CVC A1–P1 comp | 19.37       | 17.29       | 18.60       | 19.31       | 21.78       |
| CVN/CṼC t      | p = 0.9999  | p = 0.9998  | p = 0.8853  | p = 0.9972  | p = 0.1713  |
| CVN/CVC t      | p = 0.5085  | p = 0.9861  | p = 0.8779  | p = 0.9984  | p = 0.8278  |
| CṼC/CVC t      | p = 0.5176  | p = 0.9817  | p = 0.9995  | p = 0.9904  | p = 0.3460  |

Figure 4: Average Q1, A1–P0 compensated, and A1–P1 compensated measures at five timing points for all vowels in CṼC (nasal), CVN (nasalised), and CVC (oral) stimuli.

These data show that there are only small differences in the Q1 values between the vowels of CṼC and CVN stimuli at any of the timing points (not even at 20% or 40%, where CVN vowels might be expected to show less nasality), while the Q1 values differ at all timing points between oral (CVC) vowels and both underlyingly nasal (CṼC) and contextualised nasal vowels (CVN). This demonstrates that there are no significant acoustic differences in the degree of nasalisation between CṼC and CVN stimuli, while both are acoustically distinct from CVC items. We further analysed the formant transitions of F1 and F2, as well as how frequently a target with a specific final consonant was selected for a given input fragment, in order to determine whether any additional consistent phonetic cues could have influenced the main result; none were found. The results of the formant transitions show a considerable number of differences for F1 (as an indicator of manner of articulation) between oral and nasal vowels. For F2, there are only a few differences, mostly for high vowels (of which there are few in the experiment). Thus, the coarticulation points to different manners of articulation, while differences in place of articulation are not evident in this formant analysis. The results of the analysis based on the targets’ final consonants indicate that participants chose alternative words mostly on the basis of manner of articulation rather than place information, which contradicts the information available from the formant transitions. Detailed results for these analyses can be found in the Supplementary material to this article.

2.1.5 Participants

Fifty-nine female native speakers of Bengali (aged 18–23; mean age 19.67) participated in the priming experiments. All were undergraduate students at Gokhale Memorial Girls’ College (Kolkata) and had corrected-to-normal vision and no hearing impairments. The participants were compensated appropriately for their participation.

2.2 Procedure

In the priming experiment, participants were tested in groups of at most 16 in a quiet and darkened room. The auditory primes were played through individual on-ear headphones (SONY MDR110 LP) and visual targets were projected onto a screen in Bengali script with a data projector. Reactions and reaction times were captured using a custom-made multi-participant experimental setup (Reetz and Kleinmann 2003). In a standard forced-choice paradigm, participants were asked to indicate whether the CV fragment they heard belonged to the left or right target word on the screen. No information was given about the oral or nasal quality of the signals. Responses were made via individual custom-made two-button boxes with the left button corresponding to the left target and the right to the right target. All participants were right-handed and used their dominant hand to indicate if the prime matched the target displayed on the right. A ten-item practice task with items not used in the main experiment was conducted and re-run if necessary until it was clear participants were comfortable with the task.

Trials were separated into two blocks, which were presented to different participants. Every prime was presented once in the case of the doublets and three times in the triplet set in each of the two blocks, to ensure that each possible prime-target combination was only shown once per block (cf. Table 1). Each block consisted of 182 pseudo-randomised trials presented in a different order in each block (Table 1 shows all 13 combinations in each block for one set of the 14 items). The side of presentation (left vs. right) of the target words was randomised and counterbalanced across blocks (cf. Table 1). Participants heard one beep followed by a 200 ms silence before each auditory prime. Immediately at the end of the prime the visual targets (pairs of words) were displayed for 800 ms. After an additional 700 ms (i.e., 1500 ms after the end of the prime and the beginning of the visual display) the next trial started with its beep. After every 14 trials a sequence of three beeps was presented, which served as a short break. Participants were told they would hear the beginning of a word and then had to decide which of the words presented on the screen was the closest continuation for what they had heard. They were instructed to respond as quickly and accurately as possible.
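For clarity, the trial timing just described can be summarised as a simple configuration sketch; the timing values are those reported above, while the structure and names are our own and purely illustrative:

```python
# One trial of the cross-modal forced-choice task (timing values from Section 2.2;
# the structure and names below are illustrative only).
TRIAL_TIMELINE = [
    ("beep", None),                  # warning beep
    ("silence_ms", 200),             # 200 ms pause before the auditory prime
    ("auditory_prime", None),        # CV/CṼ fragment, played at its natural duration
    ("visual_target_pair_ms", 800),  # word pair displayed at prime offset for 800 ms
    ("response_window_ms", 700),     # further 700 ms; next trial starts 1500 ms after prime offset
]
TRIALS_PER_BLOCK = 182
BREAK_EVERY_N_TRIALS = 14            # a sequence of three beeps signals a short break
```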

2.3 Word comprehension tasks

Two word comprehension tasks were conducted after the priming experiment to test listeners’ perception of the stimuli used in that experiment. In both cases, listeners were presented with the same CV/Ṽ fragments used in the priming experiment, in a completely randomised order. The auditory fragments were played through headphones at a fixed pace. Participants were asked to respond in writing (pen and paper task) with the syllable they heard (Test 1; n = 26) or the complete monosyllabic word the fragment was taken from (Test 2; n = 26).[10] The results of these tasks are presented in Section 3.1 below.

2.4 Analysis of priming experiment

The data from two participants had to be omitted due to a malfunctioning button box and one additional participant was removed from the analysis due to consistently fast and even negative reaction times, suggesting they were reacting to the prime rather than the target. In addition, all reaction times ≤0 ms and outside ±2 standard deviations were excluded from the following analyses (4.24% of data for triplet conditions and 3.96% for doublets). Errors (response accuracy) were first analysed separately, comparing the proportion of correct responses to a chance proportion of 50% using Pearson’s Chi Squared tests. Errors were then analysed using logit generalised linear models, treating responses (error/correct) as a binomial distribution. Reaction times were analysed using linear mixed effects models.
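The exclusion and modelling steps can be illustrated as follows; this is a sketch using pandas and statsmodels with hypothetical column names, not the authors’ analysis code:

```python
import pandas as pd
import statsmodels.formula.api as smf

def trim_rts(df: pd.DataFrame) -> pd.DataFrame:
    """Drop reaction times <= 0 ms and those outside +/-2 SD of the mean.
    A single global criterion is assumed here; the paper does not specify."""
    df = df[df["rt_ms"] > 0]
    mean, sd = df["rt_ms"].mean(), df["rt_ms"].std()
    return df[df["rt_ms"].between(mean - 2 * sd, mean + 2 * sd)]

# Linear mixed model for RTs: fixed effect of prime vowel and random intercepts for
# participants; the by-target intercepts reported in the paper would be added as a
# variance component (vc_formula) or fitted in lme4-style software.
# trimmed = trim_rts(pd.read_csv("priming_trials.csv"))  # hypothetical file and columns
# rt_model = smf.mixedlm("rt_ms ~ prime_vowel", data=trimmed,
#                        groups=trimmed["participant"]).fit()
# print(rt_model.summary())
```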

3 Results

After the results of the word recognition tasks, this section discusses RT and error data separately for the triplet and doublet conditions of the priming experiment. The analyses of the triplet data are presented first, followed by the doublet data, which investigate how listeners use nasality when there is a lexical gap in the language and one variant of the CVC sequence (either CṼC or CVN) is not available in the lexicon. Errors were analysed using logit generalised linear models, treating responses (error/correct) as a binomial distribution, for the fixed effect of prime vowel (CV(N) versus CṼ, CV(N) versus CV, CṼ versus CV), with random intercepts specified for participants and targets. Reaction times were analysed using linear mixed effects models, again for the fixed effect of prime vowel, with random intercepts specified for participants and targets.

3.1 Word recognition task results

As can be seen in Figure 5, CV syllables and CVC words result in the most accurate responses (91.4 and 75.3%), while CV(N) syllables seem to be the most difficult for listeners to identify accurately from CV(N) fragments (2.6%). Even CVN words gave only 34.6% correct responses, which may well be modulated by the words available in the lexicon and their frequencies. This indicates that, in the syllable task, the nasality present in the signal of a CV(N) fragment is ambiguous and is interpreted by listeners as belonging to either a CV or a CṼ syllable (49.4 and 48% of responses).

Figure 5: Predictions and error results for triplet match (% correct) and neither (% chosen) conditions.

For CV and CṼ syllables and words, listeners are more likely to choose the correct word rather than any other, and while there is a significant number of CṼ syllable and CṼC word responses (36.7 and 30.8%) to CṼ fragments, there are hardly any or only a few CV(N) responses (0.3 and 16.2%, respectively).

3.2 Priming triplet results

Since, as mentioned above, in one third of the trials there was no possible identity match of prime and target, the triplet data will be analysed in two separate conditions: match and neither. The match condition includes all instances where an exact match to the prime was available as a target (e.g. prime [tʃɑ̃] (from [tʃɑ̃d̪]); targets: [tʃɑ̃d̪] [tʃɑl]) and here only correct trials are included in our analysis. The neither condition contains the data from those trials where no identity match was available (e.g. prime [tʃɑ̃] (from [tʃɑ̃d̪]); targets: [tʃɑ̃n] [tʃɑl]) and therefore, since there is no correct item, all trials are analysed.

In the match condition, errors were analysed using a logit generalised linear model treating responses (error/correct) as a binomial distribution, for the fixed effect of prime vowel (CV(N), CṼ, CV), with random intercepts specified for participants and targets. Reaction times were analysed using a linear mixed model for the fixed effects of prime vowel and context of response (nested under prime vowel, i.e., the target that did not match the prime; Prime ∼ Context: CV ∼ CṼC, CV ∼ CVN, CṼ ∼ CVC, CṼ ∼ CVN, CV(N) ∼ CVC, CV(N) ∼ CṼC), with random intercepts specified for participants and targets.

In the neither condition, response preferences were analysed using a logit generalised linear model treating responses (possible response A/possible response B) as a binomial distribution, for the fixed effect of prime vowel, with random intercepts specified for participants and targets. Reaction times were analysed using a linear mixed model for the fixed effects of prime vowel and context of response, with random intercepts specified for participants and targets.

3.2.1 Reaction time results

In the match condition, the linear mixed model shows an overall significant effect for prime vowel (F(2, 260.5) = 17.93, p < 0.001) as well as for the context of response (F(2, 171.2) = 5.09, p = 0.002). In post-hoc Tukey’s HSD tests between the three different prime vowels, we see significant differences between the response latencies to targets after CṼ versus CV(N) primes (p = 0.001) and CṼ versus CV primes (p < 0.001) with responses after CṼ primes being faster than those after CV(N) and CV primes. The difference between latencies after CV(N) and CV primes is not significant (p = 0.065). When the context of the response is taken into account (i.e. whether the other target option was a CVN, CVC or CṼC word), the only significant difference in the planned comparisons is found after CV(N) primes, where participants are significantly faster to make an identity match if the other target is a CṼC word (t(171.2) = −2.88, p = 0.005) while there is no such difference for CVC (t(171.2) = −1.42, p = 0.157) or CṼC (t(171.3) = −0.41, p = 0.680) identity matches (see Table 3 for details).

Table 3: Triplet reaction time results (combined RT for correct responses and identity responses by context (i.e. the other available prime) in the match condition and combined responses as well as RT per chosen target in the neither condition).

match
| Prime                      | CṼC         |             | CVN         |             | CVC         |             |
| Context of response        | CVN         | CVC         | CṼC         | CVC         | CṼC         | CVN         |
| RT (ms) (StdError)         | 555 (13.14) | 556 (12.93) | 569 (13.73) | 597 (13.05) | 608 (13.40) | 586 (13.10) |
| RT (ms) overall (StdError) | 555 (11.26)               | 583 (11.47)               | 597 (11.38)               |

neither
| Prime                      | CṼC         |             | CVN         |             | CVC         |             |
| Chosen target              | CVC         | CVN         | CṼC         | CVC         | CṼC         | CVN         |
| RT (ms) (StdError)         | 614 (14.27) | 636 (13.35) | 567 (13.32) | 622 (14.45) | 587 (13.95) | 611 (13.48) |
| RT (ms) overall (StdError) | 625 (12.96)               | 594 (12.98)               | 599 (12.91)               |

In the neither condition, the linear mixed model showed an overall significant effect for prime vowel (F(2, 83.07) = 4.63, p < 0.012) and the effect of context of response was significant (F(3, 1968) = 14.33, p < 0.001). A planned comparison between the two different possible responses for each prime vowel resulted in the following pattern: if the prime vowel was CV(N), participants reacted significantly faster when they chose a CṼC target compared to a CVC target (t(1968) = −5.58, p < 0.001). When the prime was a CV, participants chose a CṼC target significantly faster than a CVN target (t(1968) = 2.62, p = 0.009), and similarly, when the prime contained a CṼ, the reactions to CVC targets were significantly faster than to CVN targets (t(1968) = 2.31, p = 0.021). Both these cases are no-mismatch scenarios and, with no underlying phonological cues to guide responses, the results here may well be affected by frequency, as the CVN condition in the triplet set is overall less frequent than the CṼC and CVC conditions. While all conditions show significant differences in listeners’ preferences for a certain response, they show a significantly stronger preference (for the CṼC target) when the prime is a CV(N) fragment (see Figure 5 for an overview). Comparing the degrees of reaction time difference across conditions (i.e. a difference of differences), we find that responses to CV(N) primes differ significantly more than those to CV primes (t(2010) = −5.85, p < 0.001) and CṼ primes (t(2010) = −5.61, p < 0.001), while there is no difference between the CV and CṼ primes (t(2010) = −0.17, p = 0.866).

3.2.2 Error analysis

Overall, in the match condition, participants responded with the identity target 61.4% of the time and performed significantly better than chance in this task (χ²(1) = 208.21, p < 0.001). In analyses by prime vowel, participants’ responses were significantly more accurate than chance in all three conditions: CṼC (χ²(1) = 189.89, p < 0.001), CVN (χ²(1) = 26.52, p < 0.001) and CVC (χ²(1) = 36.70, p < 0.001).

In a logit generalised linear model (fixed effect: prime vowel), the data shows a significant effect for prime vowel (χ²(2) = 48.06, p < 0.001). In a planned comparison of the percentages of correct responses to targets we find a significant difference between responses following CṼ versus CV(N) primes (χ²(1) = 40.16, p < 0.001) as well as between those after CṼ versus CV primes (χ²(1) = 31.90, p < 0.001) with a significantly larger percentage of correct responses to targets following CṼ primes. There is no difference between correct responses after CV(N) and CV primes (χ²(1) = 0.45, p = 0.500).

Since there is no correct or incorrect response in the neither condition, it is not possible to provide an overall analysis of errors to see whether participants performed above chance. The results (illustrated in Figure 6; Std Errors can be found in Appendix B) were analysed by prime vowel and participants showed a significant preference for one target over the other in all three conditions. When presented with a CṼ (nasal vowel) prime, participants responded with a significantly greater number of CVN targets (63.8%) than CVC targets (36.2%; χ²(1) = 51.16, p < 0.001). In the CV(N) (nasalised vowel) prime condition, CṼC targets (66.2%) were significantly more frequent than CVC targets (33.8%; χ²(1) = 70.51, p < 0.001) and when presented with a CV (oral vowel) prime, participants responded with CVN targets 57.4% of the time over CṼC targets, which were chosen significantly less frequently (42.6%; χ²(1) = 14.56, p < 0.001).

Figure 6: Predictions and reaction time results for the triplet condition.

In a logit generalised linear model, we again see a significant effect for prime vowel (χ²(2) = 11.91, p < 0.003). The planned comparisons show that participants display a greater bias towards the specific target vowels above after both CV(N) and CṼ primes than after CV primes (CV(N) versus CV: χ²(1) = 11.10, p < 0.001; CṼ versus CV: χ²(1) = 5.89, p = 0.015). There is no difference in the degree of preference of the target vowels after CV(N) and CṼ primes (χ²(1) = 0.81, p = 0.368).

3.3 Priming doublet results

For the doublet analysis, the analyses were split into two separate models – NoCVN, where no word with a nasalised vowel exists in Bengali, and NoCṼC, where no word with a nasal vowel exists in Bengali.

In the NoCVN condition, errors were analysed using a logit generalised linear model treating responses (error/correct) as a binomial distribution, for the fixed effect of prime vowel, with random intercepts specified for participants and targets. Reaction times were analysed using a linear mixed model for the fixed effect of prime vowel, with random intercepts specified for participants and targets.

3.3.1 Reaction time results

Reaction times of correct responses are analysed separately for the two conditions (NoCVN and NoCṼC).

The linear mixed model analysis shows a different effect for prime vowel (F(1, 1145) = 17.02, p < 0.001) in the two sets. In the NoCVN (no contextualised nasal) set, the analysis shows significantly faster reaction times following a CṼ prime compared to a CV prime (F(1, 750.4) = 24.36, p < 0.001). The data in the doublet set without nasal vowels (NoCṼC), however, shows no significant effect of prime vowel (F(1, 750.4) = 0.11, p = 0.736).

3.3.2 Error analysis

The overall error rate across the two doublet sets was 34%. The doublet data clearly shows that, while listeners detect the identity match well above chance in all conditions (χ²(1) = 239.10, p < 0.001) and perform equally well in both doublet sets (χ²(1) = 0.01, p = 0.922), there are significant differences between the prime vowels in both reaction times and errors (cf. Figure 7; standard errors can be found in Appendix B).

Figure 7: Reaction time results (ms) and error rates for doublet conditions.

In the analysis of prime vowels in the set in which no word with a nasalised vowel exists in Bengali (NoCVN), the data shows a significant difference between the number of errors made following CṼ primes versus CV primes (χ²(1) = 30.82, p < 0.001). Reactions to targets preceded by CṼ primes were substantially more accurate (26.3% errors) than those observed when primes contained an oral vowel (CV; 41.5% errors).

In the NoCṼC set, where a corresponding target with a nasal vowel does not exist in Bengali, participants made significantly fewer errors (χ²(1) = 4.56, p = 0.033) responding to targets preceded by primes with nasalised vowels before a nasal consonant (CV(N); 31.1% errors) than to targets following oral vowel primes (CV; 37.1% errors). Furthermore, despite the significantly higher error rate after CV primes, participants still performed significantly above chance (χ²(1) = 38.16, p < 0.001).

4 Discussion

The purpose of the present study was twofold: to determine whether the evidence supports an account proposing [nasal] as a privative or an equipollent feature, and to investigate whether listeners’ responses were guided by the information specified in the lexicon or whether they react to surface nasality. We have used a three-way matching process to account for the relationship between the cues from the signal and the features in the lexicon. If all features are fully specified, only an exact match or a mismatch is possible. However, if a feature is not specified – for instance, if we assume that oral and contextually nasalised vowels lack a non-nasal specification – the relationship between the surface cues and the representation is a no-mismatch rather than a genuine mismatch (or, indeed, the match that would arise if these vowels were specified as [nasal]). Overall, the data provides strong support for a privative or monovalent representation of nasality and thus for the lack of specification of oral vowels, as well as of contextually nasalised vowels (nasalised by regressive assimilation from a following nasal consonant), as nonnasal. While there have been diverging proposals in the literature and different levels of specification have been put forward for different languages (see discussion in 1.1), the data collected here is best explained by the predictions of an underspecification account such as the FUL model (Lahiri and Reetz 2002, 2010; cf. Section 1.4).
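The three-way matching process can be made concrete with a small sketch. This is our illustration under the privative assumption argued for here, not the authors’ implementation, and it shows only the nasality dimension of what is in FUL a comparison over full feature sets.

```python
# Minimal sketch of ternary matching with privative [nasal]: nasal vowels are
# stored as [nasal]; oral and contextually nasalised vowels are unspecified.
from typing import Optional

def evaluate(surface_nasal: bool, stored: Optional[str]) -> str:
    """Compare the nasality cue extracted from the signal with the stored
    specification of the target vowel ("nasal" or None for unspecified)."""
    if stored == "nasal":
        # A nasal cue maps onto the stored feature; an oral signal supplies
        # no conflicting feature and therefore cannot produce a mismatch.
        return "match" if surface_nasal else "no-mismatch"
    # An unspecified vowel never conflicts with the signal, but surface
    # nasality cannot be positively matched either.
    return "no-mismatch"

# CṼ prime    -> CṼC target: nasal cue, specified [nasal]    -> match
# CV(N) prime -> CVN target: nasal cue, unspecified vowel    -> no-mismatch
# CV prime    -> CVC target: oral signal, unspecified vowel  -> no-mismatch
for cue, stored in [(True, "nasal"), (True, None), (False, None)]:
    print(cue, stored, "->", evaluate(cue, stored))
```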

As illustrated in Figure 2, models such as this would predict that only [nasal] is represented and extracted, and thus the only real match is either the identity match between a CṼC prime and a CṼC target or that found between a CVN prime and a CṼC target, since the surface nasality in the signal of a CVN word can be mapped onto the specified feature [nasal]. FUL makes clear predictions for reaction times (Lahiri and Reetz 2002) for match and no-mismatch conditions and we would thus expect listeners to respond significantly faster to match conditions than to those which result in a no-mismatch. These predictions are borne out in the reaction time data for both the triplet and doublet experiments.

In the triplet match analyses (which only included correct trials), the only condition in which identity prime-target pairs yield significantly faster reaction times is that of CṼC primes, which is also the only condition constituting a real match between acoustic signal and underlying representation. There is no significant difference between the RTs after CVC and CVN primes, both no-mismatch conditions, and RTs in both are significantly slower than those to CṼC targets after a matching fragment prime. This suggests that oral vowels in CVC words are not specified either as nonnasal or as oral: otherwise there should have been no difference between response latencies after CṼC and CVC primes and their respective identity targets, since both primes would then result in a match with an underlying lexical representation of [+nasal] and [−nasal] respectively. The error data in the match condition shows an identical pattern, with participants providing correct responses significantly more frequently after CṼC primes than after CVN or CVC primes, while there is no difference in the error rates between responses following the latter two primes.

In the case of the CVN items, if one assumes an equipollent set of features, e.g. [±nasal], two competing hypotheses are conceivable concerning their representation: (a) the CVN vowel is represented as [−nasal] (e.g. Chomsky and Halle 1968) or (b) it is represented as [+nasal] (e.g. Ohala and Ohala 1995). If (a) were the case, we should find a mismatch between a CVN prime and a CVN target, as the nasality in the signal would conflict with a non-nasal specification; this should result in significantly longer reaction times than for a CVC prime and CVC target, which would be a match. This is not borne out by the results (cf. Figure 6): there is no significant difference between the CVC and CVN conditions in the triplet match analyses, which suggests that the relationship between acoustic input and representation is of a similar nature in both cases. Under hypothesis (b), which corresponds to our surface predictions (cf. Figure 1), all three prime-target combinations should result in identical reaction times, since all constitute complete matches between the surface cues and the representations. However, this is clearly also not the case.
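To make these competing predictions explicit, the sketch below (again our illustration, with simplified stand-in values rather than full representations) enumerates the expected relation between each identity prime and its target under hypotheses (a) and (b) and under the privative account, using the same ternary logic as in the sketch above.

```python
# Enumerate identity-pairing predictions under the three accounts discussed.
def relation(cue_nasal, stored):
    """Simplified evaluation of the surface nasality cue against a stored value."""
    if stored is None:                      # unspecified (privative account)
        return "no-mismatch"
    if stored == "nasal":                   # privative [nasal]
        return "match" if cue_nasal else "no-mismatch"
    extracted = "+nasal" if cue_nasal else "-nasal"   # equipollent extraction
    return "match" if extracted == stored else "mismatch"

accounts = {
    "(a) CVN = -nasal":  {"CṼC": "+nasal", "CVN": "-nasal", "CVC": "-nasal"},
    "(b) CVN = +nasal":  {"CṼC": "+nasal", "CVN": "+nasal", "CVC": "-nasal"},
    "privative [nasal]": {"CṼC": "nasal",  "CVN": None,     "CVC": None},
}
surface_nasality = {"CṼC": True, "CVN": True, "CVC": False}   # nasality in the prime

for account, lexicon in accounts.items():
    outcomes = {w: relation(surface_nasality[w], lexicon[w]) for w in lexicon}
    print(account, outcomes)
# Only the privative account yields CṼC = match with CVN = CVC = no-mismatch,
# the pattern observed in the reaction time and error data.
```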

The neither triplet condition shows similar patterns in the RT results. When listeners are confronted with a nasalised vowel from a CVN prime, CṼC targets are chosen significantly faster than CVC targets since the former constitute a match: the nasality in the signal is matched with the stored feature [nasal] in the CṼC target, while the relationship with the CVC target, which is underspecified, is a no-mismatch (see Figure 2, middle). After CṼC and CVC primes, where both available response options result in a no-mismatch, the data shows no significant differences between RTs. Thus, the results do not suggest that CVN words are represented as specified for [nasal], nor is there any support for the view that CVC words are represented as [−nasal]. If the CVN words, for example, were represented as [nasal], we would expect listeners to respond faster to CVN words after a fragment prime from a CṼC word, since the surface nasality in the prime should match the stored feature [nasal]. However, these RTs are significantly slower than those for CVC targets. In the condition with CVC fragment primes, there is no complete match available under either set of predictions, since the choices are CVN and CṼC targets. Again, it is not the case that oral vowels are being matched, because the CṼC targets are responded to faster. The degree of priming is significantly greater after CṼC primes than after either of the other primes, which further supports the facilitatory effect of the match with the specified feature in the underlying representation.

In the NoCVN doublet set, where listeners are presented with a choice between CṼC and CVC targets, they respond significantly faster to matching targets preceded by CṼC primes. When presented with [bɑ̃] from [bɑ̃d̪ʰ], listeners make an identity match significantly faster than when they are presented with [bɑ] from [bɑd̪]. Assuming the CṼC word has a specified nasal vowel, this constitutes a better match for its identity fragment, while the vowel in the CVC word is not specified and is only a no-mismatch for its CV fragment, although their surface relationship is identical. If we assumed a binary feature, with an oral specification for the oral vowel, we should expect no difference in the reaction times, since both conditions would constitute a direct match both on the surface (phonetically) and in terms of the represented features (phonologically). The NoCṼC set, where both vowels are unspecified, provides further support for an underspecification account (see Figure 2): here the reaction times do not differ significantly, but latencies after both CVC and CVN primes are considerably slower than those after the CṼC primes in the NoCVN set.

If frequency were the crucial determining factor, which is often assumed in exemplar-based models, and it were the case that Ṽ is more frequently heard in the context of CṼC than CVN, one could argue that frequency guides listeners’ choices, with shorter response latencies for CṼC targets after CṼ primes. However, if that were the case, as the oral vowel only ever occurs in the CVC context, CV primes and CVC targets should result in the fastest RTs, which is not what the data shows. This is not to say that frequency plays no role, since we also observe effects which could be attributed to frequency – when, for instance, the relationship between the prime and two targets is a no-mismatch in both cases and thus the phonological specifications do not guide responses (e.g. CV(N) prime and CVC or CVN target). However, overall the phonological representations play a critical role in listeners’ perception. A further consideration could be the number of neighbours: for instance, CṼC could be argued to have fewer neighbours and therefore result in faster response latencies. However, CVN words will have even fewer neighbours since they are constrained by the number of possible coda consonants (Bengali has three nasal consonants), while CṼ segments could theoretically be followed by up to 21 different oral consonants. Thus, while again this might contribute, it does not provide a comprehensive explanation of our data, as can be seen from the patterns of frequency data in Table 1, which do not correlate with the patterns observed in the data.

It seems that listeners are indeed using the lexical representation in the processing of nasal vowels. This can be seen particularly clearly in the reaction time data. However, the error data from the triplet neither condition as well as the doublet conditions shows that listeners evidently also take salient surface cues into account. In the neither condition, participants choose a CṼC target if there is any nasality present in the signal, regardless of whether this stems from a CṼ or a CV(N) prime. If no CṼC target is available, listeners choose a CVN word over a CVC word. The doublet data shows a similar tendency, with identity matches being significantly more accurate after CṼ and CV(N) primes despite the fact that only the CṼC prime-target combination results in a match. Thus, even though a CṼ prime and a CVN target result in a no-mismatch, listeners choose the CVN target as often as they choose the CṼC target in the match condition (CV(N) prime ∼ CṼC target), since there is no CṼC target available in the NoCṼC doublet set.

5 Conclusions

The data collected in this study supports the status of [nasal] as a privative feature, whereby nasal vowels are specified as [nasal] while oral vowels and contextually nasalised vowels are not specified. We established that there was no significant difference in the acoustic degree of nasality (measured using the Q-factor of the bandwidth) between underlyingly nasal and contextually nasalised vowels (cf. Table 2). The data shows that, even in a language like Bengali where nasality in vowels is used contrastively, it is not necessary to encode predictable variation in nasality, and that listeners are sensitive to lexical representations: their processing of auditory speech signals is influenced by what is provided in the lexicon. If the listener hears any nasality on the vowel, either from an underlying nasal or via a following nasal consonant, and if there is a lexical choice of a specified nasal vowel, then the listener overwhelmingly chooses the underlying nasal. If, however, nasality is perceived and there is no nasal vowel in the lexical representation, then surface nasality wins. This can only be seen in the error data and not in the reaction times. In terms of the reaction times for the triplets, we see that if a nasal vowel prime is heard and the choice is between a CVN and a CVC target, reactions to CVN words are slower than those to CVC words. Had the CVN words been specified as [+nasal], they would have been chosen faster than the CVC words. Thus, the lexical representations reflected in the reaction times suggest clearly that only privative [nasal] is specified and governs the choice of responses. We provide an overview of our crucial findings below:

  1. Lexical representations are utilised by listeners in processing and recognition, and a match facilitates processing.

  2. Reaction time data across experiments reflects the underlying representation and illustrates that real matches between the signal in the prime and the representation of the target lead to faster reaction times (cf. Figure 6).

  3. Error analyses show that any nasality in the signal predisposes the listener to choose a target with an underlying nasal vowel (CṼ and CV(N) to CṼC), unless such a target is unavailable, in which case surface matching of nasality is used (CṼ prime to CVN target).

  4. Both reaction time and error data provide clear support for an underspecification account of nasality with [nasal] as a privative feature.


Corresponding author: Sandra Kotzor, Language and Brain Laboratory, Faculty of Linguistics, Philology and Phonetics, University of Oxford, Oxford, UK, E-mail:

Appendix A: Full list of stimuli

  1. Triplets

CṼC CVN CVC
1 ʧɑ̃d চাঁদ moon ʧɑn চান bath ʧɑl চাল unboiled rice
2 kɑ̃ʧ কাঁচ glass kɑn কান ear kɑl কাল yesterday/tomorrow
3 pɑ̃ʧ পাঁচ five pɑn পান betel leaf pɑt̪ পাত place setting
4 bʰɑ̃r ভাঁড় clay cup bʰɑn ভান pretence bʰɑt̪ ভাত cooked rice
5 ɖɑ̃ʈ ডাঁট arrogance ɖɑn ডান right ɖɑl ডাল lentils
6 kʰũt̪ খুঁত flaw kʰun খুন murder kʰur খুর hoof
7 t̪ɑ̃t̪ তাঁত loom t̪ɑn তান tune t̪ɑʃ তাস cards
8 d̪ʰɑ̃ʧ ধাঁচ style, shape d̪ʰɑn ধান grain (rice) d̪ʰɑr ধার sharp
9 ʃõk smell ʃon listen ʃok grief
10 ʃɑ̃kʰ শাঁখ conch shell ʃɑn শাণ whetstone ʃɑt̪ সাত seven
11 gɔ̃d̪ গঁদ glue gɔm গম wheat gɔt̪ musical note
12 gʰũʃ ঘুঁষ bribe gʰum ঘুম sleep gʰur ঘুর whirl
13 d̪ɑ̃t̪ দাঁত tooth d̪ɑn দান donation d̪ɑʃ দাস servant
14 bɑ̃ʃ বাঁস bamboo bɑn বান flood bɑd̪ বাদ left out
  2. Doublets (no nasal vowel)

CVN CVC
1 lom body hair lobʰ greed
2 ʧen chain ʧek cheque
3 gɑn গান song gɑl গাল cheek
4 kʰɑm খাম envelope kʰɑp খাপ step
5 ʤɑn জান astrologer ʤɑt̪ জাত race
6 gʰɑm ঘাম drowsiness gʰɑʈ ঘাট bank
7 ʧun চূণ lime ʧul চুল hair
8 d̪in day d̪ik direction
9 ʈɑn টান pull ʈɑk টাক bald
10 t̪in three t̪il sesame seed
11 ʃɔŋ সঙ clown ʃɔkʰ সখ desire, wish
12 rɔŋ রঙ colour rɔʃ রস juice
13 d̪ʰum ধুম smoke d̪ʰup ধুপ incense
14 gun good quality gul charcoal ball
  3. Doublets (no nasal consonant)

CṼC CVC
1 ʈʰõʈ lips ʈʰok knock
2 ʤʰɑ̃ʈ ঝাঁট sweeping with a broom ʤʰɑl ঝাল spicy hot
3 ʧʰũʧ ছুঁচ needle ʧʰuʈ ছুট run, escape
4 ʧʰɑ̃ʧ ছাঁচ mould, cast ʧʰat̪ ছাত terrace
5 ʧʰõɻ throw ʧʰoʈ run
6 ʤʰɑ̃p ঝাঁপ leap ʤʰɑɻ ঝাড় bush, shrub
7 ʤʰõk wish, inclination ʤʰol broth
8 pʰɑ̃ʃ ফাঁস knot, noose pʰɑg ফাগ red powder to play with
9 ʃɑ̃kʰ শাঁখ conch shell ʃɑk শাক greens, spinach
10 hɑ̃ʃ হাঁস duck hɑt̪ হাত hand
11 ʧõʧ fibre, splinter ʧor thief
12 gĩʈʰ গিঁঠ knot git̪ গীত song
13 pũʤ পুঁজ pus, infection pur পুর city
14 ʃũr সুঁর trunk of an animal ʃud̪ সুদ interest (payment)

Appendix B: Standard errors for reaction time and error analyses

Table 1:

Standard errors for error analyses of the triplet set.

match
Prime                    CṼC      CVN      CVC
Probability of correct   68.8%    57.0%    58.3%
(StdError)               (1.26)   (1.35)   (1.35)

neither
Prime                    CṼC                CVN                CVC
Chosen target            CVC      CVN       CṼC      CVC       CṼC      CVN
Probability of chosen    36.2%    63.8%     66.2%    33.8%     42.6%    57.4%
(StdError)               (1.86)   (1.86)    (1.82)   (1.82)    (1.91)   (1.91)
Table 2:

Triplet reaction time results (combined RT for correct responses and identity responses by context (i.e. the other available prime) in the match condition, and combined responses as well as RT per chosen target in the neither condition).

match
Prime                        CṼC                         CVN                         CVC
Context of response          CVN          CVC            CṼC          CVC            CṼC          CVN
RT (ms) (StdError)           555 (13.14)  556 (12.93)    569 (13.73)  597 (13.05)    608 (13.40)  586 (13.10)
RT (ms) overall (StdError)   555 (11.26)                 583 (11.47)                 597 (11.38)

neither
Prime                        CṼC                         CVN                         CVC
Chosen target                CVC          CVN            CṼC          CVC            CṼC          CVN
RT (ms) (StdError)           614 (14.27)  636 (13.35)    567 (13.32)  622 (14.45)    587 (13.95)  611 (13.48)
RT (ms) overall (StdError)   625 (12.96)                 594 (12.98)                 599 (12.91)
Table 3:

Standard errors for reaction time and error analyses of the doublet sets.

  NoCVN Doublets
Prime CṼC CVC
RT (ms) overall 572.14 614.39
(StdError) (11.29) (11.64)
Probability of correct 73.7% 58.5%
(StdError) (3.70) (3.88)

  NoCṼC Doublets

Prime CVN CVC
RT (ms) overall 611.23 608.40
(StdError) (12.29) (12.42)
Probability of correct 68.9% 62.9%
(StdError) (3.47) (3.74)

References

Beddor, Patrice Speeter. 1993. The perception of nasal vowels. In Marie K. Huffman & Rena A. Krakow (eds.), Nasals, Nasalization, and the Velum. Phonetics & Phonology, vol. 5, 171–196. London: Academic Press. https://doi.org/10.1016/B978-0-12-360380-7.50011-9.

Beddor, Patrice Speeter. 2007. Nasals and nasalization: The relation between segmental and coarticulatory timing. In Jürgen Trouvain & William J. Barry (eds.), Proceedings of the 16th International Congress of Phonetic Sciences. Saarbrücken, Germany.

Beddor, Patrice Speeter & Winifred Strange. 1982. Cross-language study of perception of the oral-nasal distinction. The Journal of the Acoustical Society of America 71. 1551–1561. https://doi.org/10.1121/1.387809.

Beddor, Patrice Speeter, Kevin B. McGowan, Julie E. Boland, Andries W. Coetzee & Anthony Brasher. 2013. The time course of perception of coarticulation. The Journal of the Acoustical Society of America 133. 2350–2366. https://doi.org/10.1121/1.4794366.

Berger, Michael A. 2007. Measurement of vowel nasalization by multi-dimensional acoustic analysis. Rochester, NY: University of Rochester.

Bjuggren, Gunnar & Carl Gunnar Michael Fant. 1964. The nasal cavity structure. Speech Transmission Laboratory – Quarterly Progress and Status Report 5(4). 5–7.

Boersma, Paul & David Weenink. 2011. PRAAT: Doing Phonetics by Computer (ver. 5.2.26). Amsterdam, The Netherlands: Institute for Phonetic Sciences.

Butcher, Andrew. 1976. The influence of the native language on the perception of vowel quality. Arbeitsberichte des Instituts für Phonetik der Universität Kiel (AIPUK) 6. 1–137.

Byrne, William, Michael Finke, Sanjeev Khudanpur, John McDonough, Hariet Nock, Michael Riley, Murat Saraçlar, Charles Wooters & George Zavaliagkos. 1998. Pronunciation modelling using a hand-labelled corpus for conversational speech recognition. In Proceedings of ICASSP, vol. 98, 313–316. https://doi.org/10.1109/ICASSP.1998.674430.

Carignan, Christopher. 2018. Using ultrasound and nasalance to separate oral and nasal contributions to formant frequencies of nasalized vowels. The Journal of the Acoustical Society of America 143. 2588–2601. https://doi.org/10.1121/1.5034760.

Chatterji, Suniti Kumar. 1926. Origin and Development of the Bengali Language. Calcutta, India: Calcutta University Press (reprinted by Rupa & Co., Calcutta, 1975).

Chen, Marilyn Y. 1996. Acoustic Correlates of Nasality in Speech. Cambridge, MA: Massachusetts Institute of Technology PhD thesis.

Chen, Marilyn Y. 1997. Acoustic correlates of English and French nasalized vowels. The Journal of the Acoustical Society of America 102. 2360–2370. https://doi.org/10.1121/1.419620.

Chomsky, Noam & Morris Halle. 1968. The Sound Pattern of English. New York, NY: Harper & Row.

Clements, G. Nick. 2001. Representational economy in constraint-based phonology. In Tracy Alan Hall (ed.), Distinctive feature theory, 71–146. Berlin: Mouton de Gruyter.

Cohn, Abigail C. 1993. Nasalisation in English: Phonology or phonetics? Phonology 10(1). 43–81. https://doi.org/10.1017/s0952675700001731.

Cooper, Nicole, Anne Cutler & Roger Wales. 2002. Constraints of lexical stress on lexical access in English: Evidence from native and non-native listeners. Language and Speech 45(3). 207–228. https://doi.org/10.1177/00238309020450030101.

Dang, Jianwu, Kiyoshi Honda & Hisayoshi Suzuki. 1994. Morphological and acoustical analysis of the nasal and the paranasal cavities. The Journal of the Acoustical Society of America 96. 2088–2100. https://doi.org/10.1121/1.410150.

Dang, Jianwu & Kiyoshi Honda. 1996. Acoustic characteristics of the human paranasal sinuses derived from transmission characteristic measurement and morphological observation. The Journal of the Acoustical Society of America 100. 3374–3383. https://doi.org/10.1121/1.416978.

Dash, Niladri Sekhar & Bidyut Baran Chaudhuri. 2001. A corpus-based study of the Bengali language. Indian Journal of Linguistics 20(1). 19–40.

Dresher, B. Elan. 2009. The contrastive hierarchy in phonology. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511642005.

Elman, Jeffrey L. & James L. McClelland. 1986. Exploiting lawful variability in the speech wave. In Joseph S. Perkell & Dennis H. Klatt (eds.), Invariance and variability of speech processes, 360–380. Hillsdale, NJ: Erlbaum.

Fant, Carl Gunnar Michael. 1960. Acoustic Theory of Speech Production. The Hague, Netherlands: Mouton & Co.

Ferguson, Charles A. & Munier Chowdhury. 1960. The phonemes of Bengali. Language 36(1). 22–59. https://doi.org/10.2307/410622.

Fowler, Carol A. 1986. An event approach to the study of speech perception from a direct realist perspective. Journal of Phonetics 14. 3–28. https://doi.org/10.1016/s0095-4470(19)30607-2.

Ganong, William F. 1980. Phonetic categorization in auditory word perception. Journal of Experimental Psychology: Human Perception and Performance 6(1). 110–125. https://doi.org/10.1037/0096-1523.6.1.110.

Gaskell, M. Gareth & William D. Marslen-Wilson. 1997. Integrating form and meaning: A distributed model of speech perception. Language and Cognitive Processes 12. 613–656. https://doi.org/10.1080/016909697386646.

Godfrey, John J., Edward C. Holliman & Jane McDaniel. 1992. SWITCHBOARD: Telephone speech corpus for research and development. In Proceedings of ICASSP, 517–520. https://doi.org/10.1109/ICASSP.1992.225858.

Grosjean, François. 1980. Spoken word recognition processes and the gating paradigm. Perception and Psychophysics 28. 267–283. https://doi.org/10.3758/bf03204386.

Hajek, John. 2013. Vowel Nasalization. In Matthew S. Dryer & Martin Haspelmath (eds.), The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. http://wals.info/chapter/10 (accessed 28 February 2022).

Halle, Morris, Bert Vaux & Andrew Wolfe. 2000. On feature spreading and the representation of place of articulation. Linguistic Inquiry 31. 387–444. https://doi.org/10.1162/002438900554398.

Hattori, Shiro, Kengo Yamamoto & Osamu Fujimura. 1958. Nasalization of vowels in relation to nasals. The Journal of the Acoustical Society of America 30. 267–274. https://doi.org/10.1121/1.1909563.

Hawkins, Sarah & Kenneth N. Stevens. 1985. Acoustic and perceptual correlates of the non-nasal-nasal distinction for vowels. The Journal of the Acoustical Society of America 77. 1560–1575. https://doi.org/10.1121/1.391999.

House, Arthur S. & Kenneth N. Stevens. 1956. Analog studies of the nasalization of vowels. Journal of Speech and Hearing Disorders 21. 218–232. https://doi.org/10.1044/jshd.2102.218.

Huffman, Marie K. 1990. Implementation of Nasal: Timing and Articulatory Landmarks, vol. 75. UCLA PhD thesis.

Hyman, Larry. 1973. The feature [Grave] in phonological theory. Journal of Phonetics 1. 329–337. https://doi.org/10.1016/s0095-4470(19)31401-9.

Johnson, Keith. 1997. Speech perception without speaker normalization: An exemplar model. In Keith Johnson & John W. Mullennix (eds.), Talker variability in speech processing, 145–165. San Diego, CA: Academic.

Jongman, Allard, Joan A. Sereno, Marianne Raaijmakers & Aditi Lahiri. 1992. The phonological representation of [voice] in speech perception. Language and Speech 35(1, 2). 137–152. https://doi.org/10.1177/002383099203500212.

Klaiman, Miriam Holly & Aditi Lahiri. 2018. Bengali. In Bernard Comrie (ed.), The World’s Major Languages, 427–446. London: Routledge. https://doi.org/10.4324/9781315644936-24.

Krämer, Martin. 2017. Is vowel nasalisation phonological in English? A systematic review. English Language and Linguistics 23(2). 405–437. https://doi.org/10.1017/S1360674317000442.

Lahiri, Aditi. 2018. Predicting universal phonological features. In Larry Hyman & Frans Plank (eds.), Phonological Typology, 229–272. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110451931-007.

Lahiri, Aditi & William D. Marslen-Wilson. 1991. The mental representation of lexical form: A phonological approach to the recognition lexicon. Cognition 38. 245–294. https://doi.org/10.1016/0010-0277(91)90008-r.

Lahiri, Aditi & William D. Marslen-Wilson. 1992. Lexical processing and phonological representation. In D. Robert Ladd & Gerard Docherty (eds.), Papers in Laboratory Phonology II: Gesture, segment, prosody, 229–254. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511519918.010.

Lahiri, Aditi & Henning Reetz. 2002. Underspecified recognition. In Carlos Gussenhoven & Natasha Warner (eds.), Laboratory Phonology VII, 637–677. Berlin, Germany: Mouton de Gruyter. https://doi.org/10.1515/9783110197105.637.

Lahiri, Aditi & Henning Reetz. 2010. Distinctive features: Phonological underspecification in representation and processing. Journal of Phonetics 38. 44–59. https://doi.org/10.1016/j.wocn.2010.01.002.

Liberman, Alvin M. & Ignatius G. Mattingly. 1985. The motor theory of speech perception revised. Cognition 21. 1–36. https://doi.org/10.1016/0010-0277(85)90021-6.

Lindqvist-Gauffin, Jan & Johan Sundberg. 1976. Acoustic properties of the nasal tract. Phonetica 33. 161–168. https://doi.org/10.1159/000259720.

Maeda, Shinji. 1982. The role of the sinus cavities in the production of nasal vowels. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 911–914. https://doi.org/10.1109/ICASSP.1982.1171561.

Maeda, Shinji. 1983. Acoustic cues of vowel nasalization: A simulation study, VII, 25–36. Paris: Recherches/Acoustique Centre National d’Etudes des Telecommunications (CNET).

Malécot, André. 1960. Vowel nasality as a distinctive feature in American English. Language 36. 222–229. https://doi.org/10.2307/410987.

Marslen-Wilson, William D. 1987. Functional parallelism in spoken word recognition. Cognition 25. 71–102. https://doi.org/10.1016/0010-0277(87)90005-9.

Marslen-Wilson, William D. 1989. Access and integration: Projecting sounds onto meaning. In William D. Marslen-Wilson (ed.), Cognitive models of speech processing: Psycholinguistic and computational perspectives, 148–172. Cambridge, MA: MIT Press. https://doi.org/10.7551/mitpress/4213.003.0004.

Marslen-Wilson, William D. 1993. Issues of process and representation in lexical access. In Gerry T. M. Altmann & Richard Shillcock (eds.), Cognitive models of speech processing: The Second Sperlonga Meeting, 187–210. Hillsdale, NJ: Erlbaum.

Marslen-Wilson, William D. & Alan Welsh. 1978. Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology 10. 29–63. https://doi.org/10.1016/0010-0285(78)90018-x.

McCarthy, John J. 1988. Feature geometry and dependency: A review. Phonetica 43. 84–108. https://doi.org/10.1159/000261820.

McClelland, James L. & Jeffrey L. Elman. 1986. The TRACE model of speech perception. Cognitive Psychology 18. 1–86. https://doi.org/10.1016/0010-0285(86)90015-0.

McQueen, James M. 2005. Speech perception. In Koen Lamberts & Robert Goldstone (eds.), The Handbook of Cognition, 255–275. London: Sage. https://doi.org/10.4135/9781848608177.n11.

Norris, Dennis. 1994. Shortlist: A connectionist model of continuous speech recognition. Cognition 52. 189–234. https://doi.org/10.1016/0010-0277(94)90043-4.

Norris, Dennis & James M. McQueen. 2008. Shortlist B: A Bayesian model of continuous speech recognition. Psychological Review 115(2). 357–395. https://doi.org/10.1037/0033-295x.115.2.357.

Norris, Dennis, James M. McQueen & Anne Cutler. 2000. Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences 23. 299–325. https://doi.org/10.1017/s0140525x00003241.

Ohala, John J. & Manjari Ohala. 1995. Speech perception and lexical representation: The role of vowel nasalization in Hindi and English. In Bruce Connell & Amalia Arvaniti (eds.), Phonology and Phonetic Evidence. Papers in Laboratory Phonology IV, 41–60. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511554315.004.

Pallier, Christophe, Anne Christophe & Jacques Mehler. 1997. Language-specific listening. Trends in Cognitive Sciences 1. 129–132. https://doi.org/10.1016/s1364-6613(97)01044-9.

Pallier, Christophe, Angels Colomé & Nuria Sebastian-Gallés. 2001. The influence of native-language phonology on lexical access: Exemplar-based versus abstract lexical entries. Psychological Science 12. 445–449. https://doi.org/10.1111/1467-9280.00383.

Pierrehumbert, Janet B. 2002. Word-specific phonetics. In Carlos Gussenhoven & Natasha Warner (eds.), Laboratory Phonology, vol. 7, 101–139. Berlin: De Gruyter. https://doi.org/10.1515/9783110197105.101.

Pruthi, Tarun. 2007. Analysis, Vocal-Tract Modeling and Automatic Detection of Vowel Nasalization. Maryland: University of Maryland PhD thesis.

Pruthi, Tarun, Carol Y. Espy-Wilson & Brad H. Story. 2007. Simulation and analysis of nasalized vowels based on magnetic resonance imaging data. The Journal of the Acoustical Society of America 121. 3858–3873. https://doi.org/10.1121/1.2722220.

Reetz, Henning & Achim Kleinmann. 2003. Multi-subject hardware for experiment control and precise reaction time measurement. In Maria-Josep Solé, Daniel Recasens & Joaquín Romero (eds.), Proceedings of the 15th International Congress of Phonetic Sciences, 1489–1492. Barcelona.

Scharinger, Matthias & Aditi Lahiri. 2010. Height differences in English dialects: Consequences for processing and representation. Language and Speech 53(2). 245–272. https://doi.org/10.1177/0023830909357154.

Solé, Maria-Josep. 1992. Phonetic and phonological processes: The case of nasalization. Language and Speech 35. 29–43. https://doi.org/10.1177/002383099203500204.

Solé, Maria-Josep. 1995. Spatio-temporal patterns of velo-pharyngeal action in phonetic and phonological nasalization. Language and Speech 38. 1–23. https://doi.org/10.1177/002383099503800101.

Stevens, Kenneth N. 1998. Acoustic Phonetics. Cambridge, MA: The MIT Press.

Stevens, Kenneth N. 2002. Toward a model for lexical access based on acoustic landmarks and distinctive features. Journal of the Acoustical Society of America 111. 1872–1891. https://doi.org/10.1121/1.1458026.

Styler, Will. 2017. On the acoustical features of vowel nasality in English and French. The Journal of the Acoustical Society of America 142. 2469–2482. https://doi.org/10.1121/1.5008854.

‘t Hart, Johan & Antonie Cohen. 1964. Gating techniques as an aid in speech analysis. Language and Speech 7. 22–39. https://doi.org/10.1177/002383096400700104.

Trigo, R. Lorenza. 1993. The inherent structure of nasal segments. In Marie K. Huffman & Rena A. Krakow (eds.), Nasals, Nasalization, and the Velum. Phonetics & Phonology, vol. 5, 369–400. New York: Academic Press. https://doi.org/10.1016/B978-0-12-360380-7.50017-X.

Vampola, Tomáš, Jaromír Horáček, Vojtěch Radolf, Jan G. Švec & Anne-Marie Laukkanen. 2020. Influence of nasal cavities on voice quality: Computer simulations and experiments. The Journal of the Acoustical Society of America 148(5). 3218–3231. https://doi.org/10.1121/10.0002487.

Wright, James T. 1986. The behaviour of nasalized vowels in perceptual vowel space. In John J. Ohala & Jeri J. Jaeger (eds.), Experimental phonology, 45–67. New York: Academic Press.


Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/phon-2022-2017).


Published Online: 2022-05-27
Published in Print: 2022-04-26

© 2022 Sandra Kotzor et al., published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
