There is recent evidence (Dufour & Grainger, 2022; Dufour et al., 2021) that nonwords like /baksɛt/—created by transposing two phonemes of the real word, /baskɛt/—are perceived as being more similar to the base word /baskɛt/ than nonwords like /bapfɛt/, created by substituting two phonemes of the same word. In these studies, transposed-phoneme nonwords (/baksɛt/) took longer to classify as nonwords compared with substituted-phoneme nonwords (/bapfɛt/) in an unprimed auditory lexical decision task. This so-called transposed-phoneme effect can also be observed with a priming manipulation (Dufour & Grainger, 2022), such that transposed-phoneme nonword primes (/baksɛt/) are more effective in facilitating the subsequent processing of the corresponding base word target (/baskɛt/) than are substituted-phoneme nonword primes (/bapfɛt/).

The transposed-phoneme effect has strong theoretical implications since it raises the question of how phoneme positions are encoded during spoken word recognition. Since the first sounds that make up a word are heard and begin to be processed before later sounds, the most influential models of spoken word recognition (Gaskell & Marslen-Wilson, 1997; Grossberg, 2003; Marslen-Wilson, 1990; Marslen-Wilson & Warren, 1994; Marslen-Wilson & Welsh, 1978; McClelland & Elman, 1986; Norris, 1994) logically assume that the segments extracted from the speech signal are encoded according to their position in the speech input in order to be successfully mapped onto an ordered sequence of speech segments. Obviously, such a coding of speech segments as a function of their correct positions fails to account for transposed-phoneme effects. To the best of our knowledge, there is currently only one model of spoken word recognition, the TISK model (Hannagan et al., 2013; see You & Magnuson, 2018, for a more recent implementation), that can account for transposed-phoneme effects. TISK is an interactive-activation model similar to the TRACE model (McClelland & Elman, 1986), but it replaces the position-dependent units postulated by most models of spoken word recognition, including TRACE, with both a set of position-independent phoneme units (Footnote 1) and a set of open-diphone units that represent ordered sequences of contiguous and non-contiguous phonemes. Within such a framework, both the position-independent phoneme units and the open-diphone units contribute to the flexibility with which phoneme order is encoded, as attested by transposed-phoneme effects.

One particularity of the TISK model is that it assumes that consonants and vowels are processed in exactly the same way. However, the study of Gregg et al. (2019) suggests that this is not the case, and that vowels and consonants could contribute differently to transposed-phoneme effects. In an extension of Toscano et al.’s (2013) study, Gregg et al. examined the eye movements of participants who followed spoken instructions to manipulate objects pictured on a computer screen. They replicated the main result of Toscano et al. (2013) that target words like GUM trigger more fixations on the picture corresponding to the transposed word MUG than on the picture corresponding to the unrelated word PIT. At the same time, they showed that transposed words without vowel position overlap (LEAF–FLEA) were not fixated more than unrelated words, thus suggesting that a positional vowel match is necessary in order to observe transposed-phoneme effects. Such a finding is in line with the results of studies of visual word recognition. In particular, there is evidence (e.g., Lupker et al., 2008; Perea & Lupker, 2004) that transposed-letter effects occur when two consonants are transposed (e.g., CANISO–CASINO) but not when two vowels are transposed (e.g., CISANO–CASINO). The greater transposed-letter effect found with consonants compared with vowels has been linked to differences in the frequency of occurrence of consonants and vowels, which in turn could affect speed of processing (more frequently occurring letters being processed faster) or lexical constraint (more frequently occurring letters being less constraining) (Footnote 2).

The observation that transposed-letter and transposed-phoneme effects are stronger when consonants rather than vowels are transposed adds to the numerous demonstrations that vowels and consonants are processed differently (e.g., Hochmann et al., 2011; Nespor et al., 2003, for language acquisition; Bonatti et al., 2005, for speech segmentation; Berent & Perfetti, 1995; New et al., 2008, for visual word recognition; Delle Luche et al., 2014, for auditory word recognition; Caramazza et al., 2000, for neuropsychology). Most relevant for the present work are studies investigating word recognition, and one key finding here is that both visual (New et al., 2008) and auditory (Delle Luche et al., 2014) priming effects are greater when primes and targets share their consonants (e.g., TOXU–TAXI) than when they share their vowels (e.g., PABI–TAXI). Several hypotheses have been proposed to account for the differences observed between the processing of vowels and consonants. Berent and collaborators (Berent et al., 2001; Berent & Marom, 2005; Marom & Berent, 2010) proposed that written words are represented in the lexicon in terms of a consonant–vowel (CV) skeletal structure, in which there are different slots for consonants and vowels. On the other hand, rather than drawing a structural distinction between consonants and vowels, Nespor et al. (2003) suggested that consonants carry more weight than vowels in the process of lexical identification (i.e., greater lexical constraint), while vowels carry more weight than consonants in the extraction of structural relations. A further potential explanation for consonant–vowel differences is that, given their higher frequency of occurrence and greater acoustical salience (vowels have longer durations and higher intensities than consonants), vowels are processed faster than consonants, at least for auditory stimuli (but see Berent & Perfetti, 1995, for the opposite hypothesis for written materials). Indeed, this is the explanation proposed by Gregg et al. (2019) for the impact of vowel overlap on the transposed-phoneme effects that they observed. We return to examine these different hypotheses in the General Discussion, in light of the present findings.

In the present study, we provided a more in-depth examination of the role of consonants and vowels in driving transposed-phoneme effects. To the best of our knowledge, Gregg et al. (2019) is the only study so far to have examined the differential role of vowels and consonants in driving transposed-phoneme effects. Here, we build on that study, while attempting to overcome some of its limitations. First, the respective roles of consonants and vowels were examined here on exactly the same target words, which was not the case in Gregg et al.’s (2019) study, where the differential role of consonants and vowels could therefore be due to uncontrolled characteristics of the two sets of target words. Moreover, in Gregg et al. (2019), only one vowel was moved (e.g., LEAF–FLEA) in the vowel transposition condition, whereas two consonants were moved (e.g., GUM–MUG) in the consonant transposition condition. As a result, in the vowel transposition condition the transposed words and the target words did not share their CV skeletal structure, which may be another cause of the lack of effect in this condition. In the present study, the vowel and consonant transpositions involved two phonemes, either consonants or vowels, and in both the vowel and the consonant transposition conditions the prime stimuli had the same CV skeletal structure as the target words. We used a short-term priming procedure with prime nonwords and target words having the same CVCVCV structure. We followed the standard procedure of measuring the impact of transposed-phoneme primes within each phonemic category (e.g., /buʒãle/ and /bãluʒe/ for the base word /bulãʒe/ BOULANGER, “baker”) against substituted-phoneme control primes that were created by substituting the two phonemes that were transposed in the transposed-prime condition with different phonemes (e.g., /buvãʀe/–/bulãʒe/ for the consonant transposition and /bõloʒe/–/bulãʒe/ for the vowel transposition). If, as suggested by the results of Gregg et al. (2019), there is an advantage for consonants in driving transposed-phoneme effects, a greater transposed-phoneme priming effect is expected with consonants compared with vowels.

Experiment 1

Method

Participants

Sixty native French speakers from Aix-Marseille University participated in the experiment. All participants reported having no hearing or speech disorders. This sample size was determined on the basis of standard priming experiments that traditionally involve between 12 and 20 participants per experimental list (e.g., Delle Luche et al., 2014; New et al., 2008; Lupker et al., 2008; Perea & Lupker, 2004).

Materials

Fifty-six target words, six to seven phonemes in length and with a (C)CVCVCV syllabic structure, were selected. For each target word, four nonword primes were created. Two were transposed-phoneme nonword primes. One was used in the consonant condition and was created by transposing the two internal consonants of the target word (/buʒãle/ for /bulãʒe/, the example here being the French word BOULANGER, which means “baker”), and the other was used in the vowel condition and was created by transposing the two internal vowels of the target word (/bãluʒe/ for /bulãʒe/). The two remaining primes served as control primes and consisted of substituted-phoneme nonword primes. One was the control for the consonant condition and was created by replacing the two internal consonants of the target word with different consonants (/buvãʀe/ for /bulãʒe/), and the other was the control for the vowel condition and was created by replacing the two internal vowels with different vowels (/bõloʒe/ for /bulãʒe/). The mean frequency of the target words was 12 occurrences per million. The complete set of primes and target words is given in Appendix Table 5.
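For illustration, the minimal sketch below shows how the four prime types relate to a base word when phonemes are treated as a simple character vector; the representation and function names are ours and do not correspond to the authors’ stimulus-construction procedure.

# Illustrative sketch (not the authors' code): a CVCVCV word as a vector of phonemes
base <- c("b", "u", "l", "ã", "ʒ", "e")                # /bulãʒe/ BOULANGER

swap <- function(word, i, j) {                          # transpose two phonemes
  word[c(i, j)] <- word[c(j, i)]
  word
}

transposed_C  <- swap(base, 3, 5)                       # /buʒãle/ (internal consonants)
transposed_V  <- swap(base, 2, 4)                       # /bãluʒe/ (internal vowels)
substituted_C <- replace(base, c(3, 5), c("v", "ʀ"))    # /buvãʀe/ (control, consonant condition)
substituted_V <- replace(base, c(2, 4), c("õ", "o"))    # /bõloʒe/ (control, vowel condition)

paste(transposed_C, collapse = "")                      # "buʒãle"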

Four experimental lists were created using a Latin-square design so that each of the 56 target words was preceded by the four types of prime (transposed-consonant, transposed-vowel, substituted-consonant, substituted-vowel) across different participants, and participants were presented with each target word only once. For the purpose of the lexical decision task, 56 target nonwords were added to each list. The nonwords were created by changing the last phoneme of words not used in the experiment (e.g., the nonword /kõfityl/ derived from the French word /kõfityʀ/–CONFITURE). This allowed us to have wordlike nonwords and to encourage participants to listen to the stimuli up to the end prior to giving their response. So that the target nonwords followed the same criteria as the target words, 14 of them were paired with a transposed-consonant nonword prime (e.g., /kõtifyl/–/kõfityl/), 14 others with a transposed-vowel nonword prime (e.g., /niʀade/–/naʀid/), 14 others with a substituted-consonant nonword prime (e.g., /ʃãzap/–/ʃãtav/), and the remaining 14 nonwords were paired with a substituted-vowel nonword prime (e.g., /ʒyliv/–/ʒaluv/). Finally, to avoid strategic anticipation from the primes, 228 fillers consisting of prime–target pairs with no relation between prime and target were added to each list. Again, for the purpose of the lexical decision task, half of the filler targets were words and the other half were nonwords. All the filler targets were preceded by a nonword prime.

All of the stimuli were recorded by a female native speaker of French in a sound-attenuated room, and digitized at a sampling rate of 44 kHz with 16-bit analog-to-digital recording. The mean duration of the target words was 621 ms. The mean duration of the nonword primes used in the consonant condition was 615 ms for both the transposed and substituted primes. The mean duration of the nonword primes used in the vowel condition was 621 ms and 619 ms for the transposed and the substituted primes, respectively.

Procedure

Participants were tested in a sound-attenuated booth. Stimulus presentation and recording of the data were controlled by a PC running E-Prime software. Primes and targets were presented over headphones at a comfortable sound level, and an interval of 20 ms (ISI) separated the offset of the prime and the onset of the target. Participants were asked to make a lexical decision as quickly and accurately as possible on the target stimuli, with “word” responses being made using their dominant hand on an E-Prime response box placed in front of them. RTs were recorded from the onset of the target stimuli. The prime–target pairs were presented in random order, and an inter-trial interval of 2,000 ms elapsed between the participant’s response and the presentation of the next pair. Participants were tested on only one experimental list and began the experiment with 10 practice trials.

Results and discussion

Two participants who had an error rate greater than 30% were removed from the analyses. The mean RT and percentage of correct responses on target words in each priming condition are presented in Table 1.

Table 1 Mean reaction times (in ms) and percentages of correct responses for the substituted and transposed primes in the consonant-change and vowel-change conditions of Experiment 1

RTs on target words (available at https://osf.io/ku2my/; Open Science Framework; Foster & Deardorff, 2017) were analyzed using linear mixed-effects models, with participants and target words as crossed random factors, using R software (R Development Core Team, 2016) and the lme4 package (Baayen et al., 2008; Bates & Sarkar, 2007). The RT analysis was performed on correct responses, thus removing 78 (2.4%) data points out of 3,248. An inspection of the data indicated that no RT strongly deviated from the distribution, and thus following Baayen and Milin’s (2010) recommendations, the model was applied to the complete set of correct RTs. Also, following Baayen and Milin (2010), for the model to meet the assumptions of normally distributed residuals and homogeneity of variance, a log transformation was applied to the RTs prior to running the model. The model was run on 3,170 data points. We tested a model with the variables phoneme category (consonant, vowel), prime type (transposed, substituted), and their interaction entered as fixed effects. The model failed to converge when random participant and item slopes for the within-factors prime type and phoneme category were included. Therefore, the final model included only random intercepts for participants and items (i.e., the maximal model that converged: Barr et al., 2013). We applied orthogonal contrast coding for the independent variables—namely, 0.5 for one condition and −0.5 for the other condition, which allows an estimation of main effects.
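To make the analysis concrete, the following is a minimal sketch of the model just described, assuming a trial-level data frame d with columns rt, prime_type, phoneme_cat, participant, and item (these column names are ours); as in the text, RTs are log-transformed and the two factors receive ±0.5 contrast codes.

library(lme4)

# ±0.5 orthogonal contrast coding for the two within-factors
d$prime_type  <- factor(d$prime_type,  levels = c("substituted", "transposed"))
d$phoneme_cat <- factor(d$phoneme_cat, levels = c("consonant", "vowel"))
contrasts(d$prime_type)  <- c(-0.5, 0.5)
contrasts(d$phoneme_cat) <- c(-0.5, 0.5)

# Maximal model with random slopes (reported above as failing to converge)
m_max <- lmer(log(rt) ~ prime_type * phoneme_cat +
                (1 + prime_type * phoneme_cat | participant) +
                (1 + prime_type * phoneme_cat | item),
              data = d)

# Final model retained: random intercepts only for participants and items
m_final <- lmer(log(rt) ~ prime_type * phoneme_cat +
                  (1 | participant) + (1 | item),
                data = d)
summary(m_final)  # fixed-effect estimates (b), SEs, and t values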

The main effect of prime type was significant (b = −0.0129, SE = 0.0053, t = −2.42, p < .05), with RTs on target words being shorter when preceded by transposed primes in comparison to substituted primes. The main effect of phoneme category was also significant (b = 0.0430, SE = 0.0053, t = 8.05, p < .001), with RTs on target words being shorter in the vowel condition than in the consonant condition. Crucially, the interaction between prime type and phoneme category was significant (b = −0.0351, SE = 0.0107, t = −3.28, p < .01). This interaction was due to a significant priming effect emerging only in the consonant condition (b = −0.0306, SE = 0.0078, t = −3.94, p < .001) but not in the vowel condition (b = 0.0045, SE = 0.0072, t = 0.62, p > .20).
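The paper does not state how the within-category priming effects were obtained; one common way of estimating them, sketched below under the same assumptions as above, is a nested parameterization in which a separate prime-type effect is fitted within each phoneme category.

# Nested parameterization: one prime-type effect per phoneme category
m_nested <- lmer(log(rt) ~ phoneme_cat + phoneme_cat:prime_type +
                   (1 | participant) + (1 | item),
                 data = d)
summary(m_nested)  # separate transposed-vs-substituted estimates for consonants and vowels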

The percentage of correct responses was analyzed using a mixed-effects logit model (Jaeger, 2008) following the same procedure as for RTs. No significant effects were found.
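A corresponding sketch for the accuracy analysis, assuming a binary column accuracy (1 = correct, 0 = error); again, the column names are ours.

# Mixed-effects logit model for response accuracy
m_acc <- glmer(accuracy ~ prime_type * phoneme_cat +
                 (1 | participant) + (1 | item),
               data = d, family = binomial)
summary(m_acc)  # z and p values for the fixed effects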

The results of Experiment 1 are straightforward. There was a sizable priming effect (30 ms) when consonants were transposed and no priming effect when vowels were transposed. Although in each transposition condition the transposition involved word-internal phonemes, a potential confound is that in the consonant transposition condition the initial syllable of the target words remained intact (/buʒãle/–/bulãʒe/), whereas this was not the case in the vowel transposition condition (e.g., /bãluʒe/–/bulãʒe/). As a result, the differential priming effect between consonant and vowel transpositions could be merely due to whether or not the first syllable was shared across primes and targets. Experiment 2 was designed to address this confound.

Experiment 2

The same CV.CV.CV words as in Experiment 1 were used, but now the transposed phonemes were the first two consonants (e.g., /lubãʒe/–/bulãʒe/) in the consonant transposition condition, and the last two vowels (e.g., /buleʒã/–/bulãʒe/) in the vowel transposition condition. If the results observed in Experiment 1 were merely due to the first syllable being shared in the consonant transposition condition, then a priming effect should only be observed in the vowel transposition condition in Experiment 2. In contrast, if the results observed in Experiment 1 were due to a differential role for vowels and consonants in driving transposed-phoneme priming effects, then we should again observe a stronger priming effect when consonants are transposed than when vowels are transposed.

Method

Participants

A power analysis based on the size of the priming effect found in Experiment 1 for each phoneme category revealed that 189 participants would be necessary to replicate the Prime Type × Phoneme Category interaction with a power of 80%. A total of 200 participants (i.e., 50 per experimental list) were thus recruited online for the experiment. All participants indicated that French was their native language. Because online experimentation facilitates both the recruitment of participants and running the experiment, we decided to increase the number of participants to provide a stronger test of the differential role of consonants and vowels seen in Experiment 1.
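The paper does not report how the power analysis was performed; a common approach is simulation-based power estimation from the Experiment 1 model, for example with the simr package, as in the purely hypothetical sketch below (using the model and column names assumed earlier).

library(simr)

# Hypothetical sketch: extend the Experiment 1 model to a larger sample and
# estimate power for the prime type x phoneme category interaction by
# simulation (likelihood-ratio comparison against a model without it).
m_big <- extend(m_final, along = "participant", n = 200)
powerSim(m_big, test = fcompare(~ prime_type + phoneme_cat), nsim = 1000)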

Materials

Forty-eight target words from Experiment 1 were reused (Footnote 3). For each target word, four nonword primes were created. Two were transposed-phoneme nonword primes. One was used in the consonant condition and was created by transposing the first two consonants of the target word (/lubãʒe/ for /bulãʒe/, BOULANGER–“baker”), and the other was used in the vowel condition and was created by transposing the last two vowels of the target word (/buleʒã/ for /bulãʒe/). The two remaining primes served as control primes and consisted of substituted-phoneme nonword primes. One was the control for the consonant condition and was created by replacing the first two consonants of the target word with different consonants (/ʀudãʒe/ for /bulãʒe/), and the other was the control for the vowel condition and was created by replacing the last two vowels with different vowels (/bulaʒõ/ for /bulãʒe/). The complete set of primes and target words is given in Appendix Table 6.

As in Experiment 1, four experimental lists were created using a Latin-square design so that each of the 48 target words was preceded by the four types of prime (transposed-consonant, transposed-vowel, substituted-consonant, substituted-vowel) across different participants, and participants were presented with each target word only once. For the purpose of the lexical decision task, 48 nonwords taken from Experiment 1 were added to the lists, and the same filler trials as in Experiment 1 were reused. All of the stimuli were recorded by a female native speaker of French in a sound-attenuated room, and digitized at a sampling rate of 44 kHz with 16-bit analog-to-digital recording. The mean duration of the target words was 635 ms. The mean duration of the nonword primes used in the consonant condition was 663 ms and 669 ms for the transposed and substituted primes, respectively. The mean duration of the nonword primes used in the vowel condition was 669 ms and 667 ms for the transposed and the substituted primes, respectively.

Procedure

Exactly the same procedure as in Experiment 1 was used except that the experiment was programmed using LabVanced software (Finger et al., 2017), and participants gave their responses with the left and right arrows of their personal computer keyboard. Participants were instructed to put on their headphones and adjust the volume to a comfortable sound level.

Results and discussion

Twenty-six participants with an error rate greater than 30% were removed from the analyses. One target word that gave rise to an error rate of more than 40% was also removed. The mean RT and percentage of correct responses on target words in each priming condition are presented in Table 2.

Table 2 Mean reaction times (in ms) and percentages of correct responses for the substituted and transposed primes for the consonant-change and vowel-change conditions of Experiment 2

As in Experiment 1, RTs on target words (available at https://osf.io/ku2my/; Open Science Framework; Foster & Deardorff, 2017) were analyzed using linear mixed-effects models with participants and target words as crossed random factors, using R software (R Development Core Team, 2016) and the lme4 package (Baayen et al., 2008; Bates & Sarkar, 2007). The RT analysis was performed on correct responses, thus removing 551 (6.74%) data points out of 8,178. Five extremely long RTs, greater than 10,000 ms, were considered as “absurd” data (see Baayen & Milin, 2010) and were excluded from the analyses. Following Baayen and Milin (2010), no further trimming procedure was applied. For the model to meet the assumptions of normally distributed residuals and homogeneity of variance, a log transformation was applied to the RTs (Baayen & Milin, 2010) prior to running the model. The model was run on 7,622 data points. We tested a model with the variables phoneme category (consonant, vowel), prime type (transposed, substituted), and their interaction entered as fixed effects. The model failed to converge when random participant and item slopes for the within-factors prime type and phoneme category were included. Therefore, the final model included only random intercepts for participants and items (i.e., the maximal model that converged: Barr et al., 2013). We applied orthogonal contrast coding for the independent variables (0.5 for one condition and −0.5 for the other condition), which allows an estimation of main effects.

The main effect of prime type was significant (b = −0.0126, SE = 0.0040, t = −3.14, p < .01), with RTs on target words being shorter when preceded by transposed primes in comparison to substituted primes. The main effect of phoneme category was also significant (b = −0.0706, SE = 0.0041, t = −17.56, p < .001) with RTs on target words being shorter in the consonant condition than in the vowel condition. Crucially, the interaction between prime type and phoneme category was significant (b = −0.0168, SE = 0.0080, t = −2.09, p < .05). This interaction was due to a significant priming effect emerging only in the consonant condition (b = −0.0211, SE = 0.0059, t = −3.58, p < .001) but not in the vowel condition (b = −0.0041, SE = 0.0054, t = −0.77, p > .20).

The percentage of correct responses was analyzed using a mixed-effects logit model (Jaeger, 2008) following the same procedure as for RTs. Only the main effect of phoneme category was significant (b = 0.2831, SE = 0.0946, z = 2.99; p < .01), with more correct responses in the consonant condition than in the vowel condition.

In sum, we successfully replicated the results of Experiment 1. A significant priming effect (35 ms) was again observed when consonants were transposed, and no priming effect was observed when vowels were transposed. We are thus confident that the pattern of priming effects found in both Experiments 1 and 2 is due to a differential role for consonants and vowels in driving transposed-phoneme priming effects. Note that RTs are around 200 ms longer in Experiment 2. This is likely due to differences between in-lab experimentation and experiments run online. One advantage of online experimentation is that it enables the testing of participants from various backgrounds (not just psychology students, for example) as well as being able to rapidly obtain sample sizes much larger than those typical of laboratory experiments. Moreover, several studies have now provided direct replications of in-lab experiments using online testing (e.g., Angele et al., 2022; Mirault et al., 2018). Nevertheless, online experiments are certainly subject to more noise (including environmental distractions such as background noise or interruptions) than in-lab experiments (when these are conducted in isolated experimental booths), and this likely explains the slower RTs in Experiment 2. What is crucial, however, is that the same pattern of effects is observed independently of any change in average RT.

Combined analysis of Experiments 1 and 2

As there were opposite effects of phoneme category in the two experiments, with shorter RTs in the vowel condition in Experiment 1, but shorter RTs in the consonant condition in Experiment 2, a combined analysis of Experiments 1 and 2 was performed in order to test if phoneme category significantly interacted with experiment. This was deemed necessary prior to providing an account of what might be driving this interaction. In order to facilitate comprehension of the overall design resulting from the combination of Experiments 1 and 2, Table 3 provides examples of primes and targets tested in the different conditions in both experiments.

Table 3 Summary of the different conditions tested in Experiments 1 and 2, with examples of the prime and target stimuli that were tested

The RT data of the two experiments were analyzed with the variables phoneme category (consonant, vowel), prime type (transposed, substituted), experiment (first, second), and their interactions entered as fixed effects. The results are summarized in Fig. 1. The main effect of prime type was again significant (b = −0.0131, SE = 0.0039, t = −3.39, p < .001), with RTs on target words being shorter when preceded by transposed primes in comparison to substituted primes. The main effect of phoneme category was also significant (b = −0.0136, SE = 0.0039, t = −3.53, p < .001), with RTs on target words being shorter in the consonant condition than in the vowel condition. The main effect of experiment was also significant (b = −0.1959, SE = 0.0053, t = −36.71, p < .001), with RTs being shorter in Experiment 1 than in Experiment 2. Crucially, the interaction between prime type and phoneme category was again significant (b = −0.0262, SE = 0.0077, t = −3.39, p < .001). This interaction was due to a significant priming effect emerging only in the consonant condition (b = −0.0250, SE = 0.0053, t = −4.70, p < .001) but not in the vowel condition (b = −0.0009, SE = 0.0052, t = −0.18, p > .20). As expected, the interaction between phoneme category and experiment was significant (b = 0.1137, SE = 0.0077, t = 14.72, p < .001) and was due to the effect of phoneme category going in opposite directions in the two experiments. That is, a vowel change (either substitution or transposition) led to faster RTs than a consonant change (either substitution or transposition) in Experiment 1, whereas a consonant change led to faster RTs than a vowel change in Experiment 2 (see Fig. 1). The interaction between prime type and experiment was not significant (b = 0.0000, SE = 0.0077, t = 0.007, p > .20), and neither was the three-way interaction between prime type, phoneme category, and experiment (b = −0.0188, SE = 0.0155, t = −1.21, p > .20).

Fig. 1
figure 1

Summary of the mean RTs per condition in Experiments 1 and 2. C and V refer to the consonant vs. vowel status of the phonemes that were changed across primes and targets. Error bars are 95% CIs

Post hoc analysis of effects of phonetic similarity

The interaction between phoneme category and experiment, with vowel primes leading to shorter RTs than consonant primes in Experiment 1 but to longer RTs in Experiment 2, requires an explanation. One possibility relates to differences in the phonetic similarity of primes with the corresponding targets. To examine the potential influence of prime–target phonetic overlap on the results of Experiments 1 and 2, for each phoneme change in the substituted and transposed-phoneme conditions we calculated the phonetic similarity between the changed phonemes (i.e., the substituted or transposed phonemes in the prime stimuli) and the corresponding phonemes in the target word. This was done using the traditional phonetic features of French (place, voice, manner, and nasality for consonants; aperture, anteriority/posteriority, roundedness, and nasality for vowels). For example, the /õ/ of the substituted nonword /bõloʒe/ shares two phonetic features out of four with the /u/ of the target word /bulãʒe/, and the /o/ of the substituted nonword /bõloʒe/ also shares two features out of four with the /ã/ of the target word /bulãʒe/.
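As a worked illustration of this feature count, the sketch below encodes each phoneme as four features and counts matches; the feature values shown are simplified assumptions restricted to the vowels of the example above, not the authors’ full feature system.

# Simplified illustration of the shared-feature count (our feature values)
features <- list(
  "u" = c(aperture = "closed", place = "posterior", rounding = "rounded", nasality = "oral"),
  "õ" = c(aperture = "mid",    place = "posterior", rounding = "rounded", nasality = "nasal")
)

shared_features <- function(p1, p2) sum(features[[p1]] == features[[p2]])

shared_features("õ", "u")  # 2 shared features out of 4, as in the /bõloʒe/–/bulãʒe/ example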

This analysis revealed that in Experiment 1 the primes (either substituted or transposed) used in the vowel condition were more similar to the targets than the primes (either substituted or transposed) used in the consonant condition, F(1, 110) = 11.35, p < .01, and the exact opposite was true for Experiment 2, with primes in the consonant condition being more similar to targets than primes in the vowel condition, F(1, 94) = 5.63, p < .05. This can therefore explain why, overall, vowel primes (both transposed and substituted) generated faster RTs than consonant primes in Experiment 1, but slower RTs in Experiment 2 (Footnote 4). However, it is important to note that these effects of phonetic overlap cannot account for the effects of interest here (transposed-phoneme priming), since substituted and transposed primes were matched for their overall phonetic similarity with target words (see Table 4).

Table 4 Phonetic similarity across primes and targets (average number of shared phonetic features out of four) in the substituted and transposed prime conditions of Experiments 1 and 2

General discussion

The key result of the present experiments is that transposed-phoneme nonword primes are more effective in facilitating the subsequent processing of the corresponding base word target than substituted-phoneme nonword primes, but this priming effect is limited to transposed phonemes that are consonants. Our findings therefore show an advantage for consonants over vowels in driving transposed-phoneme effects, and more generally speaking, add important new information with respect to differences in the way that vowels and consonants are processed during spoken word recognition.

The further observation of transposed-phoneme effects, albeit limited to consonant transpositions, remains a problem for models of spoken word recognition that code for the precise order of segments (Gaskell & Marslen-Wilson, 1997; Marslen-Wilson, 1990; Marslen-Wilson & Warren, 1994; Marslen-Wilson & Welsh, 1978; McClelland & Elman, 1986; Norris, 1994). In these models, transposed-phoneme nonwords and substituted-phoneme nonwords would produce similar levels of activation in the lexical representation associated with the base word, and thus transposed-phoneme nonwords and substituted-phoneme nonwords should have exactly the same impact on the subsequent processing of the base word. One possible way to reconcile transposed-phoneme effects with these models is to incorporate the notion of noise in the order-encoding process, hence mimicking certain models of orthographic processing (e.g., Gómez et al., 2008) and their account of transposed-letter effects. For example, in Gómez et al.’s (2008) model, the representation of one letter is not strictly tied to a single letter position; instead, each letter in a letter string creates a distribution of activation over positions, so that the representation of one letter extends into nearby letter positions. Incorporating such a mechanism in models like TRACE (McClelland & Elman, 1986) would allow the phoneme /ʒ/ of the nonword /buʒãle/ to activate the position-specific representation of phoneme /ʒ/ in Position 3, but also, to a lesser extent, the phoneme /ʒ/ in Position 5, thus accounting for the transposed-phoneme effect.
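To make the positional-noise idea concrete, the sketch below is an illustration of the general mechanism (not an implementation of TRACE or of Gómez et al.’s model): each input phoneme supports nearby positions with Gaussian weights, so a transposed phoneme still contributes some activation at its original position, whereas a substituted phoneme contributes none.

# Illustrative positional-noise match score (our simplification, not the authors' model)
match_score <- function(input, word, sd = 1) {
  total <- 0
  for (pos in seq_along(word)) {                 # each position in the target word
    hits <- which(input == word[pos])            # where that phoneme occurs in the input
    if (length(hits) > 0)
      total <- total + max(dnorm(hits, mean = pos, sd = sd)) / dnorm(pos, mean = pos, sd = sd)
  }
  total
}

base        <- c("b", "u", "l", "ã", "ʒ", "e")   # /bulãʒe/
transposed  <- c("b", "u", "ʒ", "ã", "l", "e")   # /buʒãle/
substituted <- c("b", "u", "v", "ã", "ʀ", "e")   # /buvãʀe/

match_score(transposed,  base)                   # higher: /l/ and /ʒ/ still present, two positions away
match_score(substituted, base)                   # lower: /l/ and /ʒ/ absent from the prime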

As discussed in the Introduction, the TISK model of spoken word recognition (Hannagan et al., 2013) actually predicted the existence of transposed-phoneme effects at about the same time that they were first observed (Toscano et al., 2013). However, the problem with this model, as well as with modified versions of models like TRACE incorporating positional noise, is that all these models posit that consonants and vowels are processed similarly. This assumption is at odds with the present study as well as with the Gregg et al. (2019) study, both of which indicate that transposed-phoneme effects occur only when consonants are transposed but not when vowels are transposed. A way to reconcile the TISK model with the present findings is to assume that vowels are identified and assigned to their correct position more rapidly than consonants. As mentioned in the Introduction, the greater acoustical salience of vowels relative to consonants could explain why vowels are more rapidly identified than consonants. Furthermore, the greater frequency of vowels (at least in languages like French and English) could also contribute to their faster identification. This would allow vowels to be identified and assigned to their correct position more rapidly than consonants, which in turn would prevent activation of words that share all their phonemes with the target word but with the vowels in different positions (see Gregg et al., 2019).

However, phoneme frequency correlates with another factor thought to impact on differences in the processing of consonants and vowels, and that is lexical constraint (e.g., Nespor et al., 2003). Consonants are more informative with respect to lexical identity than are vowels and are therefore thought to provide a greater contribution to the process of word recognition in both the visual and auditory modalities (see Dandurand et al., 2011, for an analysis relative to priming effects in the visual modality). Differences in lexical constraint could be contributing to the differences in transposed-phoneme priming effects for consonants and vowels reported in the present study. Within the framework of the TISK model (Hannagan et al., 2013), position-independent consonants would constrain lexical identity more than position-independent vowels (e.g., the presence of /k/, /z/, and /n/ in a six-phoneme word, /kazino/, provides more information about its identity than knowing that there is an /a/, an /i/, and an /o/, independently of phoneme order), and this would generate stronger transposed-phoneme priming for consonants than vowels.

In the Introduction, we noted a third factor that could impact on the differential processing of consonants and vowels. Berent and colleagues suggested that vowels carry more weight than consonants in the extraction of structural relations (i.e., the CV skeletal structure of words: Berent et al., 2001; Berent & Marom, 2005; Marom & Berent, 2010). However, since transpositions disrupt the CV structure of a word to the same extent whether it be consonants or vowels that are transposed, we fail to see how this factor could be driving differences in transposed-phoneme priming effects for consonants and vowels as reported in the present work.

To sum up, the present study showed that transposed-phoneme priming effects are influenced by the vowel versus consonant status of the transposed phonemes. We found a significant priming effect when the transposed phonemes were consonants but not when they were vowels. This finding fits with earlier observations that transposed-letter effects occur when the transposed letters are consonants but not when they are vowels (e.g., Lupker et al., 2008; Perea & Lupker, 2004), and points to differences in the way that vowels and consonants contribute to both spoken and written word recognition. We conclude that differences in speed of processing and lexical constraint provide two possible sources of such observations.