Introduction

Spoken word production involves the operation of a series of cognitive mechanisms. A general top-down architecture of production starts from message or concept encoding, to lemma selection, lexeme retrieval, phonemic segment retrieval, syllable construction, and, finally, to articulation (Ferreira, 2010). Phonological retrieval and encoding is an indispensable process in language production. The WEAVER++ model (Levelt, Roelofs, & Meyer, 1999) suggests that at the beginning of phonological encoding in production, metrical and segmental units (e.g., stress and phonemes) are accessed in a parallel fashion. Later on, the phonemic segments are linearized in a syllabified organization that guides articulation. However, is the process of phonological retrieval and encoding in spoken word production the same across different languages? If not, what factors are responsible for the differences? In particular, does the orthographic form that represents the language matter? The present study investigated whether the use of different orthographic forms for the same language has an impact on phonological retrieval and encoding in spoken word production. We use the term “preparation unit” to refer to the phonological unit that is retrieved from the lexicon at the beginning of phonological encoding.

The form-preparation paradigm

The form-preparation task, also known as the implicit priming paradigm, has been frequently used to investigate the nature of the preparation unit in spoken word production (e.g., Chen, Chen, & Dell, 2002; Cholin, Schiller, & Levelt, 2004; Kureta, et al., 2006; Meyer, 1990, 1991; O’Seaghdha, Chen, & Chen, 2010). The task involves an associative-learning session and a naming session. In the associative-learning session, participants memorize some prompt-response word pairs (e.g., night-day, wet-dew, and bread-dough). After participants have informed the experimenters that they have memorized all of the pairs, an associative naming session is immediately conducted in which prompt words appear unpredictably and the participants are required to say the response word as quickly and accurately as possible, while their response time is recorded (e.g., when the word night is presented, participants need to say the word day). The rationale of this paradigm is that, compared with the heterogeneous (or control) context in which response words do not share any elements (e.g., three response words are day, sea, pie), in a homogeneous context where the response words share the same initial element (e.g., the initial phoneme is always /d/ for day, dew and dough), the fore-knowledge of the initial element allows the participants to prepare their first phonological unit in production, thus facilitating their naming latency. The smallest ingredient that can lead to such a form-preparation effect is referred to as the preparation unit.
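
The logic of the paradigm reduces to a difference score between contexts. A minimal sketch, assuming hypothetical naming latencies (the values below are invented for illustration and are not data from any study cited here) and an illustrative function name `preparation_effect`, computes the effect as the mean heterogeneous latency minus the mean homogeneous latency:

```python
from statistics import mean


def preparation_effect(heterogeneous_rts, homogeneous_rts):
    """Return mean heterogeneous RT minus mean homogeneous RT, in ms.

    A positive value indicates facilitation in the homogeneous context.
    """
    return mean(heterogeneous_rts) - mean(homogeneous_rts)


# Hypothetical naming latencies in ms (illustrative only, not real data).
heterogeneous = [640, 655, 662]  # e.g., responses day, sea, pie
homogeneous = [610, 625, 631]    # e.g., responses day, dew, dough

effect = preparation_effect(heterogeneous, homogeneous)
print(f"Preparation effect: {effect:.1f} ms")
```

A reliably positive difference for a given shared element is what the literature treats as evidence that the element can serve as a preparation unit.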

Meyer (1990, 1991) studied the preparation unit in Dutch using the form-preparation paradigm. She found that the preparation unit did not differ within a language when words with different lengths were produced. Native Dutch speakers benefited from the fore-knowledge of the onset of a set of words regardless of whether the words were short (e.g., monosyllabic words) or long (e.g., disyllabic words). Furthermore, the fore-knowledge of later shared components of a set of words did not elicit any significant effects, since manipulating the similarity of the rime, coda, or the second syllable of a word did not lead to a form-preparation effect. A possible explanation is that the assembly of the phonological units is sequential. Participants always need to prepare the utterance of the onset of response words first, and then proceed to the rime. Given that measuring the response time of spoken word production is about the time participants take to produce the first sound of a word, fore-knowledge of the later components does not lead to a faster response time. Roelofs (1999) further showed that the benefit from the fore-knowledge of shared onset in the form-preparation paradigm is driven by shared segmental information but not phonetic features. Neither the place nor the manner of articulation of the initial phoneme affected the form-preparation effect in Dutch (e.g., although /p/ and /b/ are both bilabial and stop phonemes, they did not yield a preparation effect when speakers produced names such as bajes, bami, paling); and only sharing the exact same initial phoneme provided a benefit. Finally, fore-knowledge of simply metrical properties such as the number of syllables, primary stress location, or tonal information did not benefit the preparation of spoken word production, although variability in these properties may reduce the benefits from the advance knowledge of the initial segment (Chen, et al., 2002; Roelofs & Meyer, 1998).

The preparation unit in different languages and the influence of orthography

Research using the form-preparation task suggests that the preparation unit differs across languages (Chen, et al., 2002; Kureta, et al., 2006; Meyer, 1990, 1991; O’Seaghdha, et al., 2010). Native speakers of languages that are typically written using an alphabet, such as Dutch (Meyer, 1990, 1991) and English (Jacobs & Dell, 2014; O’Seaghdha et al., 2010), have been shown to benefit from fore-knowledge of the initial phoneme, suggesting that the primary preparation unit in spoken word production in alphabetic languages is the phonemic segment. In O’Seaghdha et al. (2010, Experiments 5 and 6), native English speakers were asked to memorize a set of word pairs or picture-word pairs. Regardless of the format of the prompt (i.e., visual words or pictures), participants showed significant facilitation when a set of response words shared the same onset phoneme. Most recently, Jacobs and Dell (2014) showed that the benefit from the fore-knowledge of the initial phoneme of a word may also be applied to compound words when two free morphemes are combined to form a morphologically complex word. The onset phoneme of the second morpheme of the compound (e.g., /d/ in sawdust) did not facilitate the word production. Chen et al. (2002) and O’Seaghdha et al. (2010), on the other hand, showed that native Mandarin speakers benefited from the same initial syllable but not from the same initial phoneme, suggesting that in Mandarin the preparation unit is the syllable rather than the phoneme. O’Seaghdha et al. asked native Mandarin speakers to memorize a set of monosyllabic word pairs in which the prompts and responses could form a compound word. For example, in their Experiment 2, the character pronounced que4 was the prompt and the character pronounced ma2 was the response; together they form the word meaning sparrow in Chinese. In their Experiment 3, the prompts and responses were either semantically related words (e.g., peng4, touch, served as the prompt and mo1, stroke, as the response) or unrelated symbol-word pairs (e.g., & served as the prompt and mo1, stroke, as the response). In all situations, participants failed to show faster response times when the response words shared the same onset phonemes. Participants were also asked to memorize disyllabic word pairs (Experiments 1 and 7), and still failed to show onset facilitation; they only showed significantly faster response times when the response words shared the same initial syllable segment. In Kureta et al. (2006), native Japanese speakers showed facilitation from response words with the same initial CV (consonant-vowel) mora but not the same phonemic segment, suggesting that the CV mora may be the primary preparation unit in Japanese.

Interestingly, the preparation units in those three different languages (i.e., English, Mandarin, and Japanese) are also consistent with how the writing systems most commonly used in these languages are designed. For languages using alphabetic orthographies such as Dutch and English, the phonemic segment is adopted in planning spoken words in these languages. For Mandarin whose primary orthography is morphosyllabic in which each character represents a syllable and a morpheme, speakers plan spoken words in syllables. Japanese speakers tend to plan their speech mora by mora, which is consistent with the language’s moraic orthography—each Japanese Kana letter represents a mora. Here a natural question arises: in a task in which orthographic information is explicitly represented, is it possible that a certain orthographic form would promote the selection of a particular preparation unit? Orthographic knowledge is not required in production, but it is possible that orthographic information affects spoken word production in literate speakers. Damian and Bowers (2003) conducted a series of experiments to investigate this issue using the form-preparation paradigm. Using stimuli that were presented in either visual or spoken form, they found that native English speakers did not benefit from overlapping of the initial phoneme if the phoneme was spelled in different ways in a set of response words (e.g., kennel, coffee, and cushion). However, these results were not replicated in Roelofs (2006). In an associative learning-naming task, native Dutch speakers showed a significant facilitative effect when the response words shared the same initial phoneme but not initial letters (e.g., sandaal, circuit, and CD). The inconsistency in the findings of these two studies was interpreted as a consequence of the difference in orthographic depth between Dutch and English. 
Compared to Dutch, sound and spelling correspondence is more complicated in English, and so orthography and phonology may interact more in speech production among native English speakers.

Another factor that was thought to contribute to the orthographic effect in Damian and Bowers (2003) is that speakers may use orthographic codes to facilitate completion of a complicated spoken word production task, such as the word associative learning and production task in the form-preparation paradigm (Alario, Perre, Castel, & Ziegler, 2007). Alario et al. (2007) found that, in a simple picture-naming task where orthographic information was not presented, speakers did not show the orthographic effect that was found in Damian and Bowers (2003). A number of studies suggest that skilled readers retrieve and manipulate phonological units of visual words when they comprehend or memorize printed materials (Crowder, 1982; Mann, 1986; Perfetti & McCutchen, 1982; Stanovich, 1982). In an associative naming task with the form-preparation paradigm, it is possible that exposure to the visual words in the associative-learning session allows participants to encode the word pairs based on the orthographic codes of the language, such that the orthographic information (i.e., spelling) of the visual stimuli has an effect on spoken word production in the subsequent associative naming session. However, this influence may depend on other factors such as the orthographic depth of the target language. The present study investigated the influence of orthography from a different perspective: we examined whether different orthographic forms (e.g., alphabetic vs. morphosyllabic) cue different preparation units in spoken word production.

The preparation unit in Chinese: Evidence from different paradigms

Previous studies using the form-preparation paradigm suggested that the preparation unit in Mandarin Chinese is the syllable segment (or atonal syllable, i.e., the syllable without including tonal information) (Chen, et al., 2002; O’Seaghdha, et al., 2010). However, studies using other methodologies have not yielded the exact same conclusion. Using a simple picture-naming task without explicit orthographic information or a learning session, Qu, Damian, and Kazanina (2012) provided ERP evidence (i.e., event-related potentials) that the phonemic segment is the fundamental unit of phonological encoding in spoken word production among native Mandarin speakers. ERPs are indicators of the electrical activities in the brain that are in response to specific events or stimuli (Blackwood & Muir, 1990). Native Mandarin-speaking adults were instructed to name colored line drawings of objects using color adjective-noun phrases. The color and object name either shared the same initial phoneme (e.g., huang2-he2zi “yellow box,” the number denotes the tone) or were phonologically unrelated (e.g., lü4-he2zi “green box”). Compared with the phonologically unrelated condition, when the color and object name shared the initial phoneme, participants showed more positive ERPs in the posterior regions 200–300 ms and more negative ERPs in the anterior regions 300–400 ms after the picture appeared in the phonologically related condition. Previous research suggests that the phonological encoding stage is estimated to take place at a 275–400-ms time window after a picture had appeared (see Indefrey & Levelt, 2004). Qu et al. (2012) proposed that the posterior ERP amplitude in the 200–300-ms time window might be a result of facilitation due to phoneme repetition during phonological encoding, and that the anterior ERP effect in the 300–400-ms time window was a result of internal speech monitoring that aims to avoid speech errors. Participants did not show significantly faster naming latency in the behavioral data in the phonologically related condition. This was explained as resulting from cancellation of the facilitative effect due to phoneme repetition by the negative effect in the internal speech-monitoring stage. Repeated phonemes may have made it easier for participants to exchange the adjacent speech sounds so that the internal speech monitoring could be under higher load in the phonologically related condition. In alphabetic languages such as English, phoneme-based facilitation may be very pervasive and much stronger than the inhibitory effect due to internal speech monitoring, such that speakers may still show facilitation as an overall effect. In summary, Qu et al. (2012) suggested that the phonemic segment may play a fundamental role in phonological retrieval and encoding when preparing spoken word production in Mandarin, although this does not imply that the phoneme plays exactly the same role in Mandarin as in English.

Previous research on different languages has not reached a consensus about the exact preparation unit during speech planning in various languages and whether orthography might play a role. Different tasks and paradigms may encourage participants to involve orthographic information to different degrees. In the present study, we investigated whether orthography might serve as a cue in spoken-word production by native Mandarin speakers by manipulating the orthographic form in the form-preparation task. Specifically, we studied whether the phonological preparation unit in spoken word production is influenced by the explicit representation of phonological information in the orthography.

Characteristics of the Chinese language and orthography

Unlike languages such as English or Dutch, which are each represented by only one writing system, Mandarin Chinese is written with two different systems: characters and Pinyin. As a result, Mandarin Chinese provides an excellent test case for the effect of orthographic form within the same language. The Chinese character is morphosyllabic, meaning that each character corresponds to a morpheme and a syllable. The Chinese character is also logographic, in that the phonological information is not explicitly represented in the orthography. Pinyin uses the Roman alphabet to transcribe the pronunciation of Chinese characters. Pinyin is a transparent system in which phonological information, including onsets, vowels, codas, and tones, is explicitly represented. In addition, Pinyin has a strict one-to-one letter-sound correspondence.

Pinyin is taught to all children in Mainland China in the first ten weeks of Grade 1 (6–7 years old) before they learn to read Chinese characters, and children use this system to help them learn the pronunciation of characters (Hanley, 2005). For example, the character meaning “early” is represented in Pinyin with the spelling “zǎo.” Its onset is “z,” the vowel is “ao,” and the tone is marked above the vowel “ǎo.” Children are instructed to articulate a syllable by pronouncing the onset and vowel or rime separately, and then combining them (Wang & Gao, 2011). For example, children are taught to pronounce a syllable with the onset “m” and the rime “ā” by repeatedly saying the two parts in sequence (m-ā) and then blending them. After acquiring Pinyin knowledge, children then receive instruction in characters with Pinyin printed above them. In Pinyin instruction, when learning rimes with a nasal coda such as “ang,” children are not told that these sounds can be further segmented into a vowel and a final consonant (e.g., “a” and “ng”). Therefore, Pinyin instruction encourages children to segment a CVC syllable into an onset and a rime instead of an onset, a vowel, and a coda. Native Mandarin-speaking children’s awareness of the onset phoneme was significantly better at Grade 1 than at the third level (i.e., the highest level) of kindergarten, largely due to their exposure to Pinyin instruction (Shu, Peng, & McBride-Chang, 2008). As a result, the acquisition of Pinyin knowledge may encourage Mandarin speakers to develop the ability to attend to the phonological information that comprises the onset and rime units. Note that even though Pinyin may not be a fully fledged writing system and is not read as commonly as Chinese characters in skilled readers’ daily lives, most native Chinese speakers do use Pinyin as an input system to type characters on a computer.

Mandarin Chinese has a simple syllable structure. There are only two legal codas, /n/ and /ŋ/, and consonant clusters are not allowed. Re-syllabification does not occur in Mandarin Chinese. For example, in English the sound /s/ changes from a coda to the onset of the next syllable when the word mess changes to its adjective form messy, but this phenomenon does not occur in Mandarin Chinese. According to the WEAVER++ model of spoken word production (Levelt et al., 1999), re-syllabification might be one reason that native English speakers retrieve and encode phonemes instead of syllables at the beginning of phonological encoding—only after all the phonemes are retrieved can the speakers determine the arrangement of syllables in a word. By the same token, the lack of re-syllabification in Mandarin Chinese may allow Mandarin speakers to retrieve the syllable as an integral unit at the beginning of phonological encoding. Another feature of the Chinese language is that consonant clusters (e.g., sk-, gl-) are illegal, which makes it difficult to differentiate the role of onset versus initial phoneme. However, considering that the focus of the present study is to investigate whether Pinyin and character orthographies provide cues for different preparation units, and the procedure of Pinyin instruction encourages onset-rime division instead of the separation among phonemes, we believe that the preparation unit should be the onset instead of the initial phoneme if it turns out that Pinyin promotes a smaller unit for phonological retrieval and encoding.

The present study

The present study investigated the way native Mandarin speakers retrieve and encode phonological units during spoken-word production and, in particular, whether a certain orthographic form cues a particular unit. Two form-preparation tasks were used, in which the items were presented in Chinese characters (Experiment 1) or in Pinyin (Experiment 2). In both experiments, in order to investigate whether the onset is selected as the preparation unit, an onset session was included in which the critical variable was Context, namely whether a set of spoken response words shared the same onset (e.g., shéng, shǔ, shào, meaning rope, rat, and whistle, respectively) or did not (i.e., control condition, e.g., péng, tǔ, shào, meaning awning, soil, and whistle, respectively). In order to investigate the role of rime, a rime session was included in which the critical conditions were the same-rime condition (e.g., shéng, péng, téng, meaning rope, awning, and pain, respectively) and the control condition (e.g., shéng, tǔ, pào, meaning rope, soil, and bubble, respectively). We hypothesized that an onset facilitation effect should be shown only for Pinyin stimuli (Experiment 2) but not for character stimuli (Experiment 1). In terms of the role of rime, Meyer (1991) showed that fore-knowledge of later components of a word did not lead to a form-preparation effect in native Dutch speakers, presumably because of the sequential processing of phonological units. If a procedure similar to that in Dutch applies to Chinese, then the same-rime condition should not lead to a form-preparation effect when the materials are presented in Pinyin. For characters, shared rime alone may also fail to elicit any effect, considering that the characters would cue participants to retrieve and encode the syllable as an integral unit.

Experiment 1: Implicit priming using characters

We used an associative naming task with the form-preparation paradigm. Participants memorized nine pairs of prompt-response words presented in Chinese characters. The critical variable was Context, that is, whether the response words shared the same onset, rime, or neither (heterogeneous). The same items were used in the same onset and same rime conditions. There were three lists in the same-onset, same-rime, and the heterogeneous conditions, respectively. The heterogeneous condition served as a control condition (see Table 1 for the lists in the onset session and Table 2 for the lists in the rime session). We expected to replicate the results found in previous literature (O’Seaghdha, et al., 2010) where participants failed to show a facilitative effect in the same-onset condition. We also expected to show a null effect in the same-rime condition if the orthographic form of Chinese characters cues speakers to retrieve and encode the syllable as an integral unit.

Table 1 The six lists of all items used for the form-preparation task in the onset session in both Experiments
Table 2 The six lists of all items used for the form-preparation task in the rime session in both Experiments

Participants

Participants were 16 native Mandarin-speaking students with normal or corrected-to-normal vision and without speech impairment from a mid-Atlantic university in the USA. There were eight females and eight males, whose ages ranged from 21 to 28 years (M = 23.8, SD = 1.87). All participants were paid for their participation.

Design and materials

In the form-preparation task, participants were asked to memorize nine pairs of monosyllabic characters. For the onset session, the response words consisted of three sets of three monosyllables with the onsets /p/, /ʂ/, or /t/. For example, in the /p/ set, the three monosyllables were /pəŋ2/, /pu3/, and /paʊ̯4/ (the number denotes tone), which shared the same onset but differed in rime segment and tone. Each prompt-response pair was composed of semantically related monosyllabic characters (e.g., /maʊ̯1/-/ʂu3/, cat-mouse), and no characters were polyphonic, such that each character had only one pronunciation. Furthermore, the prompt and its response character did not share any pronunciation characteristics. By shuffling the combinations of the nine items, three sets of three monosyllabic characters with the rimes /əŋ2/, /u3/, and /aʊ̯4/ were formed for the rime session. For example, in the /əŋ2/ set, the three monosyllabic characters were /pəŋ2/, /ʂəŋ2/, and /təŋ2/. These items shared the same rime segment and tone but differed in onset.
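
The nine response syllables can be viewed as a 3 × 3 grid of onsets by rimes, with same-onset sets as rows and same-rime sets as columns. The sketch below illustrates that structure. The function name `build_sets` and the diagonal scheme used for the heterogeneous sets are assumptions made for illustration; the lists actually used are those given in Tables 1 and 2 and may be arranged differently.

```python
# Onsets and rimes (with tone) as transcribed in the text.
ONSETS = ["p", "ʂ", "t"]
RIMES = ["əŋ2", "u3", "aʊ̯4"]


def build_sets(onsets, rimes):
    """Return same-onset, same-rime, and heterogeneous sets of syllables."""
    n = len(onsets)
    # Rows of the grid: one onset combined with every rime.
    same_onset = [[o + r for r in rimes] for o in onsets]
    # Columns of the grid: one rime combined with every onset.
    same_rime = [[o + r for o in onsets] for r in rimes]
    # Assumed scheme: wrapped diagonals, so that no two members of a
    # heterogeneous set share an onset or a rime.
    heterogeneous = [
        [onsets[(i + k) % n] + rimes[i] for i in range(n)]
        for k in range(n)
    ]
    return same_onset, same_rime, heterogeneous


same_onset, same_rime, heterogeneous = build_sets(ONSETS, RIMES)
for label, sets in [("same onset", same_onset),
                    ("same rime", same_rime),
                    ("heterogeneous", heterogeneous)]:
    print(label, sets)
```

Because all three kinds of set are drawn from the same nine items, any difference between homogeneous and heterogeneous contexts cannot be attributed to the items themselves.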

For the onset session, each participant was required to complete the prompt-response associative naming task for six lists. The primary manipulation, Context, was whether the response words were homogeneous (sharing the same onset) or heterogeneous (sharing nothing systematically in common; the control condition). Three lists were homogeneous, and the other three were heterogeneous. Each homogeneous list had its own corresponding heterogeneous list, so the six lists made up three homogeneous-heterogeneous pairs. See Table 1 for all stimuli for each list. Each list had three presentation Blocks, and in each block, each item was presented four times (Repetitions) in random order. Therefore, in each list, each prompt was presented 12 times in total. The Sequence of contexts (i.e., whether participants received the homogeneous list or the heterogeneous list first from each pair of lists) was counterbalanced across participants. The order of the three list-pairs was counterbalanced among participants as well. Each participant received 216 trials (3 blocks × 2 contexts × 3 onsets × 3 items × 4 repetitions). Sequence was treated as a between-subjects factor. Another counterbalanced variable was the sequence of onset and rime sessions. Half of the participants received the onset session first and the other half received the rime session first.
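
The trial count follows directly from the design factors. As a minimal sketch, the code below reproduces that arithmetic and builds the presentation order for one list. The prompt labels are placeholders, and the unconstrained shuffle within each block is an assumption, since the text does not state whether the randomization was constrained in any way.

```python
import random

BLOCKS, CONTEXTS, SETS, ITEMS_PER_SET, REPETITIONS = 3, 2, 3, 3, 4


def total_trials():
    """Trials per participant in one session."""
    return BLOCKS * CONTEXTS * SETS * ITEMS_PER_SET * REPETITIONS


def make_block(prompts, repetitions=REPETITIONS, rng=None):
    """Return one block: each prompt repeated `repetitions` times, shuffled."""
    if rng is None:
        rng = random.Random()
    trials = [p for p in prompts for _ in range(repetitions)]
    rng.shuffle(trials)  # assumed: simple unconstrained randomization
    return trials


def make_list(prompts, blocks=BLOCKS, rng=None):
    """Return the blocks for one list (three prompts in this design)."""
    return [make_block(prompts, rng=rng) for _ in range(blocks)]


# Placeholder prompt labels, not the actual stimuli.
example_list = make_list(["prompt_1", "prompt_2", "prompt_3"],
                         rng=random.Random(0))
print(total_trials())
print([len(block) for block in example_list])
```

Each list therefore contributes 36 trials (3 blocks × 12 trials), and the six lists of a session give the 216 trials reported.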

For the rime session, the design was the same as that for the onset session, except that the items in a homogeneous list shared the same rime rather than the same onset. See Table 2 for all stimuli for each homogeneous list and heterogeneous list. All participants were asked to finish both the onset and rime sessions, and the order in which they completed the two sessions was counterbalanced among participants. The interval between the two sessions was about five minutes, and participants were informed at the end of the first session that another session was to follow and that all the requirements for the second session were the same as for the first session. The sequence of the onset and rime sessions was coded as OR in the subsequent analyses and was treated as a between-subjects factor.

Procedure

In the associative-learning session, each participant had nine cards, each of which had one prompt-response pair printed on it. After participants indicated that they had memorized all of the pairs (memorization took five minutes on average), a practice test session was administered to help participants become familiar with the paradigm and to confirm that they had memorized all pairs. In the practice session, the nine prompt characters were presented on a computer screen in a random order, and the participants were asked to say the corresponding response character as quickly and accurately as possible. Only after participants’ performance indicated that they were familiar with the procedure and the materials (i.e., being able to provide the correct response for each prompt within 1,000 ms) did the formal testing session begin. Both the practice and formal tests were implemented using DMDX software (Forster & Forster, 2003). For each list, the cards of the three item pairs were shown to the participants, and none of the participants needed extra time to memorize the pairs before the testing. During the test session, each trial began with a 200-ms, 1,000-Hz warning tone and a fixation cross (“+”) presented at the center of the screen. At 600 ms after the offset of the tone, a prompt in size 48 font appeared at the center of the screen for 150 ms. Participants were instructed to say the corresponding response character aloud, as quickly and accurately as possible. An Audio-Technica ATR-20 microphone was used as the voice key of the DMDX program to record participants’ response times. The next trial began 200 ms after a response was given, or after 1,500 ms if no response was given. During the experiment, the first author sat behind the participants and scored their naming accuracy, and a voice recorder was used to record participants’ responses so that the experimenter was able to listen to the responses again later if needed.

Results

Response-time (RT) analyses were based on correct trials only (approximately 1 % of the trials were incorrect and were deleted). Approximately 3 % of the RT data were removed because the response failed to trigger the voice key or because hesitations or disfluencies occurred. Analyses were carried out in R, an open-source programming environment for statistical computing (R Development Core Team, 2008), with the lme4 package (Bates, Maechler, Bolker, & Walker, 2013) and the lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2013) for linear mixed-effects modeling (LMM, GLMM). For the onset session, Context, Block, OR (i.e., the sequence of onset and rime sessions), and Sequence were entered in the model as fixed effects, whereas Participant and Item were entered as random effects, each with a random intercept and a random slope for Context. The mixed-effects model entered in R for the analysis of RT was “RT ~ Block * Sequence * OR * Context + (1 + Context | Participant) + (1 + Context | Item).” Onset was not included as a fixed effect because it was already embedded in the items (i.e., each item had its own onset) and Item had been included as a random effect. Therefore, we decided to collapse across onsets and to investigate the overall Context effect.
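
For readers less familiar with lme4 syntax, the model formula above can be written out as an equation. This is one standard reading of such a formula rather than a statement taken from the original analysis script, and the symbols (p for participant, i for item, t for trial, C for the coded Context variable) are notation assumed here for illustration:

```latex
\begin{align*}
\mathrm{RT}_{pit} &= \mathbf{x}_{pit}^{\top}\boldsymbol{\beta}
  + \left(u_{0p} + u_{1p}\,C_{pit}\right)
  + \left(w_{0i} + w_{1i}\,C_{pit}\right)
  + \varepsilon_{pit}, \\
(u_{0p}, u_{1p})^{\top} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_{P}), \qquad
(w_{0i}, w_{1i})^{\top} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_{I}), \qquad
\varepsilon_{pit} \sim \mathcal{N}(0, \sigma^{2}).
\end{align*}
```

Here the fixed-effects vector contains the intercept together with all main effects and interactions of Block, Sequence, OR, and Context. The terms u and w are the by-participant and by-item random intercepts (subscript 0) and random slopes for Context (subscript 1).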

Likewise, for the rime session, Context, Block, OR, and Sequence were entered in the model as fixed effects, whereas Participant and Item were entered as random effects, each with a random intercept and a random slope for Context. An analysis of accuracy rate in both sessions followed a similar procedure with generalized linear (binomial) mixed-effects regression, but failed to show any significant results. For the RT data, we report analyses of the onset session and rime session separately in order to keep the model concise. See the upper part (i.e., Orthographic Type is character) of Table 3 for summaries of the descriptive data of both the onset and rime sessions.

Table 3 Descriptive data of participants’ performance in both orthographic types (i.e., both experiments) with mean reaction time (M), error rates (E%), standard errors (SE), and preparation effects

For the onset session, the main effect of Context was not significant (F(1, 10.2) = .014, p = .908). Context did not show significant interaction with any other variables (F(2, 3262.5) = 1.023, p = .360 for Context and Block; F(1, 11.9) = 1.575, p = .233 for Context and Sequence; F(1, 11.9) = 1.216, p = .292 for Context and OR, respectively). A significant Block main effect was shown (F(2, 3262.4) = 10.332, p < .001), probably due to a practice effect given that the RT showed a trend of being faster in the later blocks (Block 1: 749 ms; Block 2: 731 ms; Block 3: 721 ms). There was also a significant Block × OR interaction (F(2, 3262.5) = 5.6116, p = .004), likely due to the fact that participants showed a more salient Block effect when they received the onset session first (p < .001) compared to when they received the rime session first (p = .178). See Table 4 for the full results.

Table 4 Results of the ANOVA approach to linear mixed-effect model analysis of reaction times of the onset session in Experiment 1

For the rime session, the main effect of Context did not reach significance (F(1, 15.6) = .348, p = .564), and a main effect of Block was shown (F(2, 3283.6) = 6.186, p = .002). The participants’ RT tended to be faster in the later blocks (Block 1: 751 ms; Block 2: 737 ms; Block 3: 728 ms). Context did not show significant interaction with any other variables (F(2, 3283.6) = .229, p = .795 for Context and Block; F(1, 12.0) = 3.135, p = .102 for Context and Sequence; F(1, 12.0) = 2.162, p = .167 for Context and OR, respectively). There were also some complicated three-way interactions. See Table 5 for the full results.

Table 5 Results of the ANOVA approach to linear mixed-effect model analysis of reaction times of the rime session in Experiment 1

Discussion

The critical results of Experiment 1 were: (1) participants failed to show an onset or rime effect; and (2) Context did not show significant interaction with any of the other variables. The absence of onset facilitation was consistent with previous literature (e.g., Chen, et al., 2002; O’Seaghdha, et al., 2010) showing that native Mandarin speakers failed to benefit from fore-knowledge of the onset during the associative naming of materials written in Chinese characters. The results suggested that presenting materials in characters did not cue the onset to serve as the preparation unit among native Mandarin-speaking adults. Participants also failed to show any significant rime effects, suggesting that the rime may not be the preparation unit either. In addition, Context did not show significant interaction with Block, Sequence, or OR, suggesting that even practice or increased familiarity with the paradigm failed to cue participants for the onset or rime unit.

Experiment 2: Implicit priming using Pinyin symbols

Experiment 2 aimed to investigate the preparation unit that speakers select when they are asked to memorize and name words presented in Pinyin. If the absence of onset facilitation in Experiment 1 is due to the influence of the morphosyllabic Chinese characters, participants should show an onset-facilitative effect in Experiment 2, considering that the onset is represented explicitly in Pinyin and children are taught to read Pinyin in the sequence of onset-rime-whole syllable assembly. For the rime session, if a language production procedure similar to that in Dutch applies to Chinese, then the same-rime condition should not lead to a form-preparation effect when the materials are presented in Pinyin (see Meyer, 1991). If, however, Pinyin cues the rime as the preparation unit, a rime-facilitative effect should be expected.

Participants

Participants were 16 native Mandarin-speaking students with normal or corrected-to-normal vision and without speech impairment, recruited from the same subject pool as in Experiment 1. None of them had participated in Experiment 1. There were nine females and seven males, whose ages ranged from 21 to 29 years (M = 24.3, SD = 1.91). All of the participants were paid for participation.

Methods

The items, design, and procedure were the same as those in Experiment 1. The only difference was that the materials were written in Pinyin in Experiment 2. In other words, both the associative pairs in the learning session and the prompts for the retrieval during the naming session were given in Pinyin.

Results

The data-cleaning procedure and data analysis were the same as in Experiment 1. In total, 4.67 % of the data were removed, among which about 1.8 % of the data were incorrect trials, and the remaining data loss was because the response failed to trigger the voice key or because hesitations or disfluencies occurred. For both the onset and rime sessions, the analysis of accuracy rate again did not show any significant results. For the RT data, we reported analyses of the onset session and rime session separately again to avoid a lengthy model that would include non-critical interactions. See the bottom part (i.e., Orthographic Type is Pinyin) of Table 3 for summaries of the descriptive data of both the onset and rime sessions.
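The exclusion step described above can be sketched as follows. This is a minimal illustration in Python, not the analysis script used in the study (the analyses were run in R); the Trial record, its field names, and the toy trials are hypothetical and are not data from the experiment.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One naming trial; all field names are hypothetical."""
    rt_ms: float         # naming latency in milliseconds
    correct: bool        # False if the wrong response word was produced
    voice_key_ok: bool   # False if the response failed to trigger the voice key
    fluent: bool         # False if a hesitation or disfluency occurred


def clean_trials(trials):
    """Return the trials kept for the RT analysis and the percentage removed."""
    if not trials:
        return [], 0.0
    kept = [t for t in trials if t.correct and t.voice_key_ok and t.fluent]
    removed_pct = 100.0 * (len(trials) - len(kept)) / len(trials)
    return kept, removed_pct


# Toy trials for illustration only (not from the experiment).
toy_trials = [
    Trial(690.0, True, True, True),
    Trial(655.0, True, True, True),
    Trial(702.0, False, True, True),   # incorrect response
    Trial(0.0, True, False, True),     # voice key not triggered
]
kept, removed_pct = clean_trials(toy_trials)
print(len(kept), removed_pct)
```

Applied to a full data set, the second return value corresponds to the overall percentage of removed data reported above.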

For the onset session, there was a significant Context main effect (i.e., onset facilitation) (F (1, 10.0) = 5.673, p = .039). A Block main effect was shown (F (2, 3240.0) = 10.921, p < .001), and, similar to Experiment 1, this block effect may be due to a practice effect since faster RTs were shown in the later blocks (Block 1: 690 ms; Block 2: 669 ms; Block 3: 670 ms). Context did not show significant interaction with any other factors (F (2, 3239.9) = 1.066, p = .345 for Context and Block; F (1, 12.0) = .095, p = .763 for Context and Sequence; F (1, 12.0) = .120, p = .735 for Context and OR, respectively). See Table 6 for the full results.

Table 6 Results of the ANOVA approach to linear mixed-effect model analysis of reaction times of the onset session in Experiment 2

For the rime session, the main effect of Context reached significance (i.e., rime inhibition) (F (1, 9.2) = 7.850, p = .020), and there was a significant interaction between Context and Sequence (F (1, 12.1) = 7.626, p = .017). Participants showed a 13-ms interference effect when they received a heterogeneous list first, whereas they showed a 42-ms interference effect when a homogeneous list was given first. A main effect of Block was shown (F (2, 3222.4) = 26.683, p < .001). The RT in Block 1 (723 ms) was longer than that in Block 2 (692 ms) and Block 3 (686 ms), suggesting a practice effect. However, Context did not show a significant interaction with Block (F (2, 3222.4) = .421, p = .657) or OR (F (1, 12.1) = 1.275, p = .281). There were also some complicated three-way interactions. See Table 7 for the full results.

Table 7 Results of the ANOVA approach to linear mixed-effect model analysis of reaction times of the rime session in Experiment 2

To further examine the effect of orthographic form-cuing on the preparation unit, we combined the onset sessions in Experiments 1 and 2 to examine the interaction between Orthographic Type (i.e., character vs. Pinyin) and Context (same onset vs. control). In the linear mixed effects model, Context, Block, OR, Orthographic Type (labeled “Ortho” in R for brevity), and Sequence were entered as fixed effects, whereas Participant and Item were entered as random intercepts with Context serving as a random slope.

The critical result was a significant Ortho × Context interaction (F (1, 23.9) = 9.197, p = .006). When the orthographic type was Pinyin, the mean RT in the homogeneous condition was 27 ms faster than that in the heterogeneous condition. In contrast, when the orthographic type was Chinese character, the mean RTs in the two contexts did not differ significantly (only a 1-ms difference). Context did not show significant interaction with any other factors (ps > .10). The Block main effect was significant (F (2, 6518.3) = 19.270, p < .001). Orthographic Type showed a significant main effect (F (1, 24.0) = 6.116, p = .021). The mean RT in the Pinyin condition (677 ms) was faster than the mean RT in the character condition (734 ms), most likely because Pinyin facilitated the encoding of phonological information in the task. See Appendix A for the full results.
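The Ortho × Context interaction can be read descriptively as a difference between two preparation effects. The following Python sketch illustrates that arithmetic only; it is not the mixed-model analysis reported here. The function names and the toy RT lists are hypothetical, whereas the 27-ms and 1-ms effects are the values reported in the text.

```python
from statistics import mean


def preparation_effect(heterogeneous_rts, homogeneous_rts):
    """Heterogeneous minus homogeneous mean RT in ms.

    A positive value indicates facilitation in the homogeneous context;
    a negative value indicates interference.
    """
    return mean(heterogeneous_rts) - mean(homogeneous_rts)


def interaction_contrast(effect_a, effect_b):
    """Difference between two preparation effects (a difference of differences)."""
    return effect_a - effect_b


# Onset preparation effects reported in the text (ms).
pinyin_effect = 27
character_effect = 1
contrast = interaction_contrast(pinyin_effect, character_effect)
print(contrast)  # 26

# Toy RTs (hypothetical, not from the experiment) showing how an effect
# would be computed from trial-level latencies.
toy_effect = preparation_effect([700, 710, 690], [680, 670, 690])
print(toy_effect)
```

The descriptive contrast says nothing about reliability on its own; that is what the F test on the interaction term addresses.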

Likewise, we combined the rime sessions in the two experiments and included Orthographic Type as a new fixed effect in the linear mixed effects model. Orthographic Type and Context did not show a significant interaction in the new model (F (1, 24.0) = 2.123, p = .158). However, the separate analyses had shown that a significant rime interference effect was present only in the Pinyin condition and not in the character condition. When the orthographic type was Pinyin, a 28-ms rime interference effect was observed; when the orthographic type was character, participants’ RT was also slower in the homogeneous condition than in the heterogeneous condition (by 9 ms), and this small trend of rime inhibition may have contributed to the non-significant Ortho × Context interaction. See Appendix B for the full results.

Discussion

For the onset session, the critical results in Experiment 2 were: (1) participants showed significant onset facilitation when the materials were written in Pinyin; and (2) there was no significant interaction between Context and any other variables. In addition, combining the onset session of Experiments 1 and 2, a significant Orthographic Type × Context interaction was shown (p = .006). This interaction and the significant onset facilitation in Experiment 2 were consistent with our prediction that participants tended to be more sensitive to the smaller unit (i.e., onset) when Chinese words were written in Pinyin, given the alphabetic nature of the Pinyin system and that presenting the materials in Pinyin may have cued onset-rime division.

For the rime session, inconsistent with our prediction, the current study showed a significant rime interference effect in Experiment 2. We speculate that this might be a result of lexical competition, since the repetition of the target rime may activate the rime neighborhood of the target word. After the participants said the response word of a trial, the lexical information of its rime neighbors, including the response word of the next trial, might be activated, but the lexical representation of the current trial competes with that of the next trial, thus inhibiting the pre-activation of the next trial and leading to an interference effect. This explanation builds on Lukatela and Turvey (1996), in which a rime inhibitory effect was shown among native English speakers in a word-naming task with a priming paradigm. In Lukatela and Turvey (1996), when the stimulus onset asynchrony (SOA) was 36 ms or 70 ms, participants named a target word (e.g., nose) more slowly when its prime shared the target’s rime (e.g., hose). Their explanation was that the prime (e.g., hose) helped the lexical activation of both the prime and its rime neighbors, including the target (e.g., nose). The lexical activation of the prime was in competition with that of the target, and thus inhibited the pre-activation of the target. In the present study, this inhibition effect occurred only in the same-rime session and not in the same-onset session, probably because the rime is the later component of a monosyllabic word. By the time speakers access the rime unit, they have also activated the lexical information; in contrast, by the time they have accessed the onset unit, only the facilitative effect has occurred, while the lexical information may not yet have been retrieved.

Another significant interaction that involved Context was the Sequence × Context interaction, which might be a result of the practice effect, considering that participants showed larger interference effects when they received a homogeneous list before its corresponding heterogeneous list. This was also in line with the main effect of Block, which could be due to a practice effect as well.

When the rime sessions of Experiments 1 and 2 were combined, the interaction between Orthographic Type and Context did not reach significance (p= .158). However, a post-hoc analysis and separate analysis in Experiments 1 and 2 showed that the interference effect was larger in Experiment 2 than in Experiment 1, and only Experiment 2 showed a significant rime effect. An explanation is that the rime unit is explicitly represented in Pinyin (Experiment 2), thus making the rime unit more salient to participants in Experiment 2. As a result, the lexical competition between the rime neighbors was stronger in Experiment 2.

General discussion

We used two form-preparation tasks to investigate the effect of orthographic form-cuing on the preparation unit in planning spoken-word production. The findings in the onset sessions of the two experiments suggest that when the target words in a production task are visually presented in advance (i.e., the orthographic information is explicitly provided in the associative-learning session), the preparation unit is cued by the orthographic form of the presented visual words. When the visual words are presented in the form of an alphabetic orthographic system (e.g., Pinyin), the phonological encoding is done in the smaller unit – onset; when the visual words are presented in the form of a morphosyllabic orthographic system (e.g., Chinese characters), phonological information may be encoded in a larger unit. Although it is difficult to tease apart the effect of phoneme and onset because there are no consonant clusters in Mandarin, the simple syllabic structure of Chinese and the experience of Pinyin in early primary grades suggest that in practice Pinyin encourages phonological coding in onsets and rimes but not in phonemes (see Ziegler & Goswami, 2005, for review). Thus, it is more likely that the onset instead of the phoneme is the functional preparation unit that is responsible for the facilitation in the homogeneous context in Experiment 2.

Onset facilitation

To the best of our knowledge, the present study is among the first that investigated the influence of orthographic form (e.g., alphabetic vs. morphosyllabic) on spoken word production. Our findings suggest that there is some flexibility in native Mandarin speakers’ selection of the preparation unit. The written words may cue participants to encode the phonological information in a way that is consistent with its orthographic form since readers are flexible in selecting the preparation unit in different contexts when cued by different orthographic forms. Previous literature suggested that skilled readers retrieve and manipulate the phonological unit of visual words when comprehending and memorizing print materials (see Crowder, 1982; Mann, 1986; Perfetti & McCutchen, 1982; Stanovich, 1982, for reviews), and this explains why the adult participants in the current study tended to select the onset when memorizing and associatively naming materials written in Pinyin but not characters—it is more consistent with the alphabetic nature of the Pinyin system. The phonological organization of the Pinyin system, which is highly correlated to its orthographic form, encourages an onset-rime division. In contrast, the orthographic form of the Chinese character does not cue the onset-rime division. Chen and Chen (2013) argued that the syllable segment is the preparation unit in Chinese and that this is an intrinsic property of the production system. The researchers showed consistent results in simple picture-naming tasks with the form-preparation paradigm, in which no associative-learning session was involved. 
Taking all these findings together, we suggest that it is likely that native Mandarin speakers prefer to encode the phonological information in syllables in spoken-word production; however, we further suggest that skilled readers are flexible in encoding phonological information according to the cue of the context (e.g., the orthographic form of the visually presented stimuli), since they have acquired knowledge of multiple orthographic forms (i.e., Pinyin and characters). The explicit phonological information in Pinyin cues the readers to shift their preference from a large unit (syllable) to a smaller unit (onset) in planning spoken words.

For the general architecture of production, we propose that after lexical selection at the lemma level, orthographic form may be involved prior to phonological encoding. If the selected lexical items have been visually presented in advance, the orthographic form of the visual words cues speakers to encode a certain phonological unit. In the present study, Pinyin promoted the smaller phonological unit, the onset, and Chinese characters may have promoted the larger unit, the syllable. In addition, we propose that orthographic cues may be used to facilitate information retrieval as well. When the items in a production task are visually presented as input in advance, literate speakers rely on orthographic cues to decode the phonological information; in the production task carried out later, using the same orthographic cues may facilitate expedient retrieval of phonological information.

The present study suggested that the orthographic input cues phonological retrieval and encoding in a production system. Likewise, Chen and Li (2011) suggested that the output format in a production system (e.g., word typing vs. word naming) also cues word-form encoding among native Mandarin speakers. When typing the target words, participants showed faster RTs when the targets shared onset consonants with the prime words than when they did not. However, this facilitation was not shown in a word-naming task. Both typing and naming tasks require accessing the phonological codes, but in word typing, participants need to use the Pinyin system to type character words and to anticipate the segment-based finger movements, whereas in naming, participants are not required to access the segmental information. The typing requirement thus promotes the encoding of the smaller unit, the onset, as the preparation unit. The Pinyin input cue in the present study may have functioned similarly to the word-typing output cue in Chen and Li (2011), and both types of cues may have encouraged participants to attend to sub-syllabic units.

The rime interference effect

The present study showed a significant interference effect when the response words shared the same rime and the orthographic type was Pinyin. As discussed earlier, this unpredicted interference effect may arise as a result of lexical competition. Unfortunately, the current experiment is not able to address the underlying mechanism of the rime interference effect directly, and future research is needed to pin down the exact factors that contribute to this effect. Meyer (1991) showed that the fore-knowledge of the rime unit did not have an effect in Dutch. A possible explanation for the inconsistent findings regarding the shared rime is cross-linguistic differences. Compared to Chinese, which does not allow consonant clusters within a syllable, Dutch has a much richer set of onsets. In fact, words with consonant clusters as the onsets were included in Meyer (1991), such as snoek (pike) and vloek (curse). However, only single consonants are legal as onsets in Chinese. Therefore, a shared rime in Dutch may not be enough to activate rime neighbors to lead to an inhibition effect. In contrast, in Chinese with a much simpler syllable structure, a shared rime may be sufficient to lead to activation of rime neighbors, resulting in lexical competition. Future research is needed to directly address the question of the conditions under which the rime interference effect occurs.

Limitations and future directions

The present study has some limitations. First, a possible factor that may have contributed to onset facilitation is participants’ English exposure and proficiency. All participants were undergraduate or graduate students at a mid-Atlantic university. Compared with native Mandarin speakers in China, their extensive exposure to English may have resulted in higher English language proficiency and may have led them to attend more to a small phonological unit such as the onset. This proficiency effect would be consistent with previous research suggesting that bilingual speakers’ experience with their second language (L2) may influence the preparation unit in their native language (L1) (Verdonschot, Nakayama, Zhang, Tamaoka, & Schiller, 2013). In a masked priming experiment, native Mandarin Chinese speakers with high English proficiency were asked to read a series of Chinese characters as quickly and accurately as possible. One target character was shown at a time, and the prime preceding the target shared the same onset, the same initial syllable, or nothing systematic with the target. Only when the syllable structure of the prime and the target overlapped (i.e., CV-CV or CVC-CVC) did participants show significant onset facilitation. The present study did not show onset facilitation in the character condition, which is probably related to the fact that we included different syllable structures in the three items of a homogeneous list (i.e., C + Simple V, C + Simple V + C, and C + Diphthong). Nevertheless, we showed a significant onset-facilitative effect in the Pinyin condition even when the syllable structure differed across the items. This finding suggests that Pinyin did indeed cue the onset unit for encoding.
In summary, the onset facilitation in Experiment 2 is more likely due to the phonological organization of Pinyin, or at least due to both the alphabetic nature of the Pinyin form and participants’ English proficiency. In future research, participants’ English proficiency level could be measured to be included as a covariate in the linear mixed effects model to investigate its potential influence.

Second, the absence of onset facilitation in Experiment 1 only suggested that the onset is not the preparation unit when the orthographic type is the character; it does not identify the exact preparation unit selected by speakers. A follow-up experiment with two-character words that share the same initial syllable segment in the homogeneous context should help determine whether it is indeed the syllable segment that speakers select for planning spoken words when cued by a morphosyllabic orthographic form. Third, in the same-rime session, words in the same-rime context shared not only the same rime segment but also the same tone. The interference effect in Experiment 2 may be a result of both rime segment and tone or of only one of them. Chen et al. (2002) showed that tone alone did not play a significant role in preparing for spoken word production in Chinese; however, mismatched tone did decrease the size of the preparation effect. Thus, future research is needed to tease apart whether rime segment alone or rime segment plus tone leads to the inhibition. Fourth, the mechanism underlying the rime inhibition effect merits further investigation. We speculated that the different syllable structures may play a role in the different patterns of the rime effect between Dutch and Mandarin speakers. A possible direction for future research is to vary the syllable structure of the words within the same language and to compare the rime effect on items with complicated versus simple syllable structures. Finally, orthographic information was explicitly represented in both experiments, suggesting that the preparation unit in Mandarin can be cued by the orthographic form that is visually presented. However, it remains unclear which preparation unit is selected when an orthographic cue is not visually presented (e.g., in a simple picture-naming task).
Although Chen and Chen (2013) adopted a simple picture-naming task, their participants from Taiwan have different literacy experiences from our participants from Mainland China (e.g., Mandarin speakers in Taiwan do not receive exposure to Pinyin), and so future research is needed for a direct comparison.

Conclusion

The present study showed that literate speakers are flexible in selecting the preparation unit in spoken word production, and orthographic form may serve as a cue to promote a certain phonological unit for encoding. We suggest that a revised spoken word production model take orthographic cues into consideration.