Introduction

This study was motivated by a real-life situation brought forward by a company supervisor who wanted to offer professional development on English proficiency for one employee, Malcolm (pseudonym), and thus contacted the researcher for suggestions. Malcolm was a native speaker of Mandarin and a learner of English as a second language (ESL). Malcolm earned his doctoral degree in Chemistry from an American institution and had lived in the USA for 19 years by the time of the study. According to the supervisor, a native speaker of English, Malcolm’s English writing skills were great, but his English speech was described as almost unintelligible by most of his colleagues and clients in the company. He added that Malcolm’s English skills had negatively impacted his professional opportunities in the company. Considering the situation, the company decided to invest in one-on-one pronunciation-centered tutoring for him with the goal that his English oral production would become more intelligible during daily conversations, meetings, and presentations at work. During the first meeting with the researcher, Malcolm expressed frustration at not being understood well at work and was determined to improve. Malcolm requested that the tutor be a native speaker of English with a linguistic background in Mandarin. Subsequently, the researcher arranged for Malcolm to meet regularly with Vivian (pseudonym), a native speaker of English and a Chinese Studies major. The supervisor, Malcolm, Vivian, and the researcher all looked forward to a noticeable improvement in 4 months and wondered how and to what extent form-focused instruction (FFI) that centered on pronunciation could improve Malcolm’s intelligibility in the work context. Therefore, the researcher conducted the study to answer the following research questions:

  1. What are Malcolm and Vivian’s opinions of Malcolm’s development in speech intelligibility as a result of the 4-month tutoring?

  2. Based on listeners’ ratings, does Malcolm’s spontaneous workplace speech intelligibility improve as a result of form-focused instruction?

This study aimed to examine the pedagogical potential of pronunciation instruction (PI) to improve global intelligibility in an adult ESL learner. Based on the analyses of comments from Malcolm and Vivian and listeners’ impressionistic judgments, the study attempted to discover the reality of pronunciation learning and offer a perspective that helps employers, learners, and instructors make informed decisions in relation to workplace ESL lessons.

Literature Review

Workplace ESL Training

In 2019, there were 28.4 million foreign-born people in the US workforce, comprising 17.4% of the total, up from 7% of the workforce in 1980 (Bureau of Labor Statistics, 2022). The rapid increase in immigration has required states and localities to employ strategies to integrate immigrants into their communities. Improving workplace English proficiency is one important undertaking for immigrant-receiving countries, such as the USA, Canada, and the UK, because official language skills have a significant influence on labor market success (Picot & Sweetman, 2012). The study of Derwing et al. (2021) showed that the Language Instruction for Newcomers to Canada improved ESL learners’ production of speech acts (e.g., request a previously promised raise) as a result of the 25-h pragmatic instruction over 5 weeks, and the learners’ integration of socially appropriate pragmatic language use enhanced their comprehensibility to listeners. In addition to pragmatic instruction, improvement in ESL learners’ comprehensibility and intelligibility in the workplace was observed through a stand-alone pronunciation training program (Derwing et al., 2014). The study reported that the putatively fossilized ESL learners in a window factory benefited from on-the-job instruction in their perception of English and the comprehensibility and intelligibility of their ESL productions. These studies revealed positive learning outcomes, and the increased presence of immigrants in the US workforce indicates the need for workplace ESL training.

Nevertheless, workplace ESL programs are not widely available or invested in currently. One question that is often raised by employers pertaining to the companies’ investment in developing their employees’ ESL intelligibility concerns the age of employees. According to the critical-period hypothesis (Penfield & Roberts, 1959), adult learners have passed the critical-age point for effective second language (L2) improvement, and this results in skepticism about the likely learning outcomes in workplace ESL intelligibility. Employers’ speculations are associated with the idea that there may be a biologically determined critical period for L2 acquisition that makes it challenging, if not impossible, for older learners to attain native-like proficiency. According to Hakuta et al. (2003), typical L2 learning outcomes, regardless of the measure of proficiency in various studies, decline with increases in the age of initial exposure to the target language. However, this perspective needs to be reconsidered in terms of what workplace intelligibility entails in relation to native-like proficiency. Levis (2018) defines intelligibility as an actual outcome of the interlocutor’s understanding. Since many L2 speakers exhibit clearly non-native-like characteristics in their linguistic behavior (e.g., pronunciation and syntax) yet are still understood by their interlocutors in daily life and in the workplace, possessing native-like proficiency is not a prerequisite for L2 learners to be intelligible. Hence, while the critical-period hypothesis (Penfield & Roberts, 1959) and an age-related decline in proficiency have appealed to theorists and researchers, these patterns do not necessarily dictate the learning outcomes of pedagogical intervention on intelligibility. 
Moreover, study results on the role of age in L2 learning outcomes are inconclusive (Hakuta et al., 2003) due to methodological difficulties in distinguishing between the impact of age on L2 development from other factors, such as motivation, duration of the learning experience, linguistic background, learning environment, and determination. Therefore, more studies are needed to provide employers with information regarding how adult ESL learners can improve speech intelligibility and comprehensibility through interventions despite evidence that the age of learning L2 is associated with ultimate proficiency level (e.g., Hakuta et al., 2003).

Fossilization and Language Transfer

The concept of fossilization in interlanguage development was first introduced by Selinker (1972) to describe L2 learners’ retention of deviant rules and forms that differed from those of the target language norms despite favorable opportunities for improvement provided to supplant non-nativelike patterns in L2 usage. This permanent non-nativelike state was termed fossilization, and in this end-state, L2 learners ceased to develop their interlanguage skills despite their motivation or opportunity to learn or to acculturate into the target society (Selinker & Lamendella, 1978). While fossilization is widely accepted as an L2 phenomenon, it is difficult to verify whether an L2 learner’s incorrect use of a target form is permanent. Stabilization, on the other hand, is easier to observe and is often considered the precursor to fossilization. During periods of plateau-like stability, learners might produce an L2 form correctly in one context but not in another, and the persistent appearance of such fluctuations in the learner’s speech over an extended period of time is indicative of stabilization (Selinker & Douglas, 1989). Depending on the study design, operational definitions of fossilization or temporary stabilization can vary. Nevertheless, Selinker (1972) suggested that many if not most of the fossilized or stabilized linguistic items were due to first language (L1) transfer. In this view, L1 transfer operates differently depending on linguistic features, and L1 influence can persist in certain aspects of interlanguage development from the initial exposure to the target language into ultimate attainment (Montrul, 2014). According to Corder (1983), L1 transfer can display a greater effect on learners’ acquisition of the L2 phonology than syntax, and the L1 influence on L2 pronunciation can be prominent and stabilized. 
This can be attributed to the assumption that pronunciation is considered a more difficult aspect of language to acquire and often requires special training (Fraser, 2000). Notwithstanding, Pennington (1998) and Fraser (2000) argued that phonology can and should be taught to adult L2 learners. Research has shown that adult learners whose pronunciation had stabilized or fossilized were able to benefit from PI (e.g., Couper, 2003; Derwing et al., 2014, 2021). On the other hand, there are also studies that were unable to report evidence of success in improving L2 learners’ pronunciation through pedagogical intervention (e.g., Macdonald et al., 1994a, 1994b; Saalfeld, 2011). Nonetheless, fossilization should be viewed as a process that can be explained and intervened in (Long, 2003) and thus should not result in PI being excluded from the curriculum.

Intelligibility: Pedagogical Importance and Challenges

Speech intelligibility is broadly defined as “the extent to which a speaker’s message is actually understood by a listener” (Munro & Derwing, 1995, p. 76), and intelligibility can be significantly influenced by one’s pronunciation (Levis, 2018). Since pronunciation particularly has a dramatic impact on one’s speech intelligibility, Levis noted that pronouncing a new language in a way that is easily intelligible to a wide range of interlocutors should be a goal for L2 learning. Munro and Derwing (2006) also stressed that the pedagogical relevance and importance of intelligibility should be emphasized when English language instructors prepare students to communicate successfully. However, Foote et al. (2011) found that teachers invest as little as 6% of their weekly instructional hours in PI. A number of reasons can explain this apathy. Offering stand-alone pronunciation classes is not practical in most L2 programs due to enrollments and teachers’ lack of expertise in PI (Lord & Fionda, 2014); therefore, PI is often unmethodically included in speaking practice. In addition, pronunciation does not exist on its own but is relative to other aspects of speech (Levis, 2018), and finding ways to sustain learners’ focus on accurate pronunciation while engaged in active oral communication is challenging for both teachers and learners (Darcy, 2018). Moreover, because of English’s status as a lingua franca and an international language, demand for low accent strength has decreased and a non-nativelike accent is an expected, normal characteristic of L2 users (Pennycook, 2017). Furthermore, the decline of audiolingualism has led to the marginalization of research on and teaching of pronunciation, which is often associated with overdependence on decontextualized practice and mechanical drilling. 
Consequently, PI often receives little attention in L2 teaching and learning until L2 learners have acquired a higher command of the target language that enables them to recognize how PI can specifically benefit their language development or until a loss of intelligibility in L2 learners’ oral production is identified (Lord & Fionda, 2014), as is the case in this study. Despite the aforementioned reasons and the instructional reality, studies have shown that pronunciation is a dominating factor in either facilitating or impairing spoken communication (e.g., Munro & Derwing, 2006; Zielinski, 2008). Hence, pronunciation should not be an optional feature of L2 instruction. For ESL learners, a lack of shared pronunciation norms with native speakers of English can impair communication when native speakers are unable to decode non-native speakers’ deviated pronunciation (Levis, 2018). As such, Malcolm’s need to increase intelligibility in his ESL speech was a focus of this instruction.

Research has noted that while PI can yield positive results in learners’ spontaneous speech production (e.g., Gordon & Darcy, 2016), PI is most productive when the treatment directly targets specific pronunciation characteristics and the learning outcomes are subsequently assessed using controlled tasks (e.g., Ruellot, 2011; Saito, 2011). However, in reality, classroom instruction rarely affords a clinical instructional environment in explicitly concentrating on a small number of particular linguistic features, tuning out non-target forms, and then evaluating certain acoustic properties of student speech at a micro level. Moreover, the transferability of laboratory-induced instructional gains to more spontaneous, interactive contexts remains unclear (Darcy, 2018). Therefore, perceptible change in learners’ speech intelligibility at a macro level is often the goal of PI in the L2 classroom (Thomson & Derwing, 2015), and this teaching objective concerns communicative intelligibility. To reflect this aim, more research is needed on both the efficacy of classroom pronunciation instruction beyond a controlled environment and the effectiveness of specific pedagogies to enhance global intelligibility (Darcy, 2018). This study attempts to add new knowledge in this regard.

Form-Focused Instruction

If the assumption that PI can make a difference in L2 learners’ performance in pronunciation is accepted, then the next step is to identify the teaching approaches that address learner needs. The pedagogical framework in this study is FFI (Doughty & Williams, 1998), and it is one pedagogical option to implement PI. FFI aims to draw L2 learners’ attention to the target language apparatus that learners would otherwise not notice in input or use in output during classroom communicative activities (Saito, 2012). The underlying assumption of FFI is that L2 learners are already engaged in meaning-making tasks when their attention is directed to the linguistic features that are necessary for successful, effective communication. FFI can be categorized into two types: integrated and isolated (Spada & Lightbown, 2008). Integrated FFI takes place when interventions are embedded in communication, such as corrective feedback in the form of recast, repetition, and clarification requests. This type of FFI is often reactive and incidental, though some teachers anticipate difficulty and thus proactively plan to target the preselected features through pedagogical techniques (e.g., activity design, feedback, and tone of voice) while maintaining a primary focus on meaning. On the other hand, isolated FFI refers to instruction that is delivered during non-communicative activities, such as during the preparation for a communicative activity or during an activity where students’ difficulties are identified and the teacher and the learner take time out from the conversation and modify the speech to avoid a conversation breakdown. Although isolated FFI occurs through a non-communicative use of language, it is prompted by communicative needs. 
FFI, implemented in communicative contexts, can be effective for one’s language development under the assumption that “learners will be able to transfer what they learn in the classroom to communicative interaction outside the classroom” (Saito, 2012, p. 596).

In summary, according to the meta-analysis of PI conducted by Saito and Plonsky (2019), PI is most effective when it aims at specific pronunciation features (e.g., segmentals, prosody, and fluency) and when instructional gains are measured through controlled tasks (e.g., word and sentence reading). While Saito and Plonsky’s findings suggested that PI can directly improve the development of an explicit, controlled, and specific aspect of L2 learners’ pronunciation proficiency, there is a lack of classroom-based tutoring studies that investigate instructional gains in workplace intelligibility as measured via spontaneous tasks. This study attempts to bridge part of that gap.

Methods

Context

Malcolm was a native speaker of Mandarin and was 45 at the time of the study. He immigrated to the USA in his 20s. Over 19 years of living in the USA, Malcolm started as a chemistry graduate student and then worked as a material engineer in different companies after he received his doctoral degree at an American institution. At the time of the study, Malcolm was a polymer engineer for a global supplier of antioxidants and polymer modifiers. The majority of the company employees and clients were native speakers of English, though there were bilingual and multilingual colleagues and customers. According to Malcolm, upon completion of his doctoral degree, his advisor expressed concerns about Malcolm’s English intelligibility in relation to finding employment in the USA. In light of such concerns, Malcolm focused his job search on positions that required little verbal communication, such as a lab scientist. Malcolm’s wife was also a native speaker of Mandarin. Malcolm described his wife as an effective ESL speaker, who helped him prepare for job interviews and presentations with regard to his English usage. Malcolm added that he was aware that the pace of his speech had decreased his intelligibility because many of those he interacted with suggested that he talk slower to enhance intelligibility in both English and Mandarin. Malcolm shared that a storekeeper once told him, “Dude, slow down. I don’t understand you at all.” Although Malcolm had tried to pace his output, he often forgot about it and sped up as he continued in his conversations or presentation. Malcolm explained that he always felt he had a lot to share with others, and the urge to deliver what he intended to say resulted in an accelerated pace of speech. 
Moreover, although comments from others had constantly brought his unintelligibility to Malcolm’s attention, he did not understand exactly what was wrong with his pronunciation, and Malcolm often could not hear the discrepancies between his pronunciation and the norms. “They sounded similar enough to me,” said Malcolm during his meeting with the researcher. Notwithstanding, based on the researcher’s observation, Malcolm was a tireless, diligent ESL learner, who went above and beyond to complete the assignments, ask questions, and be well-prepared for all the tutoring sessions. In her interaction with Malcolm, the researcher found Malcolm intelligible in Mandarin in general. Malcolm spoke at a rushed pace, and there were a few times when the researcher needed Malcolm to repeat or clarify his intended messages.

Vivian was a rising senior majoring in Chinese Studies at a public university in the USA. She had no English tutoring experience but was an experienced L2 learner herself. In addition, Vivian was articulate and approachable; therefore, Malcolm decided to work with Vivian. Their Zoom sessions were one to three times per week for 60 min per meeting over 4 months for a total of 36 h. All the tutoring sessions were recorded.

The researcher was an associate professor of applied linguistics at a public research university in the USA. She directed the Chinese Studies Program in her institution and supervised Chinese and Japanese graduate teaching assistants during the time of the study. The researcher received her doctoral degree from a public research university in the USA and her educational background was in teacher education, second language acquisition, and teaching English to speakers of other languages (TESOL). She taught courses in teaching practicum, Asian films, and Chinese language at all levels on a regular basis. Her role in relation to Vivian’s teaching in this project was to provide pedagogical support and reflective mentorship.

Research Design

In this single-subject design, Malcolm underwent all treatment conditions and served as his own control. This design allowed the researcher to establish a stable baseline before administering the intervention (Satake et al., 2008) and study Malcolm over a period of time to determine whether the given intervention was effective in improving his speech intelligibility.

Treatment

In this study, the development of Malcolm’s pronunciation proficiency was focused on improving the global, spontaneous intelligibility in his utterances. It was also taken into consideration that formulaic, predictable phrases (e.g., I wonder why) and the use of conversational grammar (e.g., when suggesting, one can say “if I were you, I’d…”) can result in listeners’ improved understanding as their processing time of the utterance may be reduced (Derwing et al., 2021). Hence, Vivian was instructed to create ample communicative activities, incorporate workplace topics and speech acts (e.g., giving directives and offering refusals) through role-plays, pay attention to any unintelligible speech items, and try to correct those items in context and through time-out drills using varying techniques. Individual lessons were not targeted at any preemptive sounds or aspects of pronunciation, although reviews of sounds that Malcolm had previously struggled with were integrated into the lesson.

In preparing Vivian for the tutoring task, the researcher and Vivian had three 90-min orientation meetings to explain key pedagogical concepts (e.g., FFI, types of corrective feedback, elicitation techniques, PI, and student support), conduct micro-teaching, and offer post-teaching critiques. After the tutoring program started, Vivian met with the researcher on Zoom monthly to reflect on her teaching. Vivian also sent emails to the researcher whenever questions arose. The researcher periodically reviewed the recorded Zoom tutoring sessions, after which she and Vivian communicated on Zoom or via email if any questions arose.

The 36-h FFI treatment consisted of the following two primary teaching techniques.

  1. Functional language practice: Malcolm and Vivian were engaged in authentic, spontaneous conversations relative to Malcolm’s life, work, and current social issues in each session (e.g., commenting on the company’s COVID-19 policy and complimenting on lab skills). During this practice, Vivian was either an attentive listener who took notes or an interlocutor in role-plays. She sometimes asked follow-up questions and requested clarifications (integrated FFI). On average, this practice accounted for 40% of the instructional time based on the time logs.

  2. Corrective feedback: After Malcolm finished a section of his intended utterances (e.g., presenting his disagreement with the quarantine mandate), Vivian took time out from the communicative activities and discussed language elements (isolated FFI), including both segmental and suprasegmental, that were identified as problematic (e.g., Malcolm pronounced “situation” as /situash/, “mask” as /maska/, and “they may not” as /theymehnoh/). Corrective feedback was often in the forms of recasts and explicit metalinguistic information about articulatory and auditory aspects of segmental and suprasegmental features (e.g., Vivian first corrected Malcolm’s pronunciation of thinking with a recast by saying “oh, you were still thinking.” and then she explained the pronunciation differences between sinking and thinking). Corrective feedback was prompted by Malcolm’s communicative needs, occurred through both communicative and noncommunicative modes of language practice, and was followed by drills. For instance, while listening to Malcolm’s description of an anecdote, Vivian asked for clarification by saying “Do you mean world, word, or war?” After Malcolm’s clarification, Vivian paused the activities, explained the pronunciation difference between those three words, and drilled the words before they went back to Malcolm’s anecdote. Following corrective feedback and repetitive practices, Malcolm could choose to move on to the next topic or circle back to try the same topic again and incorporate the feedback into his speech.

Overall, the 36-h intervention sought to improve both segmental (e.g., individual sound contrasts between /th/ and /s/) and suprasegmental (e.g., the stress of maintenance) features. The FFI systematically embraced communicative activities (e.g., discussions of the Federal Pandemic Unemployment Compensation program), drills (e.g., repetition at word and sentence levels), explicit instruction (e.g., explaining the placement of the tongue to pronounce /r/ as in round and roar), recasts (e.g., You are right that he probably did it), clarification requests (e.g., what do you mean by /pa-la-i-ˈsi-ti-ke/), and then back to meaning-oriented tasks where Malcolm could integrate the drilled phrases in communication. Studies have shown respective positive outcomes of these PI techniques in varied contexts (e.g., Park, 2000; Saito, 2012; Thomson & Derwing, 2015; Gordon & Darcy, 2016).

Measurements

This study investigated perceptible changes in Malcolm’s oral intelligibility at a macro level as a result of PI. Hence, listeners’ quick, intuitive impressionistic judgments about the sample speech that was typical of spoken tasks in a workplace were used as the scoring method. Before the study was conducted, the Institutional Review Board (IRB) reviewed the researcher’s credentials, examined the study proposal, and ensured the welfare, rights, and privacy of human subjects in the study. With the IRB approval at the researcher’s institution and consent from the participants, four measurements respectively occurred (1) before the treatment started, (2) after the first 12-h treatment, (3) after the 24-h treatment, and (4) after the 36-h treatment. Each measurement consisted of six speech samples, amounting to a total of 24, and the average length was 70 s per sample. Conversation topics were prompted by Vivian based on what Malcolm could potentially encounter in his workplace. The format of the measurements closely resembled their regular tutoring sessions. For example, in a role-play, Vivian asked Malcolm, “Why did you not get the promotion?” The four measurement sessions were recorded on Zoom. The 24 audio files were extracted, randomized, and saved on the researcher’s Dropbox for raters to access.
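The randomization step described above can be illustrated with a minimal sketch. It assumes four sessions of six samples each, as reported; the clip labels, the seed, and the function name `blind_samples` are hypothetical and are not taken from the study's materials.

```python
import random


def blind_samples(n_sessions=4, n_per_session=6, seed=2024):
    """Return a key that maps neutral clip labels to the original samples.

    Raters would see only the neutral labels (hypothetical naming scheme),
    so the listening order reveals nothing about the measurement session.
    """
    originals = [
        f"session{s}_sample{i}"
        for s in range(1, n_sessions + 1)
        for i in range(1, n_per_session + 1)
    ]
    shuffled = originals[:]
    # A seeded generator keeps the shuffle reproducible for later decoding.
    random.Random(seed).shuffle(shuffled)
    return {f"clip_{pos:02d}": name for pos, name in enumerate(shuffled, start=1)}


key = blind_samples()
```

Keeping the key separate from the files shared with raters would let the ratings be mapped back to the four measurement points at the analysis stage.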

Data Collection

The initial pool of rater candidates included the researcher’s colleagues, students, and contacts. Then, a virtual snowball sampling procedure was administered to recruit raters with diverse backgrounds. Thirty raters were recruited based on their availability, willingness, and demographics. The ages of raters ranged from 19 to 61; four gender identifications were represented (male, female, transgender, and non-binary); both native speakers of English and native speakers of Mandarin were included; three primary categories of the raters’ employment situations were college students, working professionals, and unemployed. The raters were not compensated. Communication with the raters was through emails.

The rater candidates were instructed to read the research purposes, the operational definition of intelligibility, and the accompanying examples of how to rate speech using a six-point Likert scale that resembled the actual rating task. The rater candidates were encouraged to ask questions and had 3 days to reply to the invitation email with their acceptance or decline. After the 30 raters were identified, emails with instructions, a rating sheet, and a Dropbox link to the 24 speech samples went out on the same day. Those who consented to be raters were asked to allocate 60 min for the task, listen to each of the 24 speech samples only once, use the rating sheets provided to them to finish all 24 ratings in one sitting, and send the results back to the researcher within 1 week. The rating adopted a six-point Likert scale, and for each of the 24 speech samples raters chose one of the six response options from “completely disagree” to “completely agree” to evaluate the following statement: I understood what the speaker in the recording intended to say. The definition of intelligibility was available on the first page of the rating sheet and next to each rating task as a quick reference. The definition provided was “Intelligibility refers to the extent to which the speaker’s intended utterance is actually understood by a listener.” After the ratings were collected, the researcher replaced the raters’ names with a code for anonymity, and the research assistant subsequently entered the data in an Excel spreadsheet and prepared the data for analysis using SPSS. The inter-rater reliability index, Cronbach’s Alpha, was α = 0.7, and it showed adequate agreement (Taber, 2018) across the 30 raters.
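For readers unfamiliar with the reliability index, the following sketch shows how Cronbach's alpha could be computed when raters are treated as the "items" and speech samples as the cases. The study itself used SPSS; the data layout, the function name `cronbach_alpha`, and the toy ratings below are assumptions made purely for illustration.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table with one row per speech sample
    and one column per rater (assumed layout, not the study's file)."""
    n_samples = len(ratings)
    k = len(ratings[0])  # number of raters
    if n_samples < 2 or k < 2:
        raise ValueError("need at least two samples and two raters")
    # Sample variance of each rater's scores across the speech samples.
    rater_variances = [
        variance([row[j] for row in ratings]) for j in range(k)
    ]
    # Sample variance of the summed score each speech sample received.
    total_variance = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(rater_variances) / total_variance)


# Toy data: 4 speech samples rated by 3 raters on a six-point scale.
toy_ratings = [
    [2, 3, 2],
    [4, 4, 5],
    [5, 5, 5],
    [3, 4, 3],
]
alpha = cronbach_alpha(toy_ratings)
```

Alpha approaches 1 when raters rank the samples consistently and falls as their ratings diverge, which is why a value of 0.7 is commonly read as adequate rather than strong agreement.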

Comments from Vivian and Malcolm about Malcolm’s development in his speech intelligibility were collected during the two meetings with Malcolm at the mid and end points of the tutoring program, four monthly meetings with Vivian, and email exchanges with Vivian over the course of the study.

Data Analysis

Qualitative data came from online meetings, email, and phone conversations. These data sets were recorded through Zoom, stored in email, or documented in field notes. The comments of Malcolm, Vivian, and the raters were organized by topic (e.g., comments on learning experience and perceptions of outcomes) to answer the research questions. The data were coded using the first and second coding cycles (Saldana, 2021) for repeated themes. Structural coding was applied during the initial round of coding to identify major topics and explanations relevant to the research questions and theoretical constructs (e.g., fossilization and L1 transfer). The second round analyzed interactions among thematic sub-datasets and synthesized them. The data were member-checked for accuracy and resonance with their experiences to enhance the trustworthiness of the results (Trochim & Donnelly, 2006). The validity of the findings was examined and increased through triangulation using different sources of information. The goal was to ensure that the findings reported were true and certain. Being true means that the findings accurately reflected the research process and outcomes, whereas being certain means that they were also supported by the evidence found in the data set (Guion et al., 2011).

Quantitatively, this study included one participant (n = 1). A one-way repeated-measures ANOVA was conducted with measurement occasion (four levels) as the independent variable (factor) and the intelligibility rating as the dependent variable to determine whether the means of the four measurements were statistically different in response to the interventions. For the dependent variable, there were 180 data entries (n = 180; 30 raters × 6 speech samples) in each of the four measurements.
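The logic of this test can be sketched as follows. The study's analysis was run in SPSS; this illustration assumes that each row holds one repeated unit (for example, one rater's score) observed at every measurement occasion, which may differ from how the 180 entries per occasion were actually arranged. The function name and toy scores are hypothetical.

```python
def repeated_measures_anova(scores):
    """One-way repeated-measures ANOVA on a units-by-occasions table.

    Returns (F, df_occasion, df_error). A p-value would additionally
    require the F distribution from a statistics package.
    """
    n = len(scores)      # repeated units (rows)
    k = len(scores[0])   # measurement occasions (columns)
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    occasion_means = [sum(row[j] for row in scores) / n for j in range(k)]
    unit_means = [sum(row) / k for row in scores]

    # Partition the total variability into occasion, unit, and error parts.
    ss_occasion = n * sum((m - grand_mean) ** 2 for m in occasion_means)
    ss_unit = k * sum((m - grand_mean) ** 2 for m in unit_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_occasion - ss_unit

    df_occasion = k - 1
    df_error = (n - 1) * (k - 1)
    if ss_error <= 0:
        raise ValueError("no residual variability; F is undefined")
    f_stat = (ss_occasion / df_occasion) / (ss_error / df_error)
    return f_stat, df_occasion, df_error


# Toy data: 4 repeated units, each scored at 4 measurement occasions.
toy_scores = [
    [2, 3, 4, 4],
    [3, 3, 4, 5],
    [2, 4, 4, 5],
    [3, 4, 5, 6],
]
f_stat, df1, df2 = repeated_measures_anova(toy_scores)
```

Removing the between-unit variability from the error term is what distinguishes this design from a between-groups ANOVA; it reflects the fact that the same listeners contribute scores at every occasion.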

Results

Comments

Research question one investigated Malcolm and Vivian’s opinions of Malcolm’s development in speech intelligibility as a result of the 4-month tutoring.

Perceived Outcomes and Attributions

During the Zoom communication with Malcolm both at the midpoint and endpoint of the four-month instruction, Malcolm stated that his supervisor and colleagues shared with him that they became better able to understand the information Malcolm intended to convey in small talk and meetings. “My boss told me I sound better,” said Malcolm. Emails between the supervisor and Vivian also confirmed the perceived improvement. The supervisor documented Malcolm’s improvement, and the company decided to continue to invest in Malcolm’s learning with Vivian past the initial 36 h. During both meetings between the researcher and Malcolm, he expressed his satisfaction with the learning experience, outcomes, and increased confidence in speaking with colleagues and clients, as shown in the following excerpt. “I feel very happy. When I pick up the phone, I feel much more confident talking with our clients. I enjoy working with Vivian. She is very patient and friendly. I want to continue with Vivian, and the company said OK.” On the other hand, Malcolm continued to be aware that a good portion of his speech remained unintelligible and his ability to learn ESL might have been affected by his age, first language, and the fact that the pronunciation issues were not addressed until decades after he first started learning English. Malcolm indicated the following concerns.

My English teachers did not correct my pronunciation when I was in middle school. Now, I am just way too old. I regret and fear. A professor in graduate school mentioned that my English problems might have been fossilized. This is making me scared. I feel angry that I was not provided the necessary instructions on how to speak English when I was younger, but I will keep trying.

Learning Strategies During the Study

In explaining his efforts, Malcolm shared that in addition to the tutoring sessions, he regularly used a speech recognition app designed to give feedback on learners’ English pronunciation to practice the speaking exercises that Vivian assigned, such as linking consonants to vowels (e.g., some of) and hearing differences (e.g., /i/ and /ɪ/ as in eat and it). Malcolm also watched YouTube teaching videos related to ESL pronunciation and used varying resources in the hope of analyzing the nature of his pronunciation problems and improving his speech intelligibility. Moreover, according to Malcolm, he was motivated to improve and mindful of the mistakes that Vivian provided feedback on. Hence, Malcolm practiced pronunciation repeatedly inside and outside the tutoring sessions with the assistance of his notes and recorded samples from Vivian for input training, in which he was exposed to multiple repeated instances of sounds to enhance his phonetic perception and production. Malcolm reported as follows:

I keep listening to and repeating with the recordings every night. I am focused during the tutoring sessions. I am sure I have improved. My colleagues also told me that I am improving, but they sometimes still need me to repeat myself before they can sort of understand me.

In commenting on his favorite class exercises, Malcolm shared that he enjoyed engaging in conversations about his work life and discussions of current social issues with Vivian because “I watch American news every day, and I want to discuss policies and social problems to communicate my opinions, but I often worry that I am not understandable.” He also found role-plays helpful because he could practice pronunciation (e.g., run) with formulaic work-related expressions, such as using might when giving tentative advice as in “you might want to run it by your supervisor first.” These opportunities allowed Malcolm to organize his thoughts “in an American way” and use the target pronunciation items that he had recently learned from Vivian. Malcolm noted that the communicative components of the tutoring sessions were meaningful and effective in helping him express himself confidently with improved pronunciation, as revealed in the following excerpt. “During the role plays, I pretend that Vivian is my colleague. After I rehearsed the conversations with her, I often feel more confident the next day when I talk about similar content using better pronunciation with my real colleagues.” Furthermore, Malcolm stressed that “I am a visual learner” and the best teaching practice for him was when Vivian presented how to pronounce words (e.g., where to put the tongue) through illustrations (e.g., images), demonstrations (e.g., Vivian exaggerated the mouth movement), and corrective feedback with explanations (e.g., Vivian told him precisely what he did wrong). Malcolm emphasized that “I feel more confident to pronounce the target sounds only after I knew what exactly I should do or not to do.”

Perceived Fast-Paced Speech

Malcolm’s fast-paced speech was a recurring topic in the comments from Vivian, the supervisor, Malcolm, and some raters. His speech in both Mandarin and English sounded hurried to the researcher and Vivian. Regarding the possible reasons why Malcolm sounded unintelligible to her, Vivian stated the following:

Malcolm speaks English so fast that I don’t think I could hear individual words. It feels like he slurs most of the time, and all the sounds just run into one another. However, when he remembers to slow down and enunciates every word even when he does not pronounce all the words accurately, I usually can understand him the first time. Although I remind him constantly that he needs to give time to individual words, he often forgets. I think the problem here is the combination of the fast pace and enunciation. Malcolm really needs to slow down and get every word in, and then he can speed up when he is ready.

The recorded sessions support Vivian’s comment and indicate that Malcolm usually did not finish individual words or syllables before he moved on to the next. For example, instead of saying, “I didn’t agree with that,” Malcolm rapidly said /aɪ dɪdn̩ əɡɹ wɪ ðæ/. Malcolm shared that he made some improvement in his speaking pace thanks to Vivian’s hand gestures and facial expressions to remind him to speak slowly and finish pronouncing the entirety of the word. Malcolm described that “When Vivian widened her eyes as I was about to say ‘didn’t,’ I knew I needed to take my time and finish the entire word by including the final phoneme /t/ in / dɪdn̩t/.” When asked why he often dropped the final phoneme (e.g., /p/ in /hɛlp/ and /k/ in /straɪk/), Malcolm offered the following answer:

In Mandarin, there is just one sound for one word. I just didn’t think it matters in English since the listener already heard the front part of the word in context so I think people could just figure out the rest. Also, speaking English fluently just like how fast I can speak Mandarin makes me feel good about my ability. Moreover, I have a lot to express so dropping the ending sound saves me time and allows me to glide through. I have always talked fast and can’t enunciate well in both Mandarin and English, and old habits die hard.

When responding to what worked well for his learning, Malcolm commented “It is helpful when Vivian explains the differences between English and Mandarin. Also, I need to witness the pronunciations that Vivian has taught me actually happen in real conversations.” Malcolm had difficulty pronouncing the /θ/ sound as in things and health. Although Vivian explained how to place the tongue between the teeth to create the right sound, Malcolm was resistant to the idea of sticking his tongue outside his mouth. “It is just rude to show people your tongue,” said Malcolm. However, after he started noticing how his supervisor stuck out his tongue when he pronounced the /θ/ sound, he began following Vivian’s instruction to pronounce the sound correctly. “I thought to myself if the supervisor is not embarrassed in showing his tongue in the company, I will not be shy about it anymore,” Malcolm reflected. Moreover, /θ/ was an ending phoneme that Malcolm was able to add back in his utterance earlier than other sounds (e.g., /k/ in /plæstɪk/). When explaining why /θ/ seems easier for him to remember as a final phoneme, Malcolm noted that:

This sound is unique to me because we don’t have it in Mandarin. So it draws my attention, and I deliberately look for it and try to pronounce it hard whenever I have a chance regardless of its position in the word. Let me show you: THankfully, paTHology, and fifTH.

On the other hand, Malcolm added that he had a hard time when Vivian tried to engage him in listen-and-repeat practice because he believed that his age prevented him from having the necessary perception of sounds, which subsequently impaired his ability to repeat after Vivian, as indicated in the following excerpt. “Sadly, I am like an old dog who can’t learn new tricks. I also can’t hear the differences between Vivian and myself. I won’t give up, but I know I am limited.”

The observations of the recorded tutoring sessions showed that even when Malcolm pronounced individual words intelligibly during drills or clarification requests, he often struggled to maintain the same level of intelligibility when those words were used again in communication. Responding to these observations, Malcolm commented that “I couldn’t remember which pronunciation that I needed to pay attention to when all I could think about was the content.” When asked if there was any approach that could help with the situation, Malcolm said, “Vivian told me to drill myself repeatedly to develop muscle memory, and I think it is working for me because for some sounds that I used to struggle with, I can now pronounce them naturally without being intentional.”

Reflections on the Tutoring Experience

FFI on pronunciation was new to Vivian, as it probably would have been for many instructors due to a general lack of PI training and classroom experience in L2 teacher development (Foote et al., 2011; Lord & Fionda, 2014). In offering an overview of her teaching in the monthly check-in meetings, Vivian stated that Malcolm made noticeable improvement based on her more frequent ability to understand Malcolm’s speech on his first attempt, although there were times she puzzled over Malcolm’s intended meaning and wondered how Malcolm’s linguistic and learning background contributed to the outcomes. Among the instructional challenges, Vivian highlighted the extensive time and practice required for Malcolm’s improvements to take root, as revealed in the following comment. “It is just taking so long for anything to stick. Sometimes we need to review the materials as if we had never learned them before. Malcolm works really hard, so perhaps pronunciation work by nature is very difficult for both ESL learners and instructors.” For example, Vivian had a hard time helping Malcolm with the /ɚ/ sound, as in dinner, and the /əʊ/ sound, as in road. When she struggled pedagogically, Vivian read empirical research articles and pedagogical blogs written by ESL teachers on related topics. She also watched relevant videos on YouTube and discussed her thoughts with other course professors to explore varying teaching techniques and enhance her teaching capacity. Vivian reflected on her teaching constantly and applied different approaches in her use of FFI, such as sounds and spelling activities, use of phonemic symbols, and awareness raising using body movements. Generally, Vivian thought the use of the International Phonetic Alphabet (IPA) was helpful. Nevertheless, Vivian shared that there seemed to be inconsistency in her teaching outcomes, as she noticed that different activities helped with different problems (e.g., stress and longer words) on different days with varying success.
Vivian stressed that her pedagogies needed to be responsive to Malcolm’s learning needs of the moment, including his frustration when the practice did not produce the desired outcome. Among strategies to manage Malcolm’s negative emotions, Vivian suggested that while one-on-one tutoring offered tailored instruction that Malcolm would not have had otherwise, adding a group-learning component could be beneficial so that “when the practice, either communicative or mechanical, is going nowhere after a while, we could temporarily shift our attention to other learners and ease the frustration.”

In explaining the instructional challenges, Vivian thought Malcolm’s inability to distinguish the sounds that he produced played a role. According to Vivian, although Malcolm could hear the difference between, for example, wrong and run when Vivian pronounced them in contrast, he was unable to hear himself and thus did not know he made the mistake when he said, “It is wrong to wrong in the hallway.” Vivian added that to assist Malcolm in identifying the mispronunciation in his output, she would mimic how he had said it in comparison to the correct pronunciation. Only then did Malcolm realize the difference between his production and the target form. “I don’t hear the distinctive difference between right and light when I said them unless Vivian explicitly tells me what to look for,” said Malcolm. Both Malcolm and Vivian thought the teaching method that combined imitation and explanation was generally helpful in improving Malcolm’s perceptive skills. Moreover, Vivian and Malcolm both mentioned the idea of Malcolm’s pronunciation patterns being fossilized when explaining why altering the way Malcolm spoke was effortful. Malcolm asked the researcher, “So the problems in my English will most likely never go away, right? Are they permanent?” On the same topic, Vivian tried to verify her speculation: “I read about fossilization in language learning. I wonder if that is what Malcolm and I are experiencing.” Furthermore, Vivian wondered if her lack of experience in teaching ESL pronunciation contributed to the outcome where her efforts did not consistently achieve the desired goals, as sometimes the same approach worked in one session but not in others. Her lack of PI experience affected her confidence when teaching pronunciation, although she had taken a few linguistics and English-teaching classes at her university and continued to enrich her knowledge in this regard through self-study.
Vivian reflected that “I am aware that I am inexperienced, but I’ve researched ways to help Malcolm and included a variety of opportunities for him to improve. However, I can’t help but wonder if Malcolm would show greater improvement if he had a pronunciation professor or researcher as his tutor.”

Quantitative Results

Research question two investigated whether Malcolm improved his workplace spontaneous speech intelligibility in response to FFI based on listeners’ ratings. The assumption of sphericity was met according to Mauchly’s test (p > 0.05). Table 1 shows the rating means of intelligibility in four measurements.

Table 1 Descriptive statistics: intelligibility

The ANOVA showed no significant main effect of time, F(3, 180) = 2.14, p > 0.05, η2 = 0.012. This means there was no statistically significant difference among the four measurements of intelligibility as a result of the 36-h instruction.
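For readers less familiar with the procedure, the following is a minimal sketch of how the F statistic and an effect size for a one-way repeated-measures ANOVA can be computed. It is offered only as an illustration: the function name rm_anova, the tiny rating matrix, and the choice to report partial eta squared are assumptions made here and do not reproduce the study's data, software, or exact effect-size formula. The sphericity check (Mauchly's test) is not included.

```python
from statistics import mean


def rm_anova(scores):
    """One-way repeated-measures ANOVA (illustrative sketch).

    scores: list of rows, one row per rated unit; each row holds one
    rating per measurement occasion (hypothetical layout).
    """
    n = len(scores)        # number of rated units (rows)
    k = len(scores[0])     # number of measurement occasions (columns)

    grand = mean(x for row in scores for x in row)
    time_means = [mean(row[j] for row in scores) for j in range(k)]
    row_means = [mean(row) for row in scores]

    # Partition the total variability into time, row, and residual parts.
    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_time - ss_rows

    df_time = k - 1
    df_error = (k - 1) * (n - 1)

    f_value = (ss_time / df_time) / (ss_error / df_error)
    partial_eta_sq = ss_time / (ss_time + ss_error)

    return {
        "F": f_value,
        "df_time": df_time,
        "df_error": df_error,
        "partial_eta_sq": partial_eta_sq,
    }


# Hypothetical ratings: three rated units, each scored on three occasions.
demo = [[1, 2, 4], [2, 3, 4], [3, 4, 4]]
print(rm_anova(demo))
```

A small F value relative to the critical value for the given degrees of freedom, as reported above, indicates that the differences among the measurement means are not larger than would be expected from residual variability alone.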

The raters were invited to offer optional open commentary in relation to what caused unintelligibility. Based on the comments received, both segmental (e.g., inarticulation of p and c initial words, such as polymer and concentrate) and suprasegmental (e.g., stress and coarticulation at word junctures) characteristics in Malcolm’s speech affected how much the raters were able to understand him. Malcolm’s fast-paced speech, frequent fillers (e.g., yeah and okay), repetitions of words, and omission of final sounds of words (e.g., engineer was pronounced as enginee and sounded like an Jeannie) also challenged the raters in fully understanding the intended messages. In addition, grammatical errors affected Malcolm’s speech clarity and subsequently the extent to which the raters could understand the intended messages. Moreover, most raters who offered feedback noted their level of understanding was impacted by Malcolm’s accent. Nevertheless, two raters who identified themselves as ESL instructors commented that they understood Malcolm well, but they would be unable to transcribe the speech verbatim because they primarily relied on keywords and context.

Discussion and Pedagogical Implications

This study investigated the instructional potential of PI on an adult ESL user’s workplace spontaneous speech intelligibility through the analyses of comments from the learner and the instructor as well as subjective listeners’ impressionistic ratings. In the view of naturalistic L2 acquisition of pronunciation, the age of learning an L2 is associated with ultimate attainment (Derwing et al., 2014), and studies show that the first year of massive exposure to the L2 contributes the most to phonetic development (Flege, 1988; Munro & Derwing, 2008). Following this line, Malcolm’s age and L2 learning experience might have played a significant role in his overall speech intelligibility development. Nonetheless, explicit interventions have shown effectiveness with adult learners with fossilized speech patterns (e.g., Derwing et al., 2014), and PI can be most effective in a controlled, clinical instructional environment with a limited number of target forms (e.g., Ruellot, 2011; Saito, 2011). This study, on the other hand, investigated whether learners’ intelligibility in their spontaneous output can be enhanced through FFI so that learners are communicatively intelligible in their workplace interactions. Improvement in global intelligibility in response to PI has been found in studies (e.g., Ruellot, 2011; Gordon & Darcy, 2016; Saito, 2012), and a lack of improvement at a spontaneous level has also been evidenced in other studies (e.g., Elliott, 1997; Macdonald et al., 1994a, 1994b; Saito, 2011). Depending on the PI structure and other factors, studies have yielded varying results. The results in this study showed no statistically significant changes in Malcolm's intelligibility among the four measurements, although the instructional plan was created based on theory, empirical research, and Malcolm’s perceptions of his learning needs. The findings seem to suggest a gap between the theory and the reality of classroom pedagogy.
Moreover, the statistical results did not align with the comments from Malcolm, his supervisor, colleagues, or Vivian as they all perceived Malcolm’s improvement. This gap may have resulted from measurements of intelligibility and challenges in teaching pronunciation. The time investment and teachers’ experience with PI are two obstacles to classroom PI (Darcy, 2018), and these two challenges were observed in this study and thus will be discussed along with listener characteristics in relation to the measurements, speaking rate, and L1 transfer.

More Time Needed

Although improvement was not captured statistically in this study, based on the perceptions of Malcolm and those who interacted with him on a regular basis, it is possible that Malcolm in fact made progress in some aspects of his speech that improved intelligibility (e.g., he became able to pronounce /r/ intelligibly most of the time in contrast with /l/). However, sporadic developments may have needed time to manifest an outcome in this study’s measurements that examined global intelligibility. If improvement in particular features does not necessarily enhance learners’ global L2 pronunciation proficiency (Saito et al., 2017), as appears to be true in Malcolm’s case, how does this inform classroom pronunciation pedagogy? Further research is needed to clarify these questions.

The possibility that more time was necessary for the FFI to impact Malcolm’s unconfined, spontaneous speech intelligibility is supported by the view that the effect of PI on “real-life” speech is typically slow and gradual and difficult to measure (Trofimovich & Isaacs, 2017). If spontaneous intelligibility takes more than 36 h, which is close to the amount of time of a semester class, to show statistically significant improvement, how does this inform ESL curricula, instructors’ lesson planning, and ESL learners' expectations? Only a small number of workplace ESL programs are available in the USA (Derwing et al., 2014), and this is partially because employers misjudge the amount of time necessary to learn an L2, misunderstand the learning outcomes, and become unable to see the value of instruction (Burt, 2004). According to Derwing et al. (2014), beyond the challenge concerning the scarcity of workplace ESL curricula, ESL learners’ pronunciation difficulties are not typically dealt with as part of the standard classroom practice but are addressed individually or incidentally if addressed at all. As Vivian reflected, learners would benefit more if a peer-learning component were integrated. This would reduce learners’ frustration with falling short of target outcomes. Such frustration can be eased when learners are not always in the spotlight, as they are during one-on-one sessions. The results of this study shed light on PI in reality, and the findings offer data for employers to consider their expectations and for workplace ESL programs to set attainable goals.

In addressing why contextualized or decontextualized drills do not always work as a prerequisite before L2 learners can express themselves communicatively from a theoretical perspective, Wong and VanPatten (2003) stated, “learners bring internal mechanisms to the task of acquisition that cannot be manipulated by explicit instruction” (p.407) and “we can teach whatever we want to, but only the brain is responsible for learning, and it has its own devices” (p.408). Nevertheless, studies have shown measurable benefits for autonomous L2 production in response to mindful repetitive practices (e.g., Everly, 2019; Trofimovich & Gatbonton, 2006). Nunan (1999) also noted that “drills are an essential ingredient in the learning process, and provide the enabling skills for later communicative performance.” (p.76). Moreover, the findings of Hassanzadeh and Salehizadeh’s study (2020) showed that PI with attention to form benefited L2 learners’ development in phonological competence. Among FFI options, according to Hassanzadeh and Salehizadeh, the output-oriented treatment met with more overall success than did the input-enhancement or explicit-corrective-feedback groups. Their findings supported Swain’s output hypothesis that while comprehensive input is necessary for fostering L2 learners’ communicative skills, as far as phonology is concerned, learners need to produce output actively to “consolidate what they commit to memory” (p. 10), which was part of the treatment in the study. These views combined with the consideration of the level of Vivian’s expertise in PI can contextualize the findings that the FFI elements in the study can potentially improve Malcolm’s intelligibility and can be more effective as the instructor’s teaching experience grows, although Malcolm’s internal device may determine when the acquisition will happen. That is, while an instructional effect was not found within the 4-month period, improvement can occur later as tutoring continues. 
This view leads to a pedagogical implication that encourages ESL programs intended to prepare learners for the workplace to establish baseline achievement data for spontaneous intelligibility. The workplace ESL programs should periodically measure learners’ progress towards the goal against benchmarks that account for challenges particular to PI and offer realistic expectations for pronunciation improvement in adult learners. Consistent and meaningful assessments of learners’ pronunciation can also foster a learning culture where teachers and learners are more willing to spend time on it and prioritize PI.

Instructional Time Allocation

The time factor also concerns the time allocation in each instructional session. It is commonly accepted that higher levels of accurate pronunciation patterns contribute to speech intelligibility (Munro & Derwing, 1995). Hence, roughly 40% of the instructional hours were used for role-plays and conversations, and 60% was devoted to developing more intelligible pronunciation patterns, which typically feature individual phonetic segments and appropriate placement of prominence, word stress, syllable timing, and pauses (Gordon & Darcy, 2016). Vivian’s selection of pronunciation items for additional practice was incidental and based on whether she had perceived them as problematic during the preceding communicative task. This approach was organic as it reflected Vivian’s natural response as a listener who asked for clarification. However, considering the wide spectrum of intelligibility attributes in pronunciation listed above, the great variety of workplace conversation topics ranging from Malcolm’s small talk with colleagues to his experiments in the lab, and the limited instructional time available, perhaps a portion of the instructional hours in future instruction should be purposefully spent on phonemic errors that carry a higher functional load and are more likely to affect listeners’ comprehension (Brown, 1991). Specifically, Vivian and Malcolm spent a considerable amount of time correcting the th sound, as in thanks. Malcolm’s final success in mastering the th sound was encouraging. However, when Malcolm substituted danks or sanks for thanks, his speech intelligibility was unlikely to be affected. In contrast, the pronunciation of rice as lice was likely to be more problematic for listeners.
If Vivian had been prepared to reflect the hierarchy of phonemic errors in her teaching, she might have been able to allocate the instructional time differently and center the practice on pronunciations with higher functional loads in order to gain intelligibility more effectively within the time frame. Moreover, Malcolm’s struggles with his perception/listening skills should receive more instructional attention in addition to his production practice going forward.

Instructional Intervention

The observation of the 36 tutoring sessions showed that Malcolm struggled to apply the individual pronunciation items he had previously practiced. That is, Malcolm was unable to maintain his pronunciation performance during the follow-up conversational activities as well as he did at the word or phrase levels in drills. Malcolm had similar challenges in the four measurements, in which the prompts engaged him in communicative tasks and required him to focus on meaning and produce output beyond individual words. These observations suggest that maintaining the same level of speech intelligibility from decontextualized practice to communication was challenging for Malcolm despite constant practice. Such transferability of intelligibility was noted by Saito (2012) and Darcy (2018) and led to questions concerning whether steps between drills at the word and phrase levels (e.g., plastics and then the plastics machinery) and Malcolm’s return to communicative tasks (e.g., describe again what happened to the plastics machinery) were missing, as only a portion of the corrected pronunciation successfully migrated into subsequent meaning-oriented output. If more practice is needed to achieve a higher rate of transferability, briefer drill-communication hybrid activities can be inserted to transition learners into full-fledged authentic applications.

This study was not set up with the assumption that Malcolm’s speech patterns were fossilized or stabilized. To the contrary, why Malcolm’s speech was unintelligible to his colleagues or why improving his pronunciation had been arduous for Malcolm was unclear when the study began. Hence, the pedagogical intervention was not initially designed or implemented with the consideration that Malcolm’s L2 pronunciation might have fossilized. Malcolm’s stabilized speech patterns were observed during the latter part of the study. Towards the end of 36 instructional hours, after Malcolm and Vivian were met with constant challenges, they both started to wonder if Malcolm’s speech patterns had stabilized, as shown in the excerpts in 4.11 and 4.14. Taking speech stabilization and the association between fossilization and L1 transfer into consideration, Vivian began to offer Malcolm additional explicit explanations incidentally about the segmental and suprasegmental differences between Mandarin and English in an attempt to destabilize Malcolm’s pronunciation patterns, as suggested in research (e.g., Derwing et al., 2014; Han, 2013; Xiaorong & Jian, 2011). Although Malcolm found such linguistic knowledge helpful, this teaching approach did not increase Malcolm’s intelligibility statistically during the study. If fossilization was associated with Malcolm’s difficulty in improving his spontaneous intelligibility, then the lack of statistically significant improvement in this study could be ascribed to the intervention, which did not specifically target fossilized pronunciation patterns. However, what are effective classroom pedagogical techniques that increase global intelligibility of L2 learners whose L2s are considered fossilized? More classroom research is needed to support practitioners in the field.

Teacher Preparation

Vivian was not a professional instructor for ESL pronunciation; therefore, her limited confidence and unfamiliarity with functional loads and other pronunciation-specific theories and pedagogies were expected. This situation presents a pedagogical implication that professional development specifically in PI is necessary so instructors can deliver lessons more confidently with less uncertainty about how exactly to teach pronunciation in an engaging, communicative way. For example, beyond a theoretical introduction of functional loads, how do teacher education curricula practically prepare instructors to prioritize pronunciation issues, choose the corresponding teaching methods, reframe “boring” drills so their pedagogical values can be perceived, and meet diverse learners’ pronunciation needs? Darcy (2018) stated that ESL teachers’ lack of specific theoretical or practical training in PI compounded by insufficient teaching resources results in low confidence and efficacy in teachers’ delivery of PI in reality. As such, both training and adequate teaching materials should be made available to instructors of pronunciation.

Vivian’s inexperience in PI might have contributed to the study’s findings, although it is worth noting that Malcolm received 36 h of instruction regardless and that Malcolm acknowledged a number of good instructional attributes that Vivian demonstrated, such as teachers’ knowledge, professional attitude, classroom performance, rapport establishment, student motivation, and personality charm (Gao & Liu, 2012). Nevertheless, future research is necessary to illuminate how instructors’ PI experiences affect their pedagogical decisions and students’ learning productivity and what preparation ESL instructors need in order to implement PI effectively.

Listener Characteristics

Although accentedness and intelligibility are two different constructs, research has shown that perceived accentedness can interfere with perceived intelligibility and that listeners tend to downgrade non-native speakers simply because of a foreign accent (e.g., Munro & Derwing, 1995). In this study, the raters’ comments showed that they perceived Malcolm’s foreign accent. Hence, it is possible that accent perception influenced the raters’ judgments on intelligibility and diminished gains in their ratings. Nevertheless, two raters who had experience interacting with ESL speakers and assessing their language skills reported that despite the accent, they understood Malcolm well based on the context and keywords, though they did not think they could accurately transcribe Malcolm’s 24 speeches in the measurements. This observation is consistent with the finding in the study of Suenobu et al. (1992), which suggests listeners’ experience plays a role in their rating of L2 pronunciation. Furthermore, while the statistical results show no change in Malcolm’s intelligibility, Vivian and those who interacted with Malcolm at work perceived the gain in this regard. The discrepancy might have stemmed from their bias because they knew Malcolm was receiving instruction. The familiarity with Malcolm’s speech that those who frequently interacted with him had developed, together with their being more accommodating than strangers, could also explain why Vivian and his colleagues found Malcolm more intelligible as time went on. On the other hand, raters’ unfamiliarity with Malcolm’s speech features might have affected their rating in a less favorable manner. The contradictory results between the perceived improvement from those who regularly interacted with Malcolm and the lack of statistical change in intelligibility reflected in the ratings might be ascribed to the level of familiarity these individuals had with Malcolm’s speech.
This possibility points to the role that a listener’s familiarity with an L2 speaker plays in the listener’s comprehension. As a pedagogical implication, instructors could familiarize ESL learners with speech from different individuals with varying speech patterns in various contexts in an effort to enhance learners’ listening comprehension.

Speaking Rate

One recurring comment shared by most of the listeners in the study is that Malcolm spoke too fast. When those interlocutors had trouble understanding Malcolm, their first reaction was to ask him to slow down. Such a spontaneous reaction suggests that the listeners intuitively ascribed their inability to understand Malcolm to his speaking rate. Although speaking rate alone does not explain intelligibility, it is worth a closer look. Studies show that speaking rate can affect perceived intelligibility. Llurda (2000) found that intelligibility is not independent of proficiency and that fast speech diminishes intelligibility. Anderson-Hsieh and Koehler (1988) stated that speech at a fast rate is significantly more difficult for both native and non-native speakers of English to comprehend than speech at a normal rate. Their study further found that speaking rate plays a more critical role in listeners’ understanding of the speaker’s intended message than a heavy accent. Munro and Derwing (2001) indicated that speaking at either a lower or a higher rate can be disadvantageous to intelligibility, and their study showed that listeners associated Mandarin-accented English spoken at a rate slower than native English speech with increased intelligibility and comprehensibility. Anderson-Hsieh and Koehler’s study also reported that a professor “was able to understand the halting English of a recently arrived Chinese advisee better than he could understand his speech a year later when he was speaking more fluently and rapidly” (p. 561). These studies support the reported difficulties in understanding Malcolm associated with his speech rate.

In addition to the effect on intelligibility, Llurda’s (2000) study further revealed that excessively fast speech can also cause “a certain downgrading in the ratings of such traits as ‘well-educated’, ‘intelligent’, ‘leadership ability’, and ‘likable.’” Nevertheless, higher speaking rates can be associated with improved fluency for L2 learners (Lennon, 1990), and, as such, faster speech can sometimes be interpreted as a sign of L2 speakers becoming more native-like (Llurda, 2000). This view can partly explain why Malcolm was inclined to maintain his speaking pace. The following quote shows that Malcolm regarded “fluency” as a temporal sequence of words and further correlated a fast speaking rate with proficiency: “I feel good and skillful when I speak fast and fluently, and when I slow down, my speech doesn’t seem advanced anymore.”

Malcolm’s speech rate was consistently perceived as an obstacle to his intelligibility by his listeners, but it is critical to note that listener understanding is associated with varying dimensions of L2 speech, such as vowels and consonants (Munro & Derwing, 2006), speech rate (Derwing et al., 2008), stress (Field, 2005), and grammatical accuracy (Varonis & Gass, 1982). Therefore, a variety of linguistic variables may need to be addressed, perhaps concurrently, in future instruction in order for Malcolm to increase his speech intelligibility.

L1 Transfer and Fossilization

According to Selinker (1972), most fossilized or stabilized linguistic L2 items relate to L1 transfer. As to how fossilization is viewed in relation to L2 acquisition, Long (2003) suggested that fossilization is a process that needs explanation rather than a cause or a product. In this view, Malcolm’s speech stabilization or fossilization should be explored rather than invoked as the explanation for his lack of statistical improvement. The researcher takes this perspective for two reasons. First, according to Long, proving that an L2 learner’s speech has permanently deviated from the target language norms is methodologically challenging. Second, studies have shown that PI can improve stabilized or fossilized L2 speech (e.g., Couper, 2003; Derwing et al., 2014, 2021). As such, it is pedagogically more constructive to study, through the lens of language transfer, why certain of Malcolm’s L2 linguistic characteristics did not seem to improve and how comparative analysis can offer insights into the ways intervention can help Malcolm’s intelligibility, rather than concluding that those features are simply fossilized.

Challenges in pronunciation for Mandarin-native ESL learners lie across segmental and suprasegmental aspects (Han, 2013). Concerning segmental aspects, Xiaorong and Jian (2011) reported that negative phonetic transfer can be easily observed in /θ/, /ð/, and /r/ in the ESL speech of Mandarin-native learners, and such L1 transfer is associated with the lack of similar counterparts in Mandarin. As a result, it is customary for Mandarin-native speakers to replace these three sounds respectively with /s/, /l/, and Mandarin /r/, which were seen in Malcolm’s speech (e.g., fourth was pronounced as /fɔːs/; mother was pronounced as /mʌlə/; roller was pronounced similarly to /ləʊlə/). In addition, a consonant in Mandarin is always followed by a vowel (except ɲ and ŋ), whereas a vowel is not inserted between English consonant clusters (e.g., /pl/ in plagiarism), and an English word ending in consonants is not followed by additional vowels (e.g., /kt/ in deduct). This difference between Mandarin and English can explain, for example, why Malcolm consistently pronounced plastic as /ˈpəlæstɪkə/. Consequently, Vivian was unable to comprehend what /ˈpəlæstɪkə/ was even in context during their earlier tutoring sessions. Furthermore, although English and Mandarin share similar phonemes /i/ and /ɪ/, Mandarin only has one /i/, and the Mandarin /i/ is longer than English /ɪ/ but shorter than English /i/. A lack of differentiation between /i/ and /ɪ/ in Malcolm’s L1 can explain why Malcolm used /i/ and /ɪ/ interchangeably; for instance, Malcolm’s it sounded like eat, and his leave sounded similar to live.

Suprasegmental aspects of pronunciation concern rhythm, stress, and intonation. In English, the time needed to articulate a sentence is primarily determined by the number of stressed syllables in the sentence, whereas in Mandarin it is determined by the total number of syllables, and Mandarin speakers devote equal time to each syllable (Han, 2013). This feature of Malcolm’s L1 can shed light on why he often cut off the ending sound when speaking English. Malcolm might have tried to speak in a syllable-timed rhythm, which was the way he spoke Mandarin. Moreover, Mandarin is a tonal language, and a high-low pitch pattern is attached to a morpheme permanently. On the other hand, English is a stress-accent language and marks a stressed syllable by prolonging the vowel in the syllable. It is challenging for most Mandarin speakers to mark the stress in multisyllabic words (e.g., mathematical and laboratory). Han maintained that even with a learned word, Mandarin learners of English can still feel uncertain about where to mark the stress. The sharp distinction between Mandarin’s tones and English’s stress can illuminate why Malcolm had trouble distinguishing between /prəˈdjuːs/ as in to produce a new product and /ˈprədjuːs/ as in daily produce.

While both Malcolm and Vivian recognized that Malcolm’s L1 could be a source of his learning difficulties, it was not clear to them how the challenges could be overcome. During the tutoring sessions, the aforementioned research-based analyses were not readily available to them. As a result, Malcolm and Vivian relied on their “folk linguistics” and intuitive knowledge to make spontaneous ESL teaching and learning judgments. For instance, Vivian believed that Malcolm’s L1 prevented him from hearing the nuances between /i/ and /ɪ/ or between /prəˈdjuːs/ and /ˈprədjuːs/. Vivian inferred that Malcolm’s inability to hear the differences resulted in his unintelligible pronunciation. Consequently, her folk linguistics knowledge led Vivian to demonstrate the correct pronunciation repeatedly to help Malcolm hear the difference in the sounds before attempting to pronounce them correctly. Malcolm agreed that L1 transfer played a role in his inability to distinguish between some sounds, and he believed that listening more to Vivian’s pronunciation was generally beneficial. However, since eat did not differ from it to his ears during the initial period of their tutoring efforts, nor did /prəˈdjuːs/ differ from /ˈprədjuːs/, Malcolm was unsure of which morpheme or stressed syllable he needed to pay attention to in Vivian’s related input. As a result, Malcolm described his imitations of Vivian’s pronunciation as “aimless.” Malcolm’s and Vivian’s views on L1 transfer being the source of learning difficulties and their attempts to address the difficulties through teaching and learning actions invite a question that language tutoring centers can consider in their training of ESL and world language tutors: How can professional training for language tutors start from novice tutors’ folk linguistics knowledge, move into research-based linguistic analyses, and introduce research-informed teaching practice?

The contrastive analysis of the phonological systems of Mandarin and English aimed to explain how Malcolm’s L1 may have negatively influenced his L2 performance, contributed to stabilizing some of his pronunciation issues, and made his speech unintelligible for some of his listeners. Future PI directed at learners like Malcolm can consider these linguistic phenomena, and instructors can explain the differences explicitly and address these areas through activities that involve both drills (e.g., Haste makes waste) and communicative exercises (Han, 2013; Kelly, 2000). However, as this study shows, while explicit explanations were appreciated by Malcolm, evidence of their effect was not available. More studies illustrating the implementation in authentic teaching settings would be a useful resource for PI instructors. Moreover, Vivian and Malcolm both wondered if his pronunciation would have been more intelligible had the L1 negative transfer been addressed during Malcolm’s early exposure to English. Their concerns lead to a question that involves available educational resources in a broader view: Are school English as a foreign language (EFL) teachers who are not native speakers of English, have not received sufficient training in PI, and have more than thirty students in one class prepared to help EFL learners like Malcolm to meet the learning goal of pronunciation?

Conclusions

This study was motivated by Malcolm’s real-life struggles with spontaneous ESL intelligibility in his workplace. To improve the situation, Malcolm’s company invested in one-on-one stand-alone PI that was focused on workplace vocabulary and exchanges. Malcolm, Vivian, and Malcolm’s supervisor and colleagues perceived increased intelligibility in his speech, although the initial 36-h FFI did not produce a statistically significant change in Malcolm’s spontaneous intelligibility based on 30 raters’ impressionistic judgments. Nonetheless, the findings provide some preliminary information regarding an adult ESL learner’s response to an FFI approach targeting spontaneous intelligibility. Malcolm’s age, his extensive ESL experience first as a graduate student and then as an employee in the USA, and his need for spontaneous workplace intelligibility contribute to the significance of the study. The findings can offer a window for novice ESL teachers, employers, learners, and program administrators to view the development of an adult ESL learner’s workplace speech intelligibility. However, the research design with one participant bears limitations in its external validity and generalizability, though a lack of external validity may be addressed by replication of the study. Readers are advised to consider the limitations as they interpret the results.

Pronunciation transferability from more controlled drills to a spontaneous context was not consistently observed in the study, which aligns with Darcy’s (2018) findings but is not in agreement with Saito’s (2012) study. Recognizing the instructional relevance of explicit instruction, input, repeated practice, communicative output, and corrective feedback, the FFI used in this study integrated these pedagogical elements. Regardless, the quantitative results in this study were unable to show evidence that such an instructional arrangement was effective. Despite the statistical results, Vivian, Malcolm, his supervisor, and colleagues perceived a positive change in his spontaneous intelligibility, and the observations of the recorded instructional sessions and measurements also revealed sporadic improvement in some aspects of Malcolm’s global intelligibility. This discrepancy raises questions for classroom teachers in terms of language assessment: How can pronunciation in relation to global intelligibility be assessed? How do the assessment results inform the teaching? In addition, fossilization and L1 transfer were perceived by both Malcolm and Vivian and thus were discussed in an attempt to understand how this perspective can be integrated into future instruction to help learners with similar challenges. Moreover, language learning is time-dependent (Everly, 2019); hence, the time factor may have contributed to the study results. Furthermore, the instructor’s level of pedagogical skillfulness in implementing the FFI, or PI in general, might have played a role in the study findings. However, novice ESL teachers should not be discouraged; instead, they can prepare themselves accordingly, knowing that challenges are expected, and seek professional support.

The lack of evidence for the instructional effectiveness in this study, however, does not necessarily indicate that PI is not helpful. On the contrary, Derwing et al. (2014) noted that adult ESL learners’ speech patterns can be deeply ingrained, but their seemingly fossilized productions can be destabilized with explicit instruction. Hence, more studies based on authentic teaching situations are needed to research what specific pedagogical arrangements can improve L2 learners’ global intelligibility, how many instructional hours are required for improvement to take root so employers can make informed decisions when investing in workplace ESL programs, and how instructors can be prepared for PI. The answers can inform ESL workplace programs’ curricula and teacher development, suggest realistic expectations for ESL learners and instructors, and help bridge theory and classroom realities.