Formant Frequencies and Vowel Space Area in Javanese and Sundanese English Language Learners

Several studies have documented that first language (L1) vowel systems play an important role in the vowel production of a second language (L2). L2 learners of Western languages who have a smaller L1 vowel system are predicted to struggle with producing L2 vowels. However, there remains a paucity of evidence on how the L1 vowel systems of non-Western languages interfere with L2 vowel production. This study focuses on Javanese and Sundanese, two of the most widely spoken local languages of Indonesia. It investigated how the six Javanese vowels and the seven Sundanese vowels influence the production of ten English vowels. In the experiment, 40 learners (20 Javanese and 20 Sundanese speakers) and 10 native English speakers participated in a production task. Spectral dimensions, namely the first (F1) and second formant (F2) frequencies, were analysed acoustically. According to the Speech Learning Model, Javanese and Sundanese speakers should have trouble producing similar vowels (/ɪ, ɛ, ʊ/) and should not show greater differences from native speakers for new vowels (/iː, æ, ɑː, ɔː, uː, ʌ, ɜː/). Contrary to this prediction, the results demonstrated that the Javanese speakers had different F1 and F2 values from the native English speakers for the vowels (/iː, æ, ɑː, ɔː, uː, ʌ, ɜː/), and the Sundanese speakers produced different F1 and F2 values for the vowels (/æ, ɑː, ɔː/). Interestingly, although the vowels (/ɪ, ʊ/) were considered to be similar to vowels in the L1 vowel system, the Javanese and Sundanese speakers also showed differences in formant structure for them. The vowel space area in the productions of the Javanese and Sundanese speakers was slightly smaller than that of the native English speakers. The present study is expected to serve as a basis for future studies and to document the patterns of English vowels produced by Javanese and Sundanese learners of English.


INTRODUCTION
The effects of first language (L1) vowel systems on second language (L2) acquisition have been assessed cross-linguistically. In production tasks, if an L1 has a complex vowel system, the speakers' vowel space area is predicted to be crowded (Flege 2003). Such a crowded vowel space leaves less room for a new vowel category and is a disadvantage in learning L2 vowels. This prediction, however, remains unresolved for L1 vowel systems with a small number of categories (Meunier et al. 2003). If an L1 has a small vowel system, the vowel space area may be less crowded, as the L2 vowel sounds would easily be assimilated into the same L1 category (Iverson & Evans 2007). The current study seeks to test this assumption by examining the vowel space area of speakers of two regional languages of Indonesia, namely Javanese and Sundanese. So far, the vowel space area in L2 vowel production by native speakers of Javanese and Sundanese has not been well investigated.
Vowel space area (VSA) is measured along the spectral dimensions of formant frequencies: the first formant (F1), which corresponds to vowel height, and the second formant (F2), which corresponds to the degree of "backness" of the tongue (Fant 1973, Gimson 1980, Cruttenden 2001). Formant frequencies are crucial for assessing intelligibility and for identifying the accuracy of pronunciation, as well as the naturalness of speech (Peterson & Barney 1952, Hillenbrand & Nearey 1999). Speakers who have a wide F1 range appear to have higher intelligibility scores than speakers who have a narrow F1 range (Bradlow et al. 1996, Hazan & Markham 2004). The F2 range is correlated with the intelligibility of words (Hazan & Markham 2004), not sentences (Bradlow et al. 1996).
The current study explores the VSA of Javanese and Sundanese English language learners by investigating the formant frequencies (F1 and F2 values) of English vowels. This paper first reviews and explains the literature and theories supporting such a prediction. It will then focus on the methods, results, and a discussion of the speech production experiment. We limit this study to the effects of the L1 vowel system; the consonantal context effects that may appear in this study are beyond the present scope of analysis.

L2 SPEECH PRODUCTION MODEL
An often-discussed model of the production of second language sounds is Flege's Speech Learning Model (SLM) (1995a, 1999, 2002). According to SLM, L2 learners can accurately produce L2 sounds if they have an accurate understanding of the properties of L2 sounds and of the phonetic distance between L1 and L2 sounds. SLM hypothesizes that L2 learners will be less successful in learning similar sounds, that is, L2 sounds that share an IPA symbol with an L1 sound but differ audibly from it (Flege 1997). The reason is that the similarity between the L1 and L2 sounds blocks the formation of a new phonetic category. In contrast, L2 learners would experience no challenges in perceiving new sounds (Flege 1997). New L2 sounds, which differ from L1 categories and have no phonetic counterpart in the L1, enable learners to develop new L2 categories (Flege 1997).
Cross-linguistic studies reveal the effects of an L1 vowel system on L2 vowel production. L2 learners are predicted to use the cues from their L1 vowel system and apply them to L2 production. This situation may be an advantage for speakers of an L1 with a complex vowel system. Speakers with a larger L1 vowel system may be more successful in using assimilation by changing the L1 category representation to match the L2 vowels, creating mergers or comprising categories (Flege 2003, MacKay et al. 2001). For instance, McAllister et al. (2002) found that English speakers, who have a larger vowel system, performed better in producing Spanish vowels. Nevertheless, L2 vowel production is likely to be problematic for L2 learners who have a smaller L1 vowel system than that of the target language. Iverson & Evans (2007) revealed that German and Norwegian speakers, who have complex L1 vowel systems, were more accurate at recognising English vowels than French and Spanish speakers, who have smaller L1 vowel systems.
A software package called PRAAT was used for the experiment in this study (Boersma & Weenink 2013). PRAAT allows for the recording of speech sounds and provides a visual representation of the acoustic signal through spectrograms, waveforms, and intonation contours (Olson 2014). The program can inspect the acoustic properties of utterances and show the formant frequencies of a sound as functions of time. PRAAT has been used to investigate speech segments such as vowels and consonants. Almbark (2014) used PRAAT to examine the production of Standard Southern British English vowels by Syrian Arabic speakers and found that the learners differed from the native English speakers in duration and in F1 and F2 values. Similarly, Shahidi et al. (2012) used PRAAT to analyse the production of Malay and English velar stops by Malay speakers. Their results confirmed a clear contrast between the voiced and voiceless velar stops of Malay and English.
To examine the production patterns of English vowels by Javanese and Sundanese English language learners, we set two research questions. (1) Formant frequencies: would Javanese and Sundanese learners produce new vowels (/iː, æ, ɑː, ɔː, uː, ʌ, ɜː/) with native-like formant frequencies? Would their production of similar vowels (/ɪ, ɛ, ʊ/) show formant frequencies that differ from those of native speakers? Following SLM, it is predicted that the Javanese and Sundanese speakers may have difficulties with similar vowels (/ɪ, ɛ, ʊ/) but would be capable of producing new vowels (/iː, æ, ɑː, ɔː, uː, ʌ, ɜː/). (2) Vowel space area: would the learners' L1 influence the vowel space of their English production? If yes, how does the pattern affect the vowel space area? Based on the above findings that support the predictive role of the size of the L1 vowel system in L2 vowel production (e.g. McAllister et al. 2002, Iverson & Evans 2007), Javanese and Sundanese speakers should be less successful at producing English vowels. With regard to the vowel space area, the Javanese and Sundanese learners should also produce a less crowded spectral dimension (Iverson & Evans 2007). Since previous research on Javanese and Sundanese learners' production of English is very scarce, we need to demonstrate the pattern of the vowel space area of the L2 learners compared to that of native English speakers.
A total of 50 participants took part in the speech production experiment. Based on their first language background, they were divided into three groups: American English speakers (AmE), Javanese speakers (JE), and Sundanese speakers (SE). The JE and SE participants all came from Central and West Java. They were students at various universities in Yogyakarta. The mean age of the 20 Javanese speakers (JE group) was 21.9 (age range: 20-30 years, SD = 1.16) and that of the 20 Sundanese speakers (SE group) was 22 (age range: 21-32 years, SD = 2.41).
The average age at which the JE and SE speakers began learning English was 8.7 years (JE: SD = 1.5; SE: SD = 2.5). The JE and SE speakers had no experience of travelling abroad. They were proficient in their first language in the sense that they were still using their L1 in their daily lives. The JE and SE participants had learnt English for a minimum of 9 years in formal education (JE: M = 11.75 years, SD = 2.2; SE: M = 9.8 years, SD = 2.4).
The 10 English speakers (AmE group) came from the central and western areas of the United States. Their mean age was 26.2 (age range: 23-36 years, SD = 1.75). All participants were tested at either Gadjah Mada University in Yogyakarta or Padjajaran University in West Java, both in Indonesia. Participants were provided with a consent form.

STIMULI
In this study, only English monophthongs were taken into consideration. The Javanese (Jav), Sundanese (Sun), and American English (AmE) speakers produced 10 English monophthongs (/iː, ɪ, e, æ, ɜː, ʌ, ɑː, ɔː, ʊ, uː/). The vowels were embedded in two different consonantal contexts, namely /bVd/ and /hVd/ syllables. The ten words in the /bVd/ context were bead, bid, bed, bad, bird, bud, body, bawd, Buddhist, and booed. In the /hVd/ context, the words were heed, hid, head, had, heard, hudd, hod, hawed, hood, and who'd (Ladefoged 2001). All syllables were embedded in the sentence frame "I say (bVd)/(hVd) again". Sentences were presented in written form on a computer screen. During the recording, subjects repeated the sentences twice.

PROCEDURE
Before the production experiment took place, all participants completed a brief sociolinguistic questionnaire and a consent form. The first part of the questionnaire elicited their demographic information and their experiences with both their native and second languages. The subjects reported their parents' first language and how often they used their native language. They stated their choice of language at home and at school. The second part of the questionnaire elicited the subjects' background in the second language. The non-native subjects reported the age at which they began learning English and how long they had been studying English. They listed the other languages that they had learnt and their level of competence in each. The Javanese and Sundanese subjects mostly listed Arabic and Japanese as other non-native languages that they had learnt. They also confirmed that they had never lived in an English-speaking country.
All participants received a short introductory monologue containing words similar to those used in the recording. The monologue was presented on a computer screen. In the meantime, the researcher explained the experiment and the recording procedures that would be involved. The subjects were given as much time as they needed and were encouraged to ask questions and comment at any point during the explanation. To start, each subject sat in front of a computer display with the recording tools active (audio and video recorders, as well as a headset microphone). Once a stimulus appeared on the screen, the subject produced the sentence, for instance "I say bad again". All of the stimuli were presented twice, once in random order and once in sequenced order. The speech production was documented and stored in a computer file. Both audio and video recordings were handled carefully and used for acoustic analysis. All participants were recorded in a sound-attenuated room. Items were digitised using a digital audio recorder (H4N Zoom) and an adjustable microphone headset (Sennheiser PC 141) at a sampling rate of 44.1 kHz with 16-bit resolution. The microphone was placed approximately 3 cm from the speaker's mouth to keep the recording level constant for the entire session for every subject. After the completion of each experiment, subjects were given a post-experiment questionnaire to obtain information about their experiences in producing the stimuli. Afterwards, they received their compensation and were allowed to share any concerns about the experiment in written form.

ANALYSIS
The study utilised Praat 5.3.56 (Boersma & Weenink 2013) for annotating speech. The Javanese group produced 800 English vowels (2 contexts x 20 speakers x 10 vowels x 2 repetitions), the Sundanese group produced 800 English vowels (2 contexts x 20 speakers x 10 vowels x 2 repetitions), and the American English group produced 400 English vowels (2 contexts x 10 speakers x 10 vowels x 2 repetitions). The total corpus amounts to 2000 vowels. Formant frequencies in the participants' speech were tracked using formant estimates and plots. For pitch and formant frequencies (F1 and F2), the values were traced by identifying the formant peak at a chosen time point. The F1 and F2 values were measured at the midpoint of the steady-state portion of the selected vowel. The F1 and F2 values were then converted to the Bark scale using the following formula: Z_i = 26.81 / (1 + 1960 / F_i) - 0.53, where F_i is the frequency of formant i in Hz and Z_i is its value in Bark (Traunmüller 1988).
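The Bark conversion above can be illustrated with a minimal sketch. The function name `hz_to_bark` and the example frequencies are assumptions made for illustration only; they are not measurements from this study.

```python
def hz_to_bark(freq_hz: float) -> float:
    """Convert a frequency in Hz to Bark: Z = 26.81 / (1 + 1960 / F) - 0.53."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 26.81 / (1.0 + 1960.0 / freq_hz) - 0.53


# Hypothetical midpoint measurements in Hz (illustrative values, not study data).
example_formants = {"F1": 500.0, "F2": 1500.0}

# Convert each formant to the Bark scale.
bark_values = {name: hz_to_bark(hz) for name, hz in example_formants.items()}

for name, z in bark_values.items():
    print(f"{name}: {example_formants[name]:.0f} Hz -> {z:.2f} Bark")
```

Because the conversion is monotonic, it preserves the ordering of vowels along each formant dimension while compressing the higher frequencies.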
For statistical analysis, we performed a series of repeated measures ANOVA tests using SPSS 22.0 (IBM 2013). A repeated measures analysis was conducted to examine the effect of the factors VOWEL and CONTEXT on the first formant (F1) and second formant (F2) frequencies. Where Mauchly's test was significant, we used the Greenhouse-Geisser correction for the entire analysis. The independent variables in this study were L1 GROUP (Javanese, Sundanese, English), VOWEL (/iː, ɪ, e, æ, ɜː, ʌ, ɑː, ɔː, ʊ, uː/) and CONTEXT (/bVd/ vs /hVd/). The dependent variables were the F1 and F2 values. To follow up on the differences between Javanese vs. English and Sundanese vs. English, we conducted a series of independent t-tests to determine whether there was a statistically significant difference between the F1 and F2 means of the two groups in each comparison (JE vs. AmE and SE vs. AmE).
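As a rough illustration of the follow-up comparison, the sketch below computes the t statistic of an independent two-sample t-test with pooled variance. It is not the SPSS procedure used in the study; the function name `pooled_t_statistic` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, df) for an independent two-sample t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (assumes equal group variances).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    if std_err == 0:
        raise ValueError("both groups have zero variance")
    t_value = (mean(group_a) - mean(group_b)) / std_err
    return t_value, df


# Hypothetical F1 values in Bark for one vowel (illustrative, not study data).
learner_f1 = [6.8, 7.1, 6.5, 7.0, 6.9]
native_f1 = [7.6, 7.9, 7.4, 7.8]

t_value, df = pooled_t_statistic(learner_f1, native_f1)
print(f"t({df}) = {t_value:.2f}")
```

The p-value is then obtained from the t distribution with `df` degrees of freedom; a statistics package such as SciPy (`scipy.stats.ttest_ind`) reports both values directly.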
To test whether the F1 and F2 means were significantly different from each other, an independent t-test was conducted. The t-test results showed the following pattern (see Table 1).
[Table 1. First formant (F1) and second formant (F2) frequencies of the ten English vowels produced in the /bVd/ and /hVd/ contexts; * = p < .05, ** = p < .001]
Independent t-tests showed that there was a significant difference between the Javanese and American English speakers in the F1 values for the English vowels (/iː, ɑː, ɔː, ɪ, ʊ, æ/) in the /bVd/ syllable and the vowels (/ɑː, æ, ʌ/) in the /hVd/ syllable. The Javanese and American English speakers showed a significant difference in the F2 values for the English vowels (/ɑː, ɪ/) in the /bVd/ syllable and the vowels (/ʌ, ʊ/) in the /hVd/ syllable.
An independent t-test was conducted to examine the difference between the Sundanese and American English speakers in F1 and F2 values (see Table 2).
[Table 2. First formant (F1) and second formant (F2) frequencies of the ten English vowels produced in the /bVd/ and /hVd/ contexts; * = p < .05, ** = p < .001]
Independent t-tests showed that there was a significant difference between the Sundanese and American English speakers in the F1 values for the English vowels (/ɑː, ɪ, æ/) in the /bVd/ syllable and the vowels (/ɑː, ɔː, ɪ, æ, ʊ/) in the /hVd/ syllable. The F2 values for the English vowels (/ɑː, ɪ, æ/) in the /bVd/ context and (/iː, ɪ/) in the /hVd/ context differed significantly between the Sundanese and American English speakers.

VOWEL SPACE AREA
To compare the vowel space areas of the Javanese, Sundanese, and American English speakers in the /bVd/ context, the F1 and F2 values were plotted in a vowel quadrangle table (see Figure 2). The vowel quadrangle table reflects the place of articulation in the mouth and represents the position of the tongue for each vowel. The figure allows for a number of observations:
1. The Javanese and Sundanese speakers had a slightly smaller vowel space than the native English speakers.
2. The vowel /iː/ produced by the Javanese and Sundanese L2 learners was spectrally lower and further back than the vowel /iː/ produced by the native English speakers. In contrast, the Javanese and Sundanese L2 learners produced a higher and more anterior vowel /ɪ/ than the L1 English speakers.
3. The vowel /æ/ produced by the Javanese speakers was lower than that of the Sundanese speakers and the native English speakers.
4. There was no overlap for the vowel /ɜː/: the Javanese and Sundanese L2 learners produced the vowel lower and further back than the L1 English speakers.
5. The L1 Javanese and Sundanese speakers produced the vowel /ɑː/ higher and in a more anterior location than the native English speakers. However, their production of the vowel /ɑː/ fell close to their production of the vowel /ʌ/.
6. The Javanese and Sundanese speakers' production of the vowel /ɔː/ showed a slight disparity. They produced a slightly higher vowel /ɔː/ than the native English speakers.
To examine the effect of the consonantal context, the F1 and F2 values produced by the Javanese and Sundanese L2 learners in the /hVd/ context were plotted in the same way (see Figure 3). The figure allows for a number of observations:
1. The Javanese and Sundanese speakers produced a vowel space that was slightly smaller than that of the native English speakers.
2. The quadrangle table showed that the vowel /iː/ of the learners was slightly further back than that of the native English speakers. However, the learners' production of the vowel /ɪ/ was higher and more anterior than the vowel /ɪ/ produced by the L1 English speakers. The Sundanese speakers produced the vowel /ɪ/ relatively close to their production of the vowel /iː/.
3. The Javanese and Sundanese speakers articulated the vowel /ʊ/ further back and closer to their production of the vowel /uː/ than the vowel /ʊ/ produced by the L1 English speakers. The Javanese speakers produced it further back than both the Sundanese speakers and the native English speakers.
4. The Javanese speakers produced the vowel /ʌ/ spectrally adjacent to their L2 production of the vowels /ɔː/ and /ɑː/.
5. Both groups of L2 learners produced the vowel /æ/ with a more open articulation and closer to the vowel /e/ produced by the L1 English speakers.
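The comparison of vowel spaces in this section is visual. One common way to put a number on such a comparison, shown here only as a hedged sketch and not as the method used in this study, is to compute the area of the polygon spanned by the corner vowels in the F2 x F1 plane with the shoelace formula. The function name `polygon_area`, the choice of corner vowels, and all coordinate values are hypothetical.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula.

    `points` is a list of (x, y) pairs listed in order around the perimeter.
    """
    n = len(points)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical (F2, F1) values in Bark for four corner vowels, listed in
# perimeter order: /iː/, /æ/, /ɑː/, /uː/. Illustrative values, not study data.
native_corners = [(14.0, 3.0), (12.5, 7.0), (9.5, 7.2), (8.0, 3.4)]
learner_corners = [(13.5, 3.3), (12.2, 6.5), (9.8, 6.6), (8.4, 3.6)]

native_area = polygon_area(native_corners)
learner_area = polygon_area(learner_corners)
print(f"native: {native_area:.2f} Bark^2, learner: {learner_area:.2f} Bark^2")
```

The vertices must be listed in order around the perimeter; otherwise the polygon self-intersects and the computed area is not meaningful.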

DISCUSSION
Two main research questions motivated the current study. Firstly, can Javanese and Sundanese learners produce new vowels (/iː, æ, ɑː, ɔː, uː, ʌ, ɜː/) in a native-like way, and do similar vowels (/ɪ, ɛ, ʊ/) create difficulties for the learners? If the SLM (Flege 1995a, 1999, 2002) successfully predicts L2 vowel production, Javanese and Sundanese speakers should not exhibit differences in formant structure for new vowels (/iː, æ, ɑː, ɔː, uː, ʌ, ɜː/), but may show different formant frequencies for similar vowels (/ɪ, ɛ, ʊ/). Contrary to this prediction, the new vowels (/iː, æ, ɑː, ɔː, uː, ʌ, ɜː/) were considerably more difficult for the L2 learners to produce. The Javanese speakers showed different formant frequencies for the new vowels (/iː, æ, ɑː, ɔː, ɜː/) as well as for the similar vowels (/ɪ, ʊ/) in the /bVd/ context. They also demonstrated significantly different formant frequencies for the new vowels (/ʌ, ɑː, æ/) and the similar vowel (/ʊ/) in the /hVd/ context. Unlike the Javanese speakers, the Sundanese speakers had significantly different formant frequencies only for the similar vowel (/ɪ/) and the new vowels (/æ, ɑː/) in the /bVd/ context, and for the similar vowel (/ɪ/) and the new vowels (/ɑː, ɔː, iː/) in the /hVd/ context.
The second research question relates to the vowel space area of the Javanese and Sundanese learners of English and whether there is any difference in the pattern of the vowel space area. We predicted that, because Javanese and Sundanese have smaller vowel systems, the learners might encounter difficulties and show a less crowded vowel space area in producing English vowels. Visual inspection of the vowel space area suggested that L2 vowel production is problematic for L2 learners who have a smaller L1 vowel system than that of the target language. The vowel space area in L2 production confirms that the L2 learners struggled to attune to the formant frequencies of L2 sounds.
The Javanese and Sundanese speakers, who have smaller L1 vowel systems than that of English speakers, did not produce native-like formant structures when compared to the native English speakers. The findings support the predictive role of the size of the L1 vowel system in L2 vowel production, as previously shown in Iverson & Evans (2007). The Javanese and Sundanese speakers, as predicted, were less successful at producing English vowels than the native English speakers.
Given the evidence that the size of the L1 vowel system may predict the accuracy of L2 production (Iverson & Evans 2007), the results may be taken as evidence that the Javanese and Sundanese learners of English in this study did not have fully developed categories for English vowels. The difficulties experienced by the Javanese and Sundanese speakers are mostly shown in the F1 values, indicating that they differed from the English speakers in vowel height rather than in the degree of backness. However, as already mentioned, the speakers recruited in this study were university students studying English. Although the proficiency levels of the Javanese and Sundanese speakers in the current study are comparable, their production accuracy for the English vowels showed differences. At present, we can only speculate that these differences may be due to the first language vowel system. Javanese has the allophonic pairs (/ɪ-i, ɛ-e, ʊ-u, ɔ-o/) (Dudas 1976, Nothofer 2006, Clynes 1995, Wedhawati et al. 2006). Besides, Javanese and Sundanese were reported to have no vowel length contrast (van Zanten & van Heuven 1997) and to exhibit no long vowels or diphthongs (Gordon 2006). Another factor may be explicit phonetic training (see Evans & Iverson 2009, Wang & Munro 1999).

CONCLUSION
To the best of our knowledge, this is the first study to examine both the formant frequencies and the vowel space area of Javanese and Sundanese learners of English. We found that the production of English vowels was challenging for the Javanese and Sundanese EFL learners not only for similar sounds such as (/ɪ, ɛ, ʊ/), but also for new sounds such as (/iː, æ, ɑː, ɔː, uː, ʌ, ɜː/). The learners' deviations did not follow a consistent pattern with respect to the predicted degree of difficulty. Moreover, the Javanese and Sundanese L2 learners differed, to some extent, from the native speakers in the vowel space area of the English vowels. The quadrangle table containing F1 and F2 values revealed detailed information about the deviations. The Javanese and Sundanese speakers showed a smaller vowel space area than that of the native English speakers.
The current study will serve as a basis for future studies and documents the patterns of English vowels produced by Javanese and Sundanese learners of English. With respect to pedagogical implications, the results may enable L2 learners of English to identify pronunciation accuracy and help to increase intelligibility (Peterson & Barney 1952, Hillenbrand & Nearey 1999). As highlighted by Gass & Selinker (2008), the gaps in vowel space area would cause the L2 learners to notice the divergent areas of L2 sounds. Noticing the gaps between the L2 learners' and the native speakers' production would increase the learners' awareness and L2 competency (Munro & Derwing 2008, Gass & Selinker 2008).
In future research, given the differences in formant frequencies, it would be worth exploring whether similar patterns appear in L2 learners' production of different languages and in other consonantal contexts. More data on cross-linguistic comparisons would make it possible to examine the differences and similarities in vowel space area. Acoustic investigations comparing the vowel space area of speakers of other Indonesian regional languages would raise awareness of the needs of L2 learners with various dialects or language backgrounds.