A Method to Find out Perceptional Sound Composition of the Vowels in the Contemporary Turkey Turkish

Objective: Speech sounds are used in audiology and audio-verbal therapy. The perception of speech sounds is related to their acoustic properties and to inner ear physiology. Therefore, the perceptional aspect of the acoustic content should not be overlooked. In this study, we evaluated linear and perceptional changes in the sound content of vowels at and over the comfortable hearing level according to the dBA-filter. Methods: Recordings of the 8 vowels (<a, e, i, ı, o, ö, u, ü>) of the contemporary Turkey Turkish were filtered by a dBA-filter. Then the linear frequency data (Hz) of both the original and dBA-filtered files were analyzed for fundamental frequency (F0) and formants (f1 to f5) by Praat; subsequently, the data were converted to the perceptional range (Critical Bark Bands, CBB). Results: The linear values of F0, f4 and f5 did not reveal any relationship with the vowels, while f1, f2 and f3 presented phoneme-specific patterns. dBA-filtering did not affect the linear data of f3 and f4 (<u> was the only exception) or f5. Linear f1 values were increased by the dBA-filter (particularly in <ı, u, ü>). f2 of <ı, u> presented major deviations. CBB changes of the vowels were evident in f1 (the only exception was <e>) and, for f2, only in <ı, u>. Conclusion: The speech sound content at and over the comfortable hearing level appears to stimulate higher frequency bands than those found in the original voice. Only <e> presented no perceptional change, while major changes were seen particularly in <ı, u>. Thus, we suggest that the perceptional aspect revealed by the dBA-filter could provide a new perspective for understanding the results of speech tests.


INTRODUCTION
Speech sounds are common test signals in audiological tests and in the audio-verbal therapy of children with hearing loss (1-6). Although words are mostly used, isolated vowels and some voiced consonants such as <m, n, s> are also used as test stimuli, mainly when testing young children (Ling sounds) (1,3). For both audiological tests and the audio-verbal therapy of children with hearing loss, the meaning is important, but the sound content of the speech sample is essential.
The sound content of speech samples is composed of complex sound waves. The smallest speech sounds are phonemes or allophones, depending on whether or not they cause a change in meaning, respectively. There are two kinds of speech sounds in any egressive pulmonic language: vowels and consonants. The source of the sound energy of vowels is vibration of the vocal folds (voicing), while the sound source of consonants is voicing, turbulent airflow produced at a specific constriction area (named the articulation region) in the upper airway, particularly in the oral cavity (unvoiced consonants), or both (voiced consonants). The sound energy of vowels and consonants is shaped by filtering and resonance in the anterior portion of the upper airway just after the source and articulation areas, and ultimately a specific sound composition is radiated from the oral and nasal cavities (7).
Under normal conditions, speech sound waves are carried through the atmosphere and reach the ears of the listeners. Through the external and middle portions of the ear, the sound is transferred to the hair cells in the Corti organ of the inner ear, where it is transformed into an electrical signal. This complex signal is then transferred to the primary hearing cortex as an electrical stimulus. Although the hearing sensation itself occurs when the signal reaches the primary hearing cortex, comprehension of this signal is essential for hearing speech; this is performed in the secondary and associative hearing and speech areas with the contribution of the memory regions of the brain (8,9). That means that human speech comprehension is based on memorizing the language-specific formula of each speech signal. Besides, speech signals carry many other messages from the speaker to the listener, such as the gender and identity of the speaker, his or her cultural subgroup and accent (regional dialect, or whether the speaker is native or not), and further the feelings, intentions, and metaphors in the mind of the speaker during speech. In linguistics, all these elements of speech are termed suprasegmental variables of speech and language (10,11). It is clear that people sharing the same language, the native speakers, are able to code all segmental and suprasegmental information via the speech apparatus and to decode it in the brain. The details of the speech signal have been the subject of much research for years. By using different formulas based on the fast Fourier transformation (FFT), the sound waves in each speech signal have been decomposed (12). As is known, vowel sounds are produced as nearly periodic complex waves, composed of sound energy accumulated at specific frequency bands, which are known as formants.
Consonants are composed of either aperiodic sound waves only or a combination of periodic and aperiodic waves (7). For vowels, formants are calculated by using Linear Predictive Coding (LPC) analysis (7,13) or by direct visual-manual observation, while spectral energy envelopes are produced for consonants to describe the energy accumulation across frequency ranges (7). By using the mathematical formulas above, much research has been performed to analyze speech sounds. Turkish speech sounds were first analyzed by Selen (1979) by using a device named "Sonagramm" (14), and then many studies using various voice analysis software were done by linguists, engineers, and physicians (11,15-19). In these studies, the details of the produced speech sounds could be demonstrated easily. However, as is well known in otology and audiology, human hearing is neither a one-to-one nor a linear process (20-22). That means that the sound sample received by the external ear is not converted to an electrical signal in the Corti organ with all the details it had on arrival. It has been shown that frequency discrimination in the Corti organ is neither one-to-one nor linear, particularly over 500 Hz. The ear hears not each separate frequency but bands, and these bands are narrower at low frequencies. That means that the human ear distinguishes changes at low frequencies better than at high frequencies (21). These frequency bands are known as critical Bark bands (CBB), and the ranges of 24 critical bands between 0 and 18500 Hz have been demonstrated (21). Further, the following formula (23,24) was also proposed to calculate the Bark values: z = [26,81 / (1 + 1960/f)] - 0,53 (z: Bark value; f: frequency in Hz). Researchers have proposed the use of the Bark scale to describe the perception of vowels (21,23-25).
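The Hz-to-Bark formula quoted above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the software used in the study; the function names `hz_to_bark` and `bark_to_hz` are hypothetical, and the inverse is simply the algebraic rearrangement of the same formula.

```python
def hz_to_bark(f_hz: float) -> float:
    """Convert a frequency in Hz to a Bark value z with the formula
    z = 26.81 / (1 + 1960 / f) - 0.53 quoted in the text."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return 26.81 / (1.0 + 1960.0 / f_hz) - 0.53


def bark_to_hz(z: float) -> float:
    """Algebraic inverse of hz_to_bark (rearranged from the same formula)."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


if __name__ == "__main__":
    # Print the Bark value of a few example frequencies.
    for f in (250.0, 500.0, 1000.0, 2000.0, 4000.0):
        print(f"{f:7.0f} Hz -> {hz_to_bark(f):5.2f} Bark")
```

Because the curve flattens as frequency rises, equal steps in Hz correspond to smaller steps in Bark at high frequencies, which is the nonlinearity the paragraph describes. How a Bark value is assigned to one of the 24 numbered critical bands is not specified in the text, so that step is left out of the sketch.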
On the other hand, it is known that the audibility thresholds of sounds at different frequencies are different and not linear; sounds lower than 500 Hz and higher than 6000 Hz stimulate the Corti organ only at higher amplitudes than sound waves between 500 and 4000 Hz. Furthermore, the external ear canal amplifies sounds, particularly at around 2000-3000 Hz; that means that sounds in this range are perceived as louder than they are in the atmosphere. The dBA filter was developed to measure the sound energy that is harmful to the inner ear, in relation to noise issues, so that the dBA-filter measures the sounds over the audibility thresholds with respect to the amplification that occurs before the inner ear (26,27). Therefore, dBA fits the comfortable hearing level of human hearing at 40 phon, as pointed out in the first table of the paper by Barber (2011) (28). The British Association of Teachers of the Deaf points out that dBA is used for sound field assessments in audiology, including speech recognition tests and assessments with a warble tone generator (29). In this study, as the first and as a preliminary study, we aimed to demonstrate whether the sound content of the vowels in the contemporary Turkey Turkish changes at the comfortable hearing level. These data could help in better understanding the responses of subjects with hearing loss during audiological tests using speech data and, further, during audio-verbal therapy.

MATERIAL and METHOD
This study was performed as a part of a TÜBİTAK project, which was designed to develop software to analyze speech and sounds three-dimensionally (3D) in order to produce their 3D printouts. In the project, the voice of a male linguist (age: 43 years) was used for the demonstration of the analysis and the 3D printouts. In this study, we used these recordings to demonstrate a perceptional aspect of the contemporary Turkish vowels. This study was approved by the Ethics Committee for Clinical Studies of Gazi University.

Testing the dBA filter:
The dBA-filter was run via the software named "üç boyutlu ses-konuşma analizi programı, 3BKAP" (three-dimensional sound-voice analysis program, Figure 1), which was developed as a part of the TÜBİTAK project. This software is based on Matlab (matrix laboratory), a computing environment for solving mathematical problems. The following formula was used for the dBA-filter (http://www.sengpielaudio.com/BerechnungDerBewertungsfilter.pdf):

GMJ 2020; 31: 375-382 Kemaloğlu et al.

Figure 1. The 3BKAP software used for dBA filtering (arrow 1). Arrow 2 points to the graphic presenting the filtered and remaining sound content of the file.

Before analyzing the data of this study, we first wanted to demonstrate the output graphics of the dBA-filter in relation to pure tone signals. For this purpose, pure tone sounds (PTS) at the following frequencies were first produced by Praat (30) with an amplitude of 0,2 Pascal and a duration of 0,4 seconds: 125, 250, 500, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 8000, 10000 and 20000 Hz. Then, the PTS were filtered by the 3BKAP, and the PTS and their dBA-filtered outputs were analyzed by Praat. By using these data, an x-y graphic of the dBA-filter of the 3BKAP was formed (x: frequency of the PTS; y: amplitude, i.e. the energy remaining after filtering).
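The pure tone check described above can be approximated numerically. The sketch below is not the 3BKAP implementation, whose formula is not reproduced in this text; it assumes the standard analytic A-weighting curve (as given in IEC 61672-1) and assumes that the 0,2 Pascal amplitude is a peak value of a sine wave. The names `sine_spl_db`, `a_weighting_db` and `PTS_FREQS_HZ` are hypothetical.

```python
import math

P_REF_PA = 20e-6  # standard reference pressure for dB SPL (20 micropascal)

# Pure tone frequencies listed in the text.
PTS_FREQS_HZ = [125, 250, 500, 1000, 1500, 2000, 3000,
                4000, 5000, 6000, 8000, 10000, 20000]


def sine_spl_db(peak_pa: float) -> float:
    """Sound pressure level of a sine wave, assuming peak_pa is its peak amplitude."""
    rms_pa = peak_pa / math.sqrt(2.0)
    return 20.0 * math.log10(rms_pa / P_REF_PA)


def a_weighting_db(f_hz: float) -> float:
    """Standard A-weighting gain in dB (assumed here; normalized to 0 dB at 1000 Hz)."""
    f2 = f_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = ((f2 + 20.6 ** 2)
                   * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
                   * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(numerator / denominator) + 2.00


if __name__ == "__main__":
    level = sine_spl_db(0.2)  # level of the unfiltered tone
    print(f"unfiltered tone: {level:.1f} dB SPL")
    for f in PTS_FREQS_HZ:
        # Expected level of each tone after ideal A-weighting.
        print(f"{f:6d} Hz -> {level + a_weighting_db(f):5.1f} dB")
```

Under these assumptions the curve attenuates tones below 1000 Hz and above about 6000 Hz and slightly boosts tones between 2000 and 4000 Hz, so a filter that follows the standard curve should produce an x-y graphic of that general shape.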

Voice recording:
All voice samples of the subject were taken in the silent room of the audiology department by using a Shure SM-58-LCE cardioid microphone with a pop filter, in accordance with the criteria of Hirano (1989) (31). The Audacity software (2.0.5.) (https://www.audacityteam.org/) was used for the recordings (sampling rate: 44100 Hz, 16-bit resolution), which were saved as wav files. The microphone we used was able to detect sounds from 50 Hz to 15000 Hz. The subject articulated the eight vowels of the contemporary Turkey Turkish, which are symbolized by <a, e, i, ı, o, ö, u, ü>, with the duration he would use when reading aloud the letters of the Turkish alphabet.

Quality analysis:
The recorded files were first analyzed by the Computerized Speech Laboratory (CSL, Kay Elemetrics, Model 4300) "Multidimensional Voice Profile (MDVP)" for mean Shimmer and Jitter values to obtain objective quality measures.

Filtering:
Then, the recorded (original) files were filtered by the dBA-filter, and the sound data remaining after filtration were saved as the dBA-files of the vowels.

Acoustic analysis of the speech sound data:
Both the original and dBA-files of the vowels were analyzed for durations, fundamental frequencies (F0) and formants (f1, f2, f3, f4 and f5) by Praat. The analysis was performed automatically by a script added to Praat version 4.8 (30,32).

RESULTS
Before analyzing the data of the subject, two quality measures were completed. First, the original files produced by the subject were analyzed to demonstrate his voice quality, and the mean Shimmer and Jitter values were found to be within the normal ranges (Table 1). Then, the output graphic of the dBA-filter used in this study was prepared and presented in Figure 2. The amplitude of the original PTS was 77 dB-SPL for all frequencies; after processing by the dBA-filter, the energy below 1000 Hz and over 6000 Hz was decreased, while there was an increase from 2000 Hz to 4000 Hz (Figure 2). The amplitude values at 1000 and 6000 Hz after the dBA-filter were found to be 77 dB-SPL. After the demonstration of both the voice quality of the subject and the capacity of the dBA-filter in the software, the study was run by filtering the original files of the eight vowels pronounced by the subject. The graphics seen in Figure 3 reveal the deleted energy and the remaining sound contents of the vowels after dBA-filtering. As seen in the figures, the frequencies lower than 1000 Hz lost most of their energy through the dBA-filter.
As seen in Table 2, the dBA-filter did not cause any significant change in duration. The shortest durations were detected for <i> (0,642 vs 0,641 sec) while the longest ones were for <ö> (0,798 vs 0,797 sec) in both the original and dBA files (no statistical difference by paired samples t-test, p> 0,05). The linear (Hz) F0 and formant values revealed by Praat and their CBB values are presented in Tables 3 and 4, respectively. There was no correlation between the F0 and f1 values in either the original or the dBA-files (paired samples correlation test, p> 0,05). The CBB value of F0 was found to be 1 for both the original and dBA files (Table 3 and Figure 5).
The formant values of the original and dBA files are also shown in Table 3 and Figure 5, and their CBB values are presented in Table 4 and Figure 6. It was apparent that f4 and f5 in both the original and dBA files did not present any change in relation to the vowels (Figures 5 and 6), while f1, f2, and f3 revealed phoneme-specific patterns.
dBA-filtering did not affect f3 and f4 (the only exception was <u>) or f5 in any vowel (Figures 5 and 6). f1 was the major formant affected by dBA-filtering; all f1 values were increased by the dBA-filter (up to 1042 Hz in <u>) (Table 3, Figure 5), and these increases caused increases in the CBB values of the phonemes with the exception of <e>. Major deviations in f1 were seen for <ı, u, ü>, while f2 presented major deviations for <ı, u>. In the original files, f2 was closer to f1 in the vowels <a, o, u> and closer to f3 in <i, ü, e>. Although f1 and f2 of <a, o> came closer together in the dBA-files, major changes in the f1-f2-f3 relationships by dBA-filtering were observed only in <ı, u> (Figure 5). It is clear that in the original files, <ö> was the middle vowel of the subject's vowel quadrangle, in which the f1 to f2 and f2 to f3 distances were almost equal. By dBA-filtering, <ö> moved to a more central position, and <ı, u> also became central vowels. In <u>, f2 was increased more than f1, so that the distance between f1 and f2 was increased and f2 became closer to f3.
In the vowel quadrangle, <i, u, e, a> were the corner vowels, and <ö, ı> were localized in the center (Figure 7). By dBA-filtering, all phonemes, but particularly <u, ü, ı>, presented major changes in their locations on the vowel quadrangle. The vowel quadrangle based on CBB values (F0-f1 vs. f2-f1) is seen in Figure 8.

DISCUSSION
In this study, we used the voice of one male subject whose voice quality data were within the normal limits (Table 1). Selen (1979) also reported one male subject's data to describe Turkish phonemes (14). Moreover, Kılıç (2003) and Yılmaz Davutoğlu (2010) reported the data of five and four male subjects, respectively, in their studies in which Turkish phonemes were described (16,19). In this study, our purpose was not to describe Turkish vowels acoustically, but to show whether the difference between the sound data produced and the sound data perceived at and over the comfortable hearing level (40 phon) could be important for audiological tests using speech signals.
Before that, the acoustic data found in the original recordings of the subject were reviewed against the literature. It has been reported that the first three formants of speech are related to the phonemic information of speech (17). In accordance with this assumption, in this study, only f1, f2 and f3 revealed differences in relation to the vowels, while F0 and the fourth and fifth formants did not present any apparent variation across the phonemes. The subject's Bark critical band ranges for F0, f4 and f5, which were 1, 17 and 18, respectively, also supported this notion. Previously, although Kılıç (2003) and Malkoç (2009) reported graphics using the CBB values of the vowels, they did not present exact CBB values (16,18). However, when we applied the Traunmüller (1988) formula (23) to the linear frequency data (Hz) of F0 and f4, which were both reported by Malkoç (2009) (18), it was found that F0 was 1 and f4 was 17 (with the exception of <ü>, which was 16), as in our data. Since the other authors (11,14,16-19) did not present the data of f4 and f5, we can only say that the CBB value of F0 was 1 in the studies reporting F0 (16,18). However, it should be underlined that in this study we used the Traunmüller formula (1988, 1990) for the CBB conversion of the linear values (Hertz, Hz) (23,24); it is known that the CBB conversion based on the Traunmüller formula (1988, 1990) (23,24) produces some differences in the lower frequencies compared to the originally reported table (21). Traunmüller (1990) reported that within the frequency range of the perceptually essential vowel formants (200 - 6700 Hz) the formula agrees to within +/-0,05 Bark with the Bark scale originally published in the form of a table (24).
Hence, if the CBB ranges of F0 in both our study and the references pointed out above (16,18) were determined according to the original table (21), the value would be 2 for <e, i, ı, o, u, ü> in our study, and for all vowels (<a, e, i, ı, o, ö, u, ü>) in the studies of Kılıç (2003) and Malkoç (2009) (16,18).
It has been documented that f1 and f2 are related to the amount of mouth opening and to the position of the tongue in the anteroposterior direction within the mouth during the articulation of the vowels, respectively, because f1 is a product of the total vocal tract length while f2 is produced by resonance of the sound within the area in front of the tongue (17). Our f1 and f2 values supported the reports of the previous researchers on the Turkish vowels: as demonstrated by the previous studies (14,18,19), <i, ü, e> were the anterior vowels of the contemporary Turkey Turkish, in which the articulation place was located anteriorly. Hence, their f2 was closer to their f3, as we found.
On the other hand, <a, o, u> were apparently posterior vowels in which f2 was closer to f1, as detected in our study. In this study, the two contradictive vowels (<ı, ö>) of the contemporary Turkey Turkish were found to be placed at around the center of the vowel quadrangle, in accordance with the linear data of Malkoç (2009) (18).
Although <ı> as /ɯ/ was placed on the right corner of the International Phonetic Association (IPA)'s vowel quadrangle for the contemporary Turkey Turkish (38), Kılıç and Öğüt (2004) revealed that it was a central vowel (39), as we found. The vowel quadrangle based on f2-f1 (x-axis) and f1-F0 (y-axis) by using the CBB values supported the data above, and it was in accordance with the graphics reported by Kılıç (2003) and Malkoç (2009) (16,18).
Our subject produced vowels within 641 (<o>) to 798 msec (<ö>). The previous papers about Turkish phonemes did not report the durations of the vowels that they analyzed. The only studies about the duration of Turkish vowels were done by researchers of the Boğaziçi University (41-43). In their studies, performed by using vowels within words, they reported vowel durations between 49,6 and 184,1 msec. Therefore, since our speaker pronounced the vowels as if he was reading aloud the letters of the Turkish alphabet, our vowels' acoustic data could be considered as examples of the long use of the vowels. Yılmaz Davutoğlu (2010) demonstrated the long allophones of the vowels of the contemporary Turkey Turkish as follows (19): /ɑ:, a:/ for <a>, /e:, ɛ:/ for <e>, /i:/ for <i>, /ɯ:/ for <ı>, /o:/ for <o>, /ø:/ for <ö>, /u:, ʊ:/ for <u> and /y:/ for <ü>. These demonstrations are in accordance with our data with the exception of /ɘ:/ of <ı>. That means that, based on the data of Yılmaz Davutoğlu (which was the only study presenting acoustic details regarding the allophones of Turkish vowels) (19), the pronunciation of <ı> by our subject appears to be artificial with respect to duration. That is, the contemporary Turkey Turkish includes /ɘ/ but not its long use /ɘ:/ in speech according to the data of Yılmaz Davutoğlu (2010) (19). Behrman (2007) points out an inverse correlation between F0 and f1 (intrinsic frequency change) (7) by using the data of Hillenbrand et al. (1995) (33). It is said that the F0 values of <i, u> are expected to be higher than those of <e, o, a> owing to the increased muscle tension of the tongue on the larynx. Intrinsic frequency change has never been the subject of any study performed on the contemporary Turkey Turkish before. In our data, this relationship was partly found; F0 of <i, u> appeared to be higher than that of the other vowels.
When the F0 and f1 data reported by Kılıç (2003) and Malkoç (2009) (16,18) were evaluated, intrinsic frequency change was more apparent in the data of Kılıç (2003) (16). Altogether, we could say that our speaker's vowels are good examples of the contemporary Turkey Turkish.
In this study, as the first in the literature, we point out that the sound content of the vowels revealed significant changes when they were filtered by the dBA-filter. It is known that dBA roughly corresponds to the inverse of the 40 dB (at 1 kHz) equal-loudness curve for the human ear, which is the level of comfortable hearing (26,28). That means that the sound content of the vowels at and over the level of comfortable listening could be different from the produced voice. Since some amount of energy in the lower and upper frequency ranges is de-emphasized while some in the mid-frequencies between 1000 and 4000 Hz is emphasized, the formant centers and bandwidths should change or shift. It is known that audiological evaluation is mostly done at or over the comfortable audibility level (2,5,6,29). Therefore, data derived from the sound content of the produced voice could mislead clinicians, particularly during the fitting of hearing devices and cochlear implants.
Before discussing the differences in the sound content of the dBA-filtered vowels, we should first confirm that the output graphic of our dBA-filter, which was prepared on the basis of the PTS data, is in accordance with the dBA filter graphics reported in the literature. The primary energy deleted by this filter was in the frequencies lower than 1000 Hz in the PTS data (Figure 2) as well as in the vowels (Figure 3). As a consequence of the energy loss in the lower frequencies in the dBA-files, we noticed that the lowest formant frequency, f1, presented a major change and increased. Moreover, the vowel <u>, which had the lowest formant values among the vowels of the contemporary Turkey Turkish, revealed major changes, so that its f1, f2, and f3 were all increased. It is notable that the f1 values were increased not only in Hz but also in CBB values.
As seen in Figures 7 and 8, <u> perceptionally disclosed a sound composition resembling that of the anterior or middle vowels (<ü, e, ö, ı>). According to the references about Ling sounds, <u> has been proposed as a signal with lower frequencies; Eastrabrooks (2006) reported that f1 and f2 of <u> were 430 and 1170 Hz, respectively, according to the data of the produced sound content of <u> (4). The data of this study clearly showed that the f1 and f2 values of <u> (349 and 852 Hz, respectively, Table 3) were increased by the dBA-filter (656 and 1894 Hz, respectively). Eastrabrooks (2006) pointed out the vowels whose f2 was over 1340 Hz as middle and front vowels (4).
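The size of this shift for <u> can be made concrete with a short calculation. The sketch below is only an illustration that reuses the Hz-to-Bark formula quoted in the Introduction and the four formant values and the 1340 Hz limit cited above; the names `hz_to_bark`, `bark_shift`, `U_ORIGINAL_HZ`, `U_DBA_HZ` and `FRONT_F2_LIMIT_HZ` are hypothetical.

```python
def hz_to_bark(f_hz: float) -> float:
    """Hz-to-Bark conversion with the formula quoted in the Introduction."""
    return 26.81 / (1.0 + 1960.0 / f_hz) - 0.53


# Formant values of <u> taken from the text (Hz).
U_ORIGINAL_HZ = {"f1": 349.0, "f2": 852.0}
U_DBA_HZ = {"f1": 656.0, "f2": 1894.0}

# f2 limit cited in the text for middle and front vowels (Hz).
FRONT_F2_LIMIT_HZ = 1340.0


def bark_shift(before_hz: float, after_hz: float) -> float:
    """Difference in Bark between the filtered and the original formant."""
    return hz_to_bark(after_hz) - hz_to_bark(before_hz)


if __name__ == "__main__":
    for name in ("f1", "f2"):
        before, after = U_ORIGINAL_HZ[name], U_DBA_HZ[name]
        print(f"{name}: {before:.0f} Hz -> {after:.0f} Hz, "
              f"shift = {bark_shift(before, after):.2f} Bark")
    # Check on which side of the cited limit f2 falls before and after filtering.
    print("original f2 above limit:", U_ORIGINAL_HZ["f2"] > FRONT_F2_LIMIT_HZ)
    print("filtered f2 above limit:", U_DBA_HZ["f2"] > FRONT_F2_LIMIT_HZ)
```

With these inputs, both formants move upward by more than two Bark, and f2 moves from below to above the cited 1340 Hz limit, which is consistent with <u> being placed among the middle and front vowels after dBA-filtering.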
Our data reveal that the audible portion of the vowels stimulates higher frequency bands of the Corti organ than the sound sample found when the produced voice was analyzed. Therefore, we suggest that the perceptional aspect obtained by using the dBA-filter could provide a new perspective for understanding speech tests in subjects with hearing loss, particularly during amplification and audio-verbal therapy. In this respect, the sound content of the recordings produced by the articulation of <e, a, o, i> appeared to be more convenient than that of <u, ı, ü>. Further studies are necessary to evaluate its clinical importance.