Standardization of acoustic measures for normal voice patterns

Summary: Studies have established that normative data are necessary for acoustic analysis. The aim of the present study is to standardize fundamental frequency (fo), jitter, shimmer and harmonic-noise ratio (HNR) measures for young adults with normal voice. Method: 20 males and 20 females, between 20 and 45 years of age, without signs or symptoms of vocal problems; CSL-4300, Kay-Elemetrics; vowels /a/ and /é/. Results: for females, vowels /a/ and /é/ had average measures of: fo 205.82 Hz and 206.56 Hz; jitter 0.62% and 0.59%; shimmer 0.22 dB and 0.19 dB; HNR 10.9 dB and 11.04 dB, respectively. For males, vowels /a/ and /é/ had average measures of: fo 119.84 Hz and 118.92 Hz; jitter 0.49% and 0.5%; shimmer 0.22 dB and 0.21 dB; HNR 9.56 dB and 9.63 dB, respectively. Both fo and HNR female measures were significantly higher than their male counterparts. Conclusion: our results differ from the literature; therefore, it is important to standardize the measures for the program in use.


INTRODUCTION
Acoustic analysis is one of the components of computerized voice labs, and it is useful to supplement voice assessment 1,2 and to assess speech 3-5 . Many acoustic parameters are evaluated in this analysis; the most commonly used for voice assessment are fundamental frequency, jitter, shimmer and the harmonics-to-noise ratio.
The fundamental frequency is an important parameter in both the functional and the anatomical assessment of the larynx 6 , and it is determined by the number of cycles produced by the vocal folds per second. This measure results from the interaction among vocal fold length, mass and tension during speech. Among acoustic parameters, fundamental frequency has proven to be the most consistent across different acoustic analysis systems, and the least sensitive to voice recording characteristics 7-9 .
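As an illustration of "cycles per second", the sketch below estimates the fundamental frequency of a sustained vowel from the strongest autocorrelation peak. It is only a minimal example under stated assumptions: the function name `estimate_f0` and the search limits `f_min` and `f_max` are hypothetical, and the source does not describe the algorithm actually used by the CSL-4300, which may differ.

```python
import numpy as np

def estimate_f0(signal, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate fundamental frequency (Hz) from the autocorrelation peak.

    Illustrative only; f_min and f_max are assumed search limits,
    not values taken from the study.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove any DC offset
    # Full autocorrelation; keep only the non-negative lags.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(sample_rate / f_max)  # shortest period considered
    lag_max = int(sample_rate / f_min)  # longest period considered
    # The lag with the highest autocorrelation is taken as one vocal cycle.
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / best_lag
```

On real recordings this simple peak picking can be fooled by noise or strong harmonics, which is one reason different analysis systems may not agree on the same signal.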
Jitter and shimmer, the cycle-to-cycle variation in frequency and in amplitude, respectively, measured during sustained vowel production, have proved useful in describing the vocal characteristics of normal and dysphonic speakers, and are related to hoarseness and roughness, respectively 6,10-13 . Fundamental frequency, jitter and shimmer also seem to be influenced by smoking: fundamental frequency is significantly lower, and both jitter and shimmer are higher, in smokers than in non-smokers 14 .
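The cycle-to-cycle measures described above can be sketched as follows. This assumes two widely used definitions: jitter as the mean absolute difference between consecutive periods relative to the mean period, in percent, and shimmer as the mean absolute level difference between consecutive cycle peak amplitudes, in dB. The source does not state the exact formulas implemented in the CSL-4300, so the function names and definitions here are illustrative assumptions.

```python
import numpy as np

def jitter_percent(periods):
    """Mean absolute difference between consecutive periods,
    relative to the mean period, in percent (assumed definition)."""
    t = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(t))) / np.mean(t)

def shimmer_db(amplitudes):
    """Mean absolute level difference between consecutive cycle
    peak amplitudes, in dB (assumed definition)."""
    a = np.asarray(amplitudes, dtype=float)
    return float(np.mean(np.abs(20.0 * np.log10(a[1:] / a[:-1]))))

# Hypothetical cycle data, not taken from the study.
periods_s = [0.0050, 0.0051, 0.0050, 0.0051]  # period of each cycle (s)
peaks = [1.00, 0.98, 1.00, 0.98]              # peak amplitude of each cycle
print(jitter_percent(periods_s), shimmer_db(peaks))
```

Both functions expect one value per glottal cycle, so they presuppose a cycle-marking step that is not shown here.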
The harmonics-to-noise ratio characterizes the relationship between the two components of the acoustic wave of a sustained vowel: the periodic component, which is the regular signal from the vocal folds, and the additional noise coming from the vocal folds and the vocal tract 15,16 . This ratio differs significantly between genders, being higher for females 17 , and it is also influenced by age, being lower for elderly women (70 to 90 years) than for young (21 to 34 years) and middle-aged women (40 to 63 years) 16 ; however, it is not a sensitive parameter for differentiating dysphonic from normal voices 13 .
In Brazil, acoustic analysis has been used more intensively in the last decade. Casmerides and Costa 18 carried out a study with 32 speech therapists who worked with voice, all of them professors of Speech Therapy, in order to characterize this group of users, and found that 47% were interested in meeting their clinical needs, which was why they used acoustic analysis programs as a complementary tool in their practice. In general, they sought to obtain less subjective and more quantitative data. The study also found that, although users seemed concerned about the quality of recorded samples, there was no standardization among users of the same type of lab, nor among users of different lab types.
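To make the dB scale of this ratio concrete, the worked equation below assumes the conventional definition, in which the ratio is ten times the base-10 logarithm of the energy of the periodic (harmonic) component, written here as E_H, over the energy of the noise component, E_N. These symbols are introduced only for illustration, and the source does not state how the CSL-4300 computes the measure. Under that assumption, the mean values reported in this study for vowel /a/ correspond to the following energy ratios:

```latex
\begin{align*}
\mathrm{HNR}_{\mathrm{dB}} &= 10 \log_{10}\!\left(\frac{E_H}{E_N}\right)
  \quad\Longleftrightarrow\quad
  \frac{E_H}{E_N} = 10^{\mathrm{HNR}_{\mathrm{dB}}/10} \\
\text{females, /a/ (10.9 dB):}\quad \frac{E_H}{E_N} &= 10^{1.09} \approx 12.3 \\
\text{males, /a/ (9.56 dB):}\quad \frac{E_H}{E_N} &= 10^{0.956} \approx 9.04
\end{align*}
```

In other words, a difference of about 1.3 dB between the female and male means corresponds to roughly a one-third larger harmonic-to-noise energy ratio.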
According to Titze 19 , standardization educates; simplifies; saves time, money and effort, and assures certification.
Since computerized speech and voice analysis programs calculate acoustic parameters in different ways, some studies have sought to standardize data for their own equipment 6,10,17,20,21 , while others have compared the main acoustic measures across analysis programs to determine whether they agree 7,22,23 .
Karnell et al. 22 compared fundamental frequency, jitter and shimmer among 3 programs and found agreement for the fundamental frequency, but not for jitter and shimmer.
Morris and Brown 7 compared 6 different acoustic analysis systems in order to assess their reliability and the agreement among them in determining the fundamental frequency. Their results pointed towards high reliability in each of the systems when the assessment of the same signal was repeated; however, the agreement among systems varied with the signal, being high for the fundamental frequency of men uttering a sustained vowel and of women in oral reading, but low for oral reading in men and for the sustained vowel in women. The authors also found that the CSL program was the most accurate system for measuring the fundamental frequency of the sustained vowel /a/; however, it had the highest standard deviation, especially for vowel /a/.
To determine and compare the fundamental frequency, jitter and shimmer values of female individuals across 4 acoustic wave analysis methods, Spinelli and Behlau 23 assessed 24 subjects with no signs or prior history of vocal alterations while they uttered the sustained vowel /a/. The results showed that the fundamental frequency values were similar only between the Soundscope software and stroboscopy, which in turn were lower than the values found by the Vocal-2 software and higher than those found by the Dr. Speech software. Values for jitter and shimmer determined by the Soundscope and Dr. Speech software were statistically different.
Since the literature shows that many variables influence the final result of a computerized acoustic analysis, it is necessary to establish normative data specific to the software in use.
Thus, the goal of the present study was to establish normative values of fundamental frequency, jitter, shimmer and harmonics-to-noise ratio (HNR) for the CSL 4300 software, from Kay Elemetrics, used in the Speech Therapy Clinic of Ribeirão Preto, so as to obtain comparison data for voice analysis.

METHOD
This study was approved by the Ethics Committee for Research of the Ribeirão Preto University (protocol # 10/03). The subjects were informed about the goal and procedure of the study and the disclosure of its results. After agreeing, they signed an informed consent form approved by the aforementioned committee and in agreement with Resolution # 196/96 Ministry of Health/ National Health Board/ National Committee of Research Ethics (MS/CNS/CNEP).
Forty young adults, 20 men and 20 women, took part in this study. All of them frequented the University of Ribeirão Preto as employees, students, or companions of patients attending the Speech Therapy Clinic. The minimum age was 20 years, since puberty brings about voice alterations associated with vocal mutation. The maximum age was 45 years, because the aging of the vocal apparatus may cause voice changes from this age onward. Age is a relevant variable in vocal assessment 16 .
Other selection criteria included the absence of signs and symptoms of voice alteration and not smoking 14 . These criteria were assessed with a questionnaire answered by the participant prior to sample collection (Attachment 1).
Besides the questionnaire check for signs and symptoms of voice alteration, each participant's voice was also assessed by the same two speech therapists (authors of this paper), and only data from individuals judged to have a normal voice were included in the present study.
Data collection was carried out in a sound-treated room, using the acoustic analysis software CSL-4300 from Kay-Elemetrics, at the Speech Therapy Clinic of the Ribeirão Preto University. The microphone was a Shure SM 48 dynamic microphone, kept at a fixed distance of 5 cm in front of the subject's mouth. Subjects produced the sustained vowels /a/ and /é/ in a comfortable and habitual manner, after a deep inhalation. The sustained vowel is preferred over regular speech in vocal acoustic assessment 24 . When a sample differed from the subject's usual voice, a new sample was collected. Vocal intensity was controlled by monitoring the software's VU meter.
For the analysis, we used a 3-second segment of each sample, discarding both the beginning and the end of the vowel. We also discarded samples in which the authors found altered voice quality.
These vowels were analyzed for the following acoustic parameters: fundamental frequency (Hz), jitter (%), shimmer (dB) and harmonics-to-noise ratio (HNR) (dB). Each parameter was analyzed by gender and vowel.
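The segment selection described above can be sketched as follows. The text states only that 3 seconds were analyzed and that the onset and offset of the vowel were discarded; taking the segment from the exact center of the recording is an assumption made here for illustration, and `central_segment` is a hypothetical helper name.

```python
import numpy as np

def central_segment(signal, sample_rate, duration_s=3.0):
    """Return the central `duration_s` seconds of a recording,
    discarding equal portions at the onset and offset (assumed layout)."""
    x = np.asarray(signal)
    n_keep = int(round(duration_s * sample_rate))
    if len(x) < n_keep:
        raise ValueError("recording is shorter than the requested segment")
    start = (len(x) - n_keep) // 2  # samples discarded at the beginning
    return x[start:start + n_keep]

# Hypothetical 5-second recording sampled at 10 kHz.
recording = np.zeros(50000)
segment = central_segment(recording, sample_rate=10000)
print(len(segment) / 10000)  # duration of the kept segment, in seconds
```

Raising an error for short recordings mirrors the practice of collecting a new sample rather than analyzing an incomplete one.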
The statistical analysis was carried out with the SAS 25 GLM procedure, using an analysis of variance model for a completely randomized design in split plots 26 , given by the following expression:

$$ y_{ijk} = m + I_i + e_{ij} + V_k + (SV)_{ik} + e_{ijk} $$

where $y_{ijk}$ = value observed for the i-th gender, the j-th subject and the k-th vowel; $m$ = fixed factor, estimated by the overall mean; $I_i$ = effect of the i-th gender (i = female, male); $e_{ij}$ = random error corresponding to the plots, assumed to be homoscedastic, independent and normally distributed; $V_k$ = effect of the k-th vowel (k = /a/, /é/); $(SV)_{ik}$ = effect of the interaction between the i-th gender and the k-th vowel; $e_{ijk}$ = random error corresponding to the subplots, assumed to be homoscedastic, independent and normally distributed. The significance level adopted was 5% (p≤0.05). Table 1 shows the descriptive probability levels (p-values) of the F test for the values assessed.
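For readers without access to SAS, the structure of this split-plot analysis can be sketched with NumPy for balanced data. This is not the authors' code: the function name `split_plot_anova` and the toy data are hypothetical. The sketch shows only how gender (the whole-plot factor) is tested against the between-subject error, while vowel and the gender-by-vowel interaction are tested against the within-subject error.

```python
import numpy as np

def split_plot_anova(y):
    """F statistics for a balanced split-plot design.

    y has shape (genders, subjects per gender, vowels); subjects are the
    plots and the vowels produced by each subject are the subplots.
    """
    y = np.asarray(y, dtype=float)
    G, n, V = y.shape
    grand = y.mean()
    gender_mean = y.mean(axis=(1, 2))   # shape (G,)
    subject_mean = y.mean(axis=2)       # shape (G, n)
    vowel_mean = y.mean(axis=(0, 1))    # shape (V,)
    cell_mean = y.mean(axis=1)          # shape (G, V)

    # Sums of squares for each term of the model.
    ss_gender = n * V * np.sum((gender_mean - grand) ** 2)
    ss_plot_err = V * np.sum((subject_mean - gender_mean[:, None]) ** 2)
    ss_vowel = G * n * np.sum((vowel_mean - grand) ** 2)
    inter = cell_mean - gender_mean[:, None] - vowel_mean[None, :] + grand
    ss_inter = n * np.sum(inter ** 2)
    resid = (y - subject_mean[:, :, None] - cell_mean[:, None, :]
             + gender_mean[:, None, None])
    ss_sub_err = np.sum(resid ** 2)

    # Mean squares of the two error strata.
    ms_plot_err = ss_plot_err / (G * (n - 1))
    ms_sub_err = ss_sub_err / (G * (n - 1) * (V - 1))
    return {
        "F_gender": (ss_gender / (G - 1)) / ms_plot_err,
        "F_vowel": (ss_vowel / (V - 1)) / ms_sub_err,
        "F_interaction": (ss_inter / ((G - 1) * (V - 1))) / ms_sub_err,
    }

# Toy data (not from the study): 2 genders x 2 subjects x 2 vowels.
toy = [[[1, 2], [3, 4]], [[5, 7], [6, 10]]]
print(split_plot_anova(toy))
```

The p-values would then be read from the F distribution with the corresponding degrees of freedom: (G-1, G(n-1)) for gender, and (V-1, G(n-1)(V-1)) or ((G-1)(V-1), G(n-1)(V-1)) for vowel and the interaction.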

RESULTS
Table 1 shows that the vowel factor and its interaction with gender were not significant (p>0.05) in any case. There was a significant effect of gender only on fo (p<0.0001) and HNR (p=0.0360), with female averages higher than male averages for these variables (Graphs 1, 4). For jitter and shimmer, although female averages were higher than male averages, the difference was not significant (p=0.0865) (Graphs 2, 3).

DISCUSSION
The average fundamental frequency found in the present study for vowel /a/ in men (120 Hz) was lower than those found by Horii 10 (125 Hz), Araújo et al. 20 (127.61 Hz) and Morente et al. 13 (139.72 Hz), and higher than the one found by Behlau and Tosi 21 (113.01 Hz). The average of the same parameter for women (206 Hz) was lower than those found by Araújo et al. 20 (215.42 Hz) and Morente et al. 13 (267.33 Hz); however, it was very similar to the values found by Ferrand 16 , which were 209.68 Hz for young women and 204.49 Hz for middle-aged women.
The significant difference in average fundamental frequency as a function of gender found in the present study was expected, since fundamental frequency is influenced by the length of the vocal folds, which are longer in males. This difference has often been pointed out in the literature 20,21 .
The average jitter for vowel /a/ in men was 0.498%, lower than the values found by Horii 6,10 (0.61% and 0.66%, respectively); however, it was higher than the averages found by Tajada 14 (0.23%) and by Araújo et al. 20 (0.37%). For females, the average jitter for vowel /a/ (0.62%) was lower than the one found by Araújo et al. 20 (0.85%), but similar to the one found by Ferrand 16 (0.69%).