Finding the Most Uniform Changes in Vowel Polygon Caused by Psychological Stress

Parameters of vowel polygons are chosen as the criterion for distinguishing a speaker's normal state from speech produced under real psychological stress. All results were obtained experimentally with software developed for vowel polygon analysis, applied to the ExamStress database. Six selected methods based on the cross-correlation of different features were assessed by the coefficient of variation, and for each individual vowel polygon an efficiency coefficient marking the most significant and uniform differences between stressed and normal speech was calculated. The mean of the cross-correlation values obtained for the area difference with vector length and with vector angle can be classified as the best method for observing the resulting differences. Generally, the best results for stress detection are achieved by the /i/-/o/-/u/ and /a/-/i/-/o/ vowel triangles in formant planes that combine the fifth formant F5 with other formants.


Material and Methods
Generally accepted meanings of the term stress are tension, pressure and strain. For this reason, stress can be briefly defined as the state of an organism in which the subject faces extraordinary conditions, and it is classified as an emotion that affects human behavior. Basically, two types of stress are recognized [1]. The first type is so-called eustress, which stimulates the subject to better performance as a reaction to a positive load. Conversely, distress is the second type, known as the negative reaction to overload, leading to disease, damage or destruction of the subject. Stress is thus generated by external factors, so-called stressors, which are further divided into five main groups: psychical, physical, social, traumatic and children's. The differences between stressor types and their description can be found in [2].
The main motivation of this paper is to present a novel method for detecting psychological stress in speech by using vowel polygons, i.e. sets of chosen formants grouped into various formations. The method can be further applied to other emotions with possibly useful results.
Recently, various tools have been utilized for stress detection in this field, including approaches based on the similarity of speech features, e.g. the set introduced by Kurniawan [3] using pitch, MFCCs, Relative Spectral Transform-Perceptual Linear Prediction (RASTA-PLP), other biomedical features and a Support Vector Machine (SVM) classifier. Kurniawan also points out that the efficiency of using MFCCs alone and MFCCs together with pitch is more or less equal. Another method for the classification of speech under stress is presented by Johari et al., where several variants of wavelet filters are used to obtain energy and entropy, which are further classified by SVM and Linear Discriminant Analysis (LDA) [4] applied to the SUSAS database [5]. The description of another interesting classifier of emotions, including stress, developed for call centers can be found in [6], where the best classifier is based on SVM and uses the so-called Pearson correlation applied to the set of selected features. Further publications describe feature sets containing the LPC spectrum of the residual and the auxiliary muscle tension ratio [7], spectrograms and the Sigma-pi neuron [8], and the autocorrelation envelope, fundamental frequency, formants and MFCCs [9] as possible methods for psychological stress detection in speech. Vowel polygons have not yet been used as a speech feature in this field, not even for the Czech language. Recently, only the dependence of formant features on the actual emotional state has been observed for vocal tract description in Czech and Slovak [10], which is related to the presented topic.
A short survey of the speech features and classifiers used for stress and emotion recognition can be found in [11]. As this review shows, the most frequently used classifiers are Hidden Markov Models, SVM and Gaussian Mixture Models. The list of the most frequently used speech features is also unchanged: LPCs, MFCCs, energy, formants and pitch. An extensive review of psychological stress in speech was written by Giddens et al. [12]. It provides a thorough presentation of recent work in the field of stressed speech and a survey of recently used speech parameters. Stress patterns are also described in detail, together with the results achieved by various authors, their final summary and conclusions for further work. Another survey of used methods, databases and obtained results can be found in [13], but this publication is mainly oriented toward emotion recognition in speech, so the stress topic is described only briefly.
Regarding changes in speech caused by psychological stress, the variation of pitch with the stressed speaker's mood is described in more detail in [14], where a variable increase in the speaker's pitch depending on stress level was investigated. A similar experiment was made by Tse et al. [15], where the relation between fundamental frequency and its standard deviation under psychological stress was observed in two experiments. In the first, the presentation was performed by voluntary speakers in their comfortable mood, whereas the second experiment imposed the condition of minimal pitch variation. This experiment proved that speech and its parameters can be successfully self-controlled by the speaker despite the influence of psychological stress. Speech fundamental frequency was also used in another experiment to obtain the interaction between the pitch of stressed speech and its long-term averaged spectra, in support of validating a reactivity dimension in schizophrenia [16]. In another analysis of speech under stress, differences between lower and higher stress levels were observed [17]; specifically, higher word productivity occurs in speech under a higher stress level, as well as more pauses during speech [18].

Stress Influence
Obvious signs of vowel polygon changes between the normal and stressed state of a speaker are observed in two criteria. For each vowel polygon, the area difference between the actual (stressed) and original (normal) polygon is observed to investigate the possible uniform behavior of this parameter, as are the direction and length of the vector pointing from the original to the actual Centre of Gravity (CoG). Figure 1 shows the generated vectors for the AEI vowel triangle observed in formant plane F3-F4 for high-level psychological stress.
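The two criteria can be illustrated with a minimal Python sketch. It assumes that the polygon area is computed with the shoelace formula and that the CoG is the mean of the vertex coordinates (identical to the area centroid for a triangle); neither choice is confirmed by the text, and the function names and toy coordinates are hypothetical.

```python
import math

def polygon_area(points):
    """Shoelace formula; points are (x, y) vertices listed in order around the polygon."""
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0

def centre_of_gravity(points):
    """Assumption: CoG is taken as the mean of the vertex coordinates."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def stress_change(normal, stressed):
    """Return (area difference, vector length, vector angle in radians).

    The vector points from the normal (original) CoG to the stressed (actual) CoG.
    """
    area_diff = polygon_area(stressed) - polygon_area(normal)
    x0, y0 = centre_of_gravity(normal)
    x1, y1 = centre_of_gravity(stressed)
    dx, dy = x1 - x0, y1 - y0
    return area_diff, math.hypot(dx, dy), math.atan2(dy, dx)

# Toy coordinates in an arbitrary formant plane (not measured formant values).
normal = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
stressed = [(1.0, 1.0), (7.0, 1.0), (1.0, 7.0)]
area_diff, length, angle = stress_change(normal, stressed)
print(area_diff, length, angle)
```

In this toy case the stressed triangle is larger and its CoG moves along the diagonal, so the vector angle is π/4.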
For the majority of all possible vowel polygons, the same effects occur as in the illustrated example (see Fig. 1). The created vectors are mostly uniform in their direction under high stress influence, and their angle reaches approximately ±π/4. Generally, stress-influenced vectors do not lie in the second and fourth quadrants. From these statements and previous research [19], it can be assumed that the direction uniformity of the created vectors increases with increasing stress level, which erases the deviations between speakers.
The following observations focus on obtaining the cross-correlation values between the vowel polygon area difference and one parameter of the created vector. These values are further statistically analyzed by the coefficient of variation R, defined as

$$R = \frac{\sigma_x}{\bar{x}},$$

where $\sigma_x$ is the standard deviation of the observed parameter x (e.g. the cross-correlation values of a selected vowel shape over all formant planes) and $\bar{x}$ is its mean value. A lower value of this statistical indicator shows higher uniformity of the obtained results, leading to more reliable and significant results [20].
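As a small illustration, the coefficient of variation can be computed as follows, assuming the usual definition (standard deviation divided by mean). The use of the population standard deviation is an assumption, since the text does not say whether the population or the sample form was used; the function name and input values are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """R = standard deviation / mean (population standard deviation assumed)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean

# Hypothetical cross-correlation values of one vowel shape over several formant planes.
spread = coefficient_of_variation([0.2, 0.4, 0.6])
uniform = coefficient_of_variation([0.5, 0.5, 0.5])
print(spread, uniform)
```

Identical values give R = 0, the most uniform case; the more the values scatter around their mean, the larger R becomes.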

Applied Methods
The presented research was applied to the previously described ExamStress database [21], specifically to 10 randomly selected male native Czech speakers speaking the same text during and after the final exam, which means that two identical recordings differing only in emotional state are obtained for each speaker.
These recordings are the input of the developed software system that generates and analyzes vowel polygons [22]. Briefly, each input recording is resampled to f_s = 8 kHz, vowels are then recognized in fluent speech by a two-level recognition system (Mahalanobis distance, feed-forward neural network) and retroactively checked [23], and the values of all formant frequencies present in each vowel are saved for further processing. At the used sampling frequency, at most five formants can be observed in the LPC spectrum, which leads to a total of ten possible formant planes. As mentioned, the presented research is oriented toward the Czech language, which contains five vowels /a/, /e/, /i/, /o/, /u/ and their so-called long equivalents, differing only in duration, not in pronunciation. The five Czech vowels lead to sixteen different shapes (ten triangles, five tetragons and one pentagon) that can be investigated. These shapes situated in formant planes are called vowel polygons; their generation, labeling and other information can be found in [24]. Recently, vowel polygons, mostly called vowel spaces, have been used in other fields of speech processing, e.g. determining age differences in children [25], whisper analysis [26] and observation of Parkinson's disease [27], but they have not been applied to stressed speech. The presented method may also be useful for detecting the level of active hypoxia [28].
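The counts of ten formant planes and sixteen shapes follow from simple combinatorics, which the short Python sketch below reproduces. The naming scheme (e.g. `F3F4`, `AIU`) mirrors the labels used in this paper; the variable names are illustrative only.

```python
from itertools import combinations

FORMANTS = ["F1", "F2", "F3", "F4", "F5"]
VOWELS = ["a", "e", "i", "o", "u"]

# A formant plane is an unordered pair of formants: C(5, 2) = 10.
planes = ["".join(pair) for pair in combinations(FORMANTS, 2)]

# A shape is a set of 3, 4 or 5 vowels: C(5, 3) + C(5, 4) + C(5, 5) = 10 + 5 + 1 = 16.
shapes = ["".join(vowels).upper() for k in (3, 4, 5) for vowels in combinations(VOWELS, k)]

# Every shape placed in every plane gives one vowel polygon.
polygons = [(shape, plane) for shape in shapes for plane in planes]

print(len(planes), len(shapes), len(polygons))
```

Combining every shape with every plane gives 16 × 10 = 160 candidate vowel polygons per speaker state.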
Differences between normal and stressed vowel polygons are based on previously described formant behavior [29]. The core of the experiments is the cross-correlation of pairs of chosen vowel polygon parameters in order to reveal obvious relations between them. Nowadays, cross-correlation is commonly used in speech processing in the field of emotion recognition [30], speech [31] and speaker identification [32]. The following results are obtained for six different pairs of cross-correlated parameters. For simplicity, these pairs are referred to below as experimental methods. Method 1 is the cross-correlation of the area difference value and the vector length, Method 2 of the signum of the area difference and the vector length, Method 3 of the area difference value and the vector angle, and Method 4 of the signum of the area difference and the vector angle. Method 5 is defined as the mean of Method 1 and Method 3, and Method 6 as the mean of Method 2 and Method 4.
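The six methods can be sketched in Python as shown below. The sketch assumes that "cross-correlation" means the zero-lag normalized cross-correlation (Pearson's coefficient) computed over speakers, which the text does not state explicitly; the function names, the handling of constant inputs and the input values are likewise hypothetical.

```python
import math

def correlation(x, y):
    """Zero-lag normalized cross-correlation (Pearson's r); an assumed definition."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # constant input: r is undefined, reported as 0 here by assumption
    return cov / (norm_x * norm_y)

def signum(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)

def six_methods(area_diff, length, angle):
    """Cross-correlation values of Methods 1 to 6 for one vowel polygon."""
    sign = [signum(v) for v in area_diff]
    m1 = correlation(area_diff, length)  # area difference value vs. vector length
    m2 = correlation(sign, length)       # signum of area difference vs. vector length
    m3 = correlation(area_diff, angle)   # area difference value vs. vector angle
    m4 = correlation(sign, angle)        # signum of area difference vs. vector angle
    return {1: m1, 2: m2, 3: m3, 4: m4, 5: (m1 + m3) / 2, 6: (m2 + m4) / 2}

# Hypothetical per-speaker values for one vowel polygon (not data from ExamStress).
area_diff = [120.0, -40.0, 80.0, -10.0, 60.0]
length = [50.0, 20.0, 45.0, 15.0, 30.0]
angle = [0.8, 0.6, 0.9, 0.7, 0.75]
methods = six_methods(area_diff, length, angle)
print(methods)
```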

Cross-correlation
The stress-influenced recordings were spoken by master's students and captured before their attempt to pass the oral final exam. The experimental results presented in this subsection were obtained for 10 male native Czech speakers before and after a master's thesis defense in front of an examination board. Because the situation can end in a striking failure, the stressor's pressure on the observed subject is very intense, leading to a high stress level [33].
Table 1 contains the experimentally obtained values of each parameter for Method 5, as an illustration of the ratios reached depending on the selected formant plane over all possible vowel polygons. This method is characterized by more or less stable values of all parameters and very satisfactory R values. The formant plane F3F4 can be selected as the most suitable for psychological stress detection because it reaches almost the highest mean value of the calculated cross-correlation (a slightly significant positive dependency) and the smallest standard deviation, giving the most uniform results over all vowel polygons in this plane. The worst results were reached by formant plane F1F2, which can be explained by the importance of the first and second formants (F1 and F2) for characterizing the vowel rather than the emotion or the speaker.
Average values of the observed parameters reached by each method are summarized in Tab. 2. Comparing all average results, both mean methods (Method 5 and Method 6) can be classified as the most stable in the formant criterion, i.e. over all possible vowel polygons. This follows from the smallest values of standard deviation and coefficient of variation, which give very uniform results in each individual formant plane, independent of the selected vowel polygon. On the other hand, Method 4 seems to be useless, because its highest R value indicates the highest dependency on the selected formant plane in psychological stress detection.

Similarly to the results in Tab. 1, results depending on the selected vowel polygon (over all formant planes) are listed in Tab. 3 for Method 5. The formant plane-independent criterion gives more stable results than the previous case, as shown by lower R and standard deviation values. The AIU vowel triangle reached the most uniform results over all formant planes and can therefore be seen as the most suitable vowel polygon for formant plane-independent stress detection. The worst value was achieved by the AEI vowel triangle.

According to the achieved results, the basic use of the vector angle appears useless for stress detection. Both mean methods reach much higher uniformity of the cross-correlation results. This may be because each subject feels more or less the same stress level as the others, caused by the higher probability of failing the final exam, which leads to lower self-confidence of each individual speaker and larger differences between normal and stressed speech.
The consistency of all obtained R values is shown in Fig. 2, where the worst methods for stress detection are marked in light blue (Method 2) and orange (Method 4). From this observation it can be stated that the signum of the area difference is unsuitable for high stress detection, as it leads to high variability of the cross-correlation results and insignificant detection of high stress. On the other hand, the most uniform cross-correlation results are obtained for both mean methods (Method 5, purple, and Method 6, light brown) in the plane and shape criteria. Method 1 is shown in green and Method 3 in dark blue. The distribution illustrated in Fig. 2 thus confirms both mean methods as those reaching the most consistent results in the formant plane and shape criteria.

Efficiency of Vowel Polygons
In this subsection, the suitability for stress detection is observed for each possible vowel polygon separately, because the results in the separate shape or plane criterion alone were not very significant. The suitability, specifically the most significant and consistent differences, is classified by the efficiency, which is based on the results presented in the previous section. Generally, the efficiency of an observed parameter x is defined by the classic efficiency equation, which can be modified to give the efficiency coefficient E_c in terms of CCV, R_plane and R_shape, where CCV is the previously calculated cross-correlation value of the selected pair of observed parameters for the current vowel polygon, R_plane is the variation coefficient of the relevant formant plane and R_shape is the variation coefficient of the relevant shape. Briefly, the value of the efficiency coefficient indicates the strength of the observed pair of parameters for the actual vowel polygon with respect to the statistical values over all relevant planes and shapes. The strength of the observed vowel polygon is directly proportional to the E_c value: with increasing E_c, the impact of the current vowel polygon rises over other similar and relevant ones.
Experimentally obtained values of the efficiency coefficient E_c for each vowel polygon and the 6 observation methods are presented in this section. Due to the large number of results, the following tables list only the 5 top and 5 bottom values. Table 5 lists the best and the worst E_c values. There is a significant difference between the results of the methods using the vector angle and the others. The worst results are achieved by the cross-correlation of the area difference value and of its signum with the vector angle; the other methods reached more or less similar results, with the best results achieved by Method 5, followed by Method 6 and Method 1.
From the obtained results, the best shapes are the AIU, AEU and AIO vowel triangles, together with formant planes F1F5, F2F5 and F3F5. The results at the bottom of the list are also interesting because a large number of vowel polygons give null results, making them unsuitable for stress detection. Generally, vowel triangles and formant planes containing the formant F5 can be evaluated as the best choices for stress detection, as can both mean methods (Method 5 and Method 6).

Conclusion
In this paper, differences in vowel polygon parameters and their mutual correlation between normal speech and stressed speech taken from the ExamStress database were presented. The relationships between pairs of observed parameters were examined by the cross-correlation coefficient and by a statistical parameter, the variation coefficient R, to investigate the suitability of the results over formant planes and vowel shapes. These observations showed that the mean methods (Method 5 and Method 6) do not reach the highest cross-correlation values but are the most suitable over all vowel shapes and formant planes. Furthermore, the appropriateness for possible stress detection was classified by the created efficiency coefficient, based on the classic efficiency equation, for each individual vowel polygon separately. Several statements follow from this indicator. Methods 1, 5 and 6 reached the best results, and the worst results were achieved by Method 4, which is characterized by low values of the efficiency coefficient E_c (much lower than for the other observed methods).
It was also shown that the lower formant planes contain mainly information about the spoken phoneme, while information about the speaker's state and identity is attenuated. The best vowel shapes for stress detection prove to be the IOU, AIO, AIU and AEU vowel triangles as well as the AEIU and AEIO vowel tetragons. The best formant planes for stress detection are F1F5, F2F5 and F3F5. In conclusion, stress can possibly be uncovered by using the mentioned vowel shapes and formant planes (leading to a number of different vowel polygons) with the fifth experimental method. In future, the presented research will be applied to a speech-under-stress database in another language, e.g. English or German, to compare the results and to observe whether the presented findings are language-dependent.

Fig. 2. Plane showing the R values reached for high stress influence. Both axes are in logarithmic scale for better resolution.

Table 4 is equivalent to Tab. 2 and lists the average values of the observed parameters for all used methods in the vowel polygon criterion (over all possible formant planes). In this case, generally higher, i.e. less uniform, values are reached for formant plane-independent stress detection, but significantly useful values are reached for both mean methods (Method 5 and Method 6). These methods are characterized by a slightly significant positive dependency, a small standard deviation and a very satisfactory R value. The worst values are reached by Method 2, and Method 4 again does not work well, as in the previous case (see Tab. 2).