Prosodic Phrasing in Lithuanian: Preparatory Study

The aim of this study is to determine the most significant features of prosodic phrasing which should be analysed in detail in the future. The research material consists of 12 recordings of the 20th chapter of Antoine de Saint-Exupéry's The Little Prince. The text was read three times by each of four male actors. The results show that acoustic cues such as pitch lowering, pausing, segmental lengthening, decrease in intensity, and increase in amplitude perturbations may be manipulated by speakers to distinguish among distinct levels of prosodic boundaries.


Introduction
Prosodic phrasing is the grouping of words within an utterance (Jun, 2003). This grouping is influenced by the syntactic and prosodic structure of a language, by semantic and pragmatic factors, and by individual characteristics of a speaker (speech rate, emotions, experience of speaking aloud, etc.). In turn, prosodic phrasing affects the perception and production of language: for appropriate clause and sentence processing, prosodic and syntactic boundaries have to match (for research on this relationship, see e.g., Watson and Gibson, 2004; Himmelmann, 2022).
There is no consensus on the number of levels of prosodic phrasing; however, two levels are usually distinguished in almost all languages: an intonation phrase (IP) and an intermediate phrase (ip), which is intermediate between an IP and a word. These two levels are the focus of this study. Proper phrasing is very important in speech synthesis and recognition; unfortunately, Lithuanian has not been studied in this respect. Therefore, it is very important to find out what basic features can signal IP and ip boundaries.

Aim, Material and Methods
This is the initial stage of the study of prosodic phrasing in Lithuanian. The aim of this stage is to determine the most significant features of prosodic phrasing. For this reason, controlled read speech was chosen for the analysis. In order to carry out the experiment, four male actors, all native speakers, read the 20th chapter of Antoine de Saint-Exupéry's The Little Prince. The text consists of 165 lexemes and about 45 phrases (declaratives, interrogatives, and exclamatives), and each actor read it three times. The actors were asked to read the text clearly, paying attention to the semantic and grammatical connections between words, but without emotional emphasis. They were free to choose the phrasing of the sentences, and it did not necessarily have to be the same in all recordings.
The acoustic data of the investigated segments were measured using Praat (Boersma and Weenink, 2018).

Duration of pauses
Speech pauses occur for a variety of reasons: psychological, physiological, rhythmic, and others. One of the functions of pauses is to divide speech into certain intonation units. Therefore, it is likely that pauses (or prosodic breaks) may be indicators of phrase boundaries.
The results show that pauses are always made after intonation phrases. The actors identified intonation phrases with punctuation marks at the end of a sentence and always made a prosodic break (Kazlauskienė and Kalašinskaitė, 2017). 70% of the intermediate phrases are separated by pauses. It is important to mention that pausing in this case depends on the readers, and the absence of pauses does not indicate the absence of an intermediate phrase.
One of the tasks of this study is to find out whether the duration of the pauses is different after intonation and intermediate phrases. Duration was measured only for those pauses that are at the end of the phrases; pauses before narrow focus were not analysed.
The duration of pauses varies; it is well illustrated by the high standard deviation (see Table 1). However, one tendency is obvious: pauses after intonation phrases are longer (on average 1.8 times) than pauses which separate intermediate phrases. This difference is statistically significant.
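The comparison described above can be sketched in a few lines of Python. This is only an illustration: the study does not state which significance test was used, so Welch's t-test is an assumption here, and the pause durations below are hypothetical values, not the data from Table 1.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical pause durations in seconds (NOT the study's measurements).
major_pauses = [0.90, 1.10, 0.80, 1.20, 1.00]  # after intonation phrases
minor_pauses = [0.50, 0.60, 0.40, 0.70, 0.55]  # after intermediate phrases

ratio = mean(major_pauses) / mean(minor_pauses)
t_stat, df = welch_t(major_pauses, minor_pauses)
print(f"mean ratio = {ratio:.2f}, t = {t_stat:.2f}, df = {df:.1f}")
```

Welch's variant is chosen in this sketch because it does not assume equal variances in the two groups, which seems prudent given the high standard deviation of pause durations; the resulting t and df would still need to be compared against a t distribution to obtain a p-value.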

Intensity
In order to find out whether intensity can signal the end of a phrase, the intensity of the whole intonation or intermediate phrase and of its last syllable was measured.
The results show that the intensity of the last syllable of a phrase is lower than that of the whole phrase (see Table 2). On average, the whole phrase is produced with 1.1 or 1.2 times higher intensity than the last syllable. Although the difference is small, it is statistically significant. The change in intensity depends slightly on the type of phrase; the intensity is slightly lower at the end of intonation phrases than at the end of intermediate phrases. We can confirm that intensity decreases at the end of a phrase, especially at the end of intonation phrases (see typical examples in Figure 1).

Figure 1. Sample intensity curves of an intermediate phrase [kɐd_mɐˈʒɐsʲɪs ¹ˈpʲrʲɪnʦɐs] (on the left) and an intonation phrase [ɛˈsʊ nʲɛ_kɐʒʲɪŋ_¹ˈkoːks ¹ˈpʲrʲɪnʦɐs] (on the right)

Duration of the last sound of a phrase
Féry (2017) mentions that lengthening of the last sound may signal the end of a phrase. For this reason, it was decided to find out whether this could be characteristic of Lithuanian as well. The empirical material does not cover all sounds; therefore, only the durations of the more frequent sounds [s] and [aeː] are presented here. These sounds are very common at the end of Lithuanian words. The duration of the last sounds of a phrase was compared with the duration of the last sounds of words in the middle of a phrase.
The consonant [s] at the end of a phrase is almost one and a half times longer than in the middle of a phrase (see Table 3). This difference is statistically significant. However, its duration does not depend on the phrase type, as the consonant is only 1.1 times longer at the end of an intonation phrase. This difference is statistically insignificant. Thus, in the future, the duration of consonants at the end of a phrase may be investigated irrespective of the type of phrase.
The duration of long vowels at the end of a word is a very complicated issue due to their frequent shortening. As in the case of [s], the duration of [aeː] does not depend on the phrase type.
[aeː] is almost 1.4 times longer at the end of a phrase than in the middle of a phrase. The comparison of other sounds also shows a similar tendency; consonants and vowels may be lengthened at the end of a phrase. Moreover, our preliminary comparison shows that the long vowels are more tense (on average 1.4 times) in the middle of a phrase than at the end. However, these assumptions still need to be verified with a larger database.

Voice quality features
For the voice quality analysis, the following Praat parameters were selected: mean fundamental frequency (F0 mean), jitter (local), shimmer (local), harmonics-to-noise ratio (HNR), and noise-to-harmonics ratio (NHR). All these measures provide information about voice signal aperiodicity, stability, noise, and frequency levels. F0 changes at the end of utterances (more specifically, F0 falls) may indicate to listeners the end of an utterance (Ishimoto and Koiso, 2014). Jitter (cycle-to-cycle perturbation of voice frequency) is a parameter which evaluates minor glottal pulse irregularities and reflects harsh voice or voice noise in general. Shimmer (cycle-to-cycle perturbation of voice amplitude) reflects amplitude disturbances. It may be related to decreased or inconsistent vocal fold contact and may also reflect noise in the voice in general. The HNR and NHR measures assess the degree of acoustic periodicity and the presence of noise in a voice signal and establish a general perception of hoarseness in a voice signal (Finger et al., 2009; Teixeira et al., 2013; Boersma and Weenink, 2018).
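To make these measures more concrete, the following Python sketch computes local jitter and local shimmer as the mean absolute difference between consecutive cycles divided by the mean value, and expresses harmonicity in dB from the share of periodic energy. It is a simplified illustration rather than Praat's actual implementation: the function names, the period values, and the amplitude values are hypothetical, and period detection itself is assumed to have been done already.

```python
from math import log10
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by the mean value."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def hnr_db(periodic_fraction):
    """Harmonics-to-noise ratio in dB, given the fraction of periodic energy."""
    return 10 * log10(periodic_fraction / (1 - periodic_fraction))


# Hypothetical glottal periods (s) and peak amplitudes for one vowel token.
periods = [0.0100, 0.0102, 0.0099, 0.0101]
amplitudes = [0.50, 0.48, 0.52, 0.49]

jitter_local = local_perturbation(periods)      # frequency perturbation
shimmer_local = local_perturbation(amplitudes)  # amplitude perturbation
print(f"jitter = {jitter_local:.2%}, shimmer = {shimmer_local:.2%}")
print(f"HNR at 99% periodic energy = {hnr_db(0.99):.1f} dB")
```

Jitter and shimmer share one helper here because, in their local variants, they differ only in the quantity being compared from cycle to cycle (period length versus amplitude).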
Using Praat's Voice Report command, we manually extracted all five voice quality parameters for the same vowel: 1) at the end of an intonation phrase (||), 2) at the end of an intermediate phrase (|), and 3) at the end of a word in the middle of a phrase (#). [...] [uː] were analysed in the current stage of the research. Table 4 illustrates the results of the voice quality analysis. The statistical significance varies depending on the vowel and the phrase level. Significantly higher frequency perturbation (jitter) appears in the vowels [ɐ], [aeː], [eː] when uttered at the end of an IP. Compared with their phrase-internal counterparts, they are on average 2.4 times more irregular in frequency. The same vowels also differ substantially (on average 1.6 times) when produced at the end of an ip in comparison to the same vowels produced phrase-internally at the end of a word. For the jitter values between the two boundary types (ip and IP), no statistical significance is observed (except for the vowel [ɐ]).
Shimmer. Like jitter, shimmer measurements show increased amplitude perturbations at the end of larger prosodic units, i.e., the highest shimmer values are in vowels uttered at the end of an IP, smaller at the end of an ip, and markedly decreased at the end of a phrase-internal word (except for the vowel [uː]). These differences are statistically significant between vowels at the phrase boundaries and vowels produced phrase-internally, but not between the two boundary types themselves. The values are significantly higher at the end of an IP (2.1 times) and at the end of an ip (1.7 times) compared with their counterparts in the middle of a phrase.
HNR. The harmonicity results show that vowels at the end of utterances tend to be produced with less periodicity (lower HNR values indicate more F0 irregularity and noise). In the case of IPs, these values are smaller than those of ips, but the difference is statistically insignificant. The only significant difference is between the vowels [ɐ] and [eː] uttered at the end of a word in the middle of a phrase and at the end of an intonation phrase. In addition, the low vowels [...]
NHR. Similarly to HNR, NHR measurements assess the amount of noise in a signal: the lower the value, the less noise. The data of the current research show that the highest values, signalling noisier production, are in vowels uttered at the end of an intonation phrase. However, the difference is significant only in the cases of the front vowels [eː] and [aeː] (and only in relation to phrase-internal vowels). In other positions NHR is lower and varies. Again, we can see that the high vowels [ɪ] and [uː] are the 'noisiest' sounds, whereas the low and mid vowels [aeː] and [eː] are the least noisy.
Reviewing the results, we can see that voice quality features at the boundaries of larger prosodic units (IP) become more salient in comparison with smaller prosodic units (ip and phrase-internal word): F0 declines the most, perturbations in frequency and amplitude increase, periodicity decreases, and noise ratio gets higher.
The evaluation of the statistical differences among vowels at the end of different prosodic units shows that the most reliable features would be F0 mean and shimmer, where 78% and 56% of cases, respectively, were statistically significant. Jitter, HNR and NHR are less relevant parameters for describing voice quality at the boundaries (33%, 11%, and 17%, respectively).
We can assume that these voice quality parameters may indicate non-modal phonation at the boundaries, for example, creaky phonation (low F0, F0 irregularity, see Figure 2); however, this assumption should be verified by a more thorough analysis in the future.
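The percentages above summarise how often a parameter separates prosodic positions reliably. A minimal sketch of such a tally is given below; the p-values are hypothetical and are not taken from Table 4, and the 0.05 threshold is an assumption, since the significance level is not stated in the text.

```python
def share_significant(p_values, alpha=0.05):
    """Percentage of comparisons whose p-value falls below alpha."""
    hits = sum(1 for p in p_values if p < alpha)
    return 100 * hits / len(p_values)


# Hypothetical p-values per parameter (NOT the values behind Table 4).
p_by_parameter = {
    "F0 mean": [0.001, 0.020, 0.300, 0.004],
    "shimmer": [0.030, 0.200, 0.600, 0.010],
}

for name, p_values in p_by_parameter.items():
    print(f"{name}: {share_significant(p_values):.0f}% significant")
```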

Conclusions
This preparatory study leads to the following conclusions:
• A decrease in intensity is a sign of the end of a phrase.
• A pause is a possible but not mandatory indicator of the end of a phrase.
• Sounds can be lengthened at the end of a phrase.
• F0 lowering and an increase in amplitude perturbations are the voice quality features that most significantly signal the end of a larger or smaller prosodic unit.
It is necessary to mention that not all speakers use the same strategies for marking phrase boundaries. In addition, the junction of phrases can be shown not only by the features of the end of the previous phrase, but also by the features of the beginning of the next phrase. These phenomena should be studied in more detail.
Punctuation in Lithuanian is quite complex and is mainly based on syntax. In terms of phrasing and intonation, it has not been studied at all. Therefore, the relationship between phrase boundaries and punctuation should be analysed as well.
In order to integrate the results of prosodic phrasing into the development of speech technologies, they need to be verified on a larger and more varied dataset. Based on these results, the speech corpus which is used for speech technologies will be expanded.