Form and Function of Connectives in Chinese Conversational Speech

Connectives convey discourse functions that provide textual and pragmatic information in speech communication on top of canonical, sentential use. This paper proposes an applicable scheme with illustrative examples for distinguishing Sentential, Conclusion, Disfluency, Elaboration, and Resumption uses of Mandarin connectives, including conjunctions and adverbs. Quantitative results of our annotation work are presented to give an overview of connectives in a Mandarin conversational speech corpus. A fine-grained taxonomy is also discussed, but more empirical data are required to confirm its applicability. By fitting a multinomial logistic regression model, we illustrate that connectives exhibit consistent patterns in positional, phonetic, and contextual features oriented to the associated discourse functions. Our results confirm that the positions of Conclusion and Resumption connectives orient more to semantically, rather than prosodically, determined units. We also found that connectives used for all four discourse functions tend to have a higher initial F0 value than those in sentential use. Resumption and Disfluency uses are expected to have the largest increase in initial F0 value, followed by Conclusion and Elaboration uses. Durational cues of the preceding context make it possible to distinguish Sentential use from the discourse uses of Conclusion, Elaboration, and Resumption.


Introduction
Connectives are a class of lexical items that signal the relationship between units of text or discourse, connecting two different abstract objects, such as events, states or propositions, in discourse (Asher, 1993). It is posited that connectives have little lexical impact at the local segment level but serve significant pragmatic functions (Hjalmarsson, 2011). In conversational discourse, the position in which connectives occur and the phonetic form of connectives provide cues that help listeners process speakers' intentions and the structure of the ongoing discourse (Didirková et al., 2018; Hirschberg and Litman, 1987, 1993; Horne et al., 2001; Rennie et al., 2016; Rhee, 2020). The use of connectives in conversation may be aimed more at marking discourse structure than at referring to the canonical meaning of the connectives themselves. It is our goal to investigate whether there are differences between the canonical use (sentential) and the functional use (discourse) of connectives and whether it is possible to effectively disambiguate between the two uses by looking into the associated phonetic properties in conversation.
Previous research on Mandarin Chinese has identified various discourse-pragmatic functions in an array of connectives (Biq, 1994, 2001; Wang and Huang, 2006; Wang, 2018; Wang, 1998, 2005; Wang et al., 2013; Wang and Tsai, 2007; Wang et al., 2010; Yang, 2006), but only limited results have described the relationship between the functions and the form-related properties of connectives. For instance, the topic-shift ránhòu, 'then', has a significantly longer duration and a larger pitch range than the canonical use, while the trail-off ránhòu (i.e., marking the closure of the current turn and inviting the hearer's response) shows decreased loudness and durational lengthening. While the connection between the discourse function and the phonetic form of connectives may be strong (Biq, 2001; Wang, 2018; Yang, 2006), a systematic schema for describing the discourse functions of Mandarin connectives and their phonetic forms is still lacking. Previous studies offer mostly qualitative descriptions of individual connectives. In this study, we pursue a corpus-based discourse function annotation scheme and a quantitative analysis of Mandarin connectives and their phonetic features for discourse function disambiguation. The correlation examined through a statistically grounded method could be critical to the success of spoken language generation and understanding tasks.
This study considers conjunctions and adverbs to be the main lexical categories of target connectives (Prasad et al., 2008; Zufferey and Degand, 2017). We conduct an annotation project on target connectives in a Chinese conversational speech corpus and analyze the positional, phonetic, and contextual properties of the associated discourse functions in a multinomial logistic regression model. With this task, we investigate whether a connective's phonetic forms orient to sentential/discourse uses. If they do, are we able to find consistent patterns in terms of specific discourse functions? If speakers show sensitivity to the distinction of sentential/discourse uses of Mandarin connectives in their speech production, statistically significant coefficients that support consistent phonetic patterns in sentential/discourse uses are expected.
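To make the modeling question concrete, the following minimal sketch shows how a multinomial logistic model maps feature values to probabilities over the five uses, with Sentential use as the reference category. The function name, the feature name initial_f0, and every coefficient value are hypothetical illustrations; they are not estimates or variable names from this study.

```python
import math

# Category order is an assumption; Sentential serves as the reference category.
CATEGORIES = ["Sentential", "Conclusion", "Disfluency", "Elaboration", "Resumption"]


def class_probabilities(features, coefficients):
    """Return P(category | features) under a multinomial logit model.

    features: dict mapping feature name -> numeric value.
    coefficients: dict mapping each non-reference category -> dict of
        feature name -> weight (an optional "intercept" key is always added).
    """
    scores = {"Sentential": 0.0}  # the reference category has a fixed score of zero
    for category, weights in coefficients.items():
        score = weights.get("intercept", 0.0)
        for name, value in features.items():
            score += weights.get(name, 0.0) * value
        scores[category] = score
    # Softmax, subtracting the maximum score for numerical stability.
    top = max(scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exp_scores.values())
    return {c: e / total for c, e in exp_scores.items()}


# Hypothetical weights: only Resumption reacts to a standardized initial F0 value.
example_coefficients = {c: {"initial_f0": 0.0} for c in CATEGORIES if c != "Sentential"}
example_coefficients["Resumption"]["initial_f0"] = 1.0
example_probabilities = class_probabilities({"initial_f0": 1.5}, example_coefficients)
```

Under such a model, a positive coefficient for a discourse category means that higher values of the feature raise the odds of that category relative to Sentential use, which is the sense in which significant coefficients would indicate consistent phonetic patterns.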
This article is organized as follows. Section 2 reviews the previous literature on Mandarin connectives' discourse functions. Section 3 presents the literature on connectives' form-related properties. In Section 4, we present the data, the annotation scheme, and the descriptive results of the positional, phonetic, and contextual features. In Section 5, we examine closely whether there is any coupling between the phonetic form and the discourse function of connectives using a multinomial logistic regression model. Finally, we discuss our main findings in Section 6.

Discourse functions of Mandarin connectives
Taking topic transitions in spoken discourse as the main focus in the consideration of discourse function grouping of Mandarin connectives, previous works basically distinguish discourse functions that signal initiation (resumption) and conclusion of topics, elaboration of various types, and disfluency. First of all, Wang (2005) and Wang and Tsai (2007) noted that the adverbs búguò, kěshì and dànshì, 'but/yet/however', may initiate a new topic in the discourse.
Since the adverbs conventionally imply a contrast in propositions (Miracle, 1991;Ross, 1978), they may extend to imply a contrast between the new topic and the old topic in discourse.
Similarly, Wang et al. (2010) observed that the adverb qíshí, 'actually', functioned to introduce a new topic or a new aspect of the current topic that may not be in accordance with the current claims in discourse (Biq, 1994). Working with the conjunction suǒyǐ, 'so', Wang and Huang (2006) proposed that suǒyǐ may initiate a new topic in the discourse, as the consequences introduced by suǒyǐ can be treated as new information on the discourse level. However, they also noted that suǒyǐ may signal a resumption of a previous topic and prevent further departure into irrelevant topics. The topic-resuming function was also found in the conjunction ránhòu, 'then'. Yang (2006) and Wang (2018) maintained that ránhòu, canonically indicating either a temporal or consequential relationship between two adjacent clauses, can extend to organize utterances in discourse. As such, ránhòu may signal not only a change in topic but also a return to a previous topic after an intervening subtopic.
Connectives can also be used to signal the conclusion of a current topic. Wang and Huang (2006) noticed that the speaker employed suǒyǐ to paraphrase or summarize the previous talk so that no misunderstanding would arise. Relatedly, Wang (2018) described a trail-off use (Local and Kelly, 1986) in the turn-final ránhòu tokens. She maintained that the trail-off use of ránhòu may express "the speaker's intent to close the turn and to invite the hearer's responses of various types, such as acknowledgement, comment, or elaboration on the current topic" (p. 18).
More often in conversation, connectives allow the speaker to provide elaboration or clarification on a current topic. Biq (2001) claimed that the conjunction jiùshì, 'that is', when being slightly semantically reduced, may signal elaboration or clarification of the previous utterance. The elaboration function was also noted in zhǐshì, 'only', typically a restrictive marker (Guo, 1999;Lü, 1980) or a focus marker (Wang, 2005). Wang (2005) and Wang and Tsai (2007) observed that the speaker may employ zhǐshì to introduce an afterthought that comments or elaborates on the previous utterance. However, due to its implicature of contrast, the piece of discourse that zhǐshì introduces may be incongruent with or divergent from the preceding discourse, such as a counterexpectation or surprising fact. Wang et al. (2013) similarly claimed that zhǐshì may indicate "a more detailed or more correct formulation of something stated previously" (p. 203). Wang and Huang (2006) noted that ránhòu featured an additive use that introduces new information to the current topic and connects successive ideas in discourse (Wang, 1998). Propositionally, qíshí is a commentary adverb that delivers the speaker's attitude toward the propositional content. Wang et al. (2010) noticed that the canonical meaning of qíshí has developed several discourse functions that comment on the form or content of an utterance, such as elaborating on the previous utterance with more accurate or specific information.
All utterances encode the speaker's attitude about the proposition to some extent, which may be indicated by connectives. For instance, Hsieh and Huang (2005) concluded that a qíshí-embedded clause may disclose a fact that the speaker believes that the hearer does not know.
Propositional attitudes have been discussed in the form of emphasizing or supporting a proposition and limiting the validity of a proposition. Yang (2006) and Wang et al. (2010) suggested that ránhòu and qíshí may place emphasis on the distinctiveness of the following content. Wang et al. (2010) further contended that the fact-introducing qíshí may support and strengthen the speaker's assertion. In contrast, Wang et al. (2013) claimed that the elaboration function of zhǐshì may imply incompatibility or insufficiency of the previous utterance and consequently limit the validity of the proposition in the utterance. The downtoning effect may further imply the speaker's mild negative stance toward the propositions and his or her intent to instruct the hearer to reject previous claims.
Connectives may convey some interactional meanings in the speaker-hearer interaction, such as grabbing the hearer's attention and establishing (dis)alignment between the interlocutors' stances. Yang (2006) and Wang et al. (2010) suggested in their respective projects that the emphasis of ránhòu and qíshí may also function to attract the hearer's attention to the discourse. Regarding the interlocutor stance, Hsieh and Huang (2005) and Wang et al. (2010) found that the speaker may use qíshí to disalign, and sometimes align, him/herself with the previous speaker's stance. Wang (2005) and Wang and Tsai (2007) claimed that the contrastive búguò, kěshì and dànshì may preface the speaker's dispreferred response, such as a rejection or disagreement, to the previous speaker's utterance since disagreement can be one kind of contrastiveness (Ford, 2000). While Wang (2005) and Wang and Tsai (2007) did not explicitly note the function of expressing disagreement in zhǐshì, Wang et al. (2013) lent support to such a description by claiming that the incompatibility introduced by zhǐshì may extend to establish a contrast in the interlocutors' stances, allowing the speaker to express minor or indirect disalignment with the previous speaker's claims. The authors also noticed that in their data, zhǐshì was sometimes prefaced by a brief pause, which signals the speaker's hesitation to agree with the previous speaker.
Last, connectives may sometimes contribute no propositional meaning to the discourse.
They may simply signal disfluencies such as filled pauses. In her study of jiùshì and its variants, Biq (2001) noted that a more reduced jiùshì may serve as a filled pause or floor holder that does not contribute to the proposition. This is evidenced by the fact that the utterance would not be understood differently if jiùshì were omitted. Yang (2006) further added that in the case of ránhòu, the connective may act to perform floor negotiations such as floor holding and turn-taking. These functions strategically enable the speaker to have more time to plan what to say next.

Positional, phonetic, and contextual encoding of connectives
In this section, related works on the positional, phonetic, and contextual encodings of connectives with various discourse functions are reviewed. Section 3.1 presents the positional encoding. Section 3.2 introduces various types of phonetic encoding, including pitch, intensity, and word duration correlates. Section 3.3 covers contextual cues that often co-occur with connectives.

Word position
Word position is the location of the lexical item being discussed in relation to a given unit in spoken discourse, be it a meaning-, interaction- or prosody-oriented unit, for instance, intonational unit (Hirschberg and Litman, 1987, 1993), interpausal unit (IPU) (Gravano et al., 2007; Gravano et al., 2011), speaker turn (Gravano et al., 2007; Gravano et al., 2011; Wang, 2018) or utterance (Rennie et al., 2016; Rhee, 2020). Recognizing the importance of word position, Hirschberg and Litman (1987, 1993) differentiated between the discourse and sentential use of the English now based on their positions in an intonational phrase and intermediate phrase (Pierrehumbert, 1980). They found that almost all tokens of discourse now (98.41%) were absolutely first or followed only another cue phrase in an intermediate phrase (e.g., well now, ok now), while only 13.5% of the tokens of sentential now were so placed. Sentential now, on the other hand, tended to occur intermediate phrase-finally (59.45%), whereas only 1.58% of the tokens of discourse now did. Similar tendencies were supported in another study of conjunctions (e.g., and, but, or, etc.), adverbs (e.g., actually, also, indeed, etc.), and other cue phrases (e.g., okay, say, like, etc.) (Hirschberg and Litman, 1993).
The utterance or speaker turn position of connectives is relevant to the associated discourse function. Investigating the English so, Rennie et al. (2016) posited that the utterance-initial and -second so may introduce either a topic shift or a conclusion of a previous utterance.
In addition, the utterance-initial so may initiate a speaker change or a new utterance by the same speaker. On the other hand, the utterance-internal so may perform a resultative function that connects a new piece of information to an old utterance. The utterance-final so may release the speaker's turn. Last, the standalone so may function as a turn-yielding device that urges the hearer to continue with the dialog or as a filled pause that holds the floor for the speaker. Rennie and colleagues, however, examined only the interaction between utterance positions and the two turn-organizing functions for the utterance-initial so. The interaction between utterance positions and other discourse functions remains unknown. Investigating mak, 'coarsely', in Korean spontaneous conversations and scenarios of dramas and movies, Rhee (2020) observed that the filled-pause use of mak had a tendency to occur utterance-finally, which may reflect the tendency of the speaker to perform lexical search and floor holding at the end of the utterance. Despite stating that in principle, filled-pause use can occur anywhere in an utterance, Rhee did not provide any statistics. Concerning the relationship between discourse functions and speaker turn positions, Wang (2018) found that some of the utterance-final tokens of the Mandarin ránhòu had a turn-holding use in her conversational data, while the turn-final tokens all delivered the trail-off use, bringing the current talk to a closure and yielding the floor to another speaker. The trail-off use also occupied an independent intonation unit instead of being embedded in the previous intonation unit. Her findings, however, suffered from the problem of data sparseness, as she found only six tokens of the trail-off ránhòu and did not report any quantitative data for the turn-holding ránhòu. It is unclear whether the observed positional property is significant. 
Aside from the previously mentioned studies, Gravano and colleagues (Gravano et al., 2007;Gravano et al., 2011) found that the English cue words alright and okay tended to occur initially or independently in an IPU, signaling the beginning of a discourse segment. In summary, all these findings point to a strong tendency that positional encoding reflects discourse functions of connectives in speech production. It is unclear, however, to what extent different positional encodings (e.g., the initial, medial, final, or standalone position) correlate with each discourse function. It also remains an empirical question of which kind of production unit can reliably and effectively disambiguate between discourse functions.

Pitch, intensity, and duration
Previous studies have also reported phonetic evidence for various discourse functions. One phonetic encoding consistently discussed in the literature is pitch, which has been operationalized on different bases, such as pitch accent (Hirschberg and Litman, 1987, 1993), pitch reset (Didirková et al., 2018; Horne et al., 2001), and pitch range (Wang, 2018; Yang, 2006). Hirschberg and Litman (1987, 1993) found that the discourse use of the English now was more often deaccented than the sentential use of now. When forming part of a larger intermediate phrase, the majority of sentential uses of now received an H* or complex pitch accent, while all discourse uses of now bore an L* accent. Investigating pitch reset, Horne et al. (2001) and Didirková et al. (2018) each described its relation to topic shift in narrative speech and in spontaneous speech. Horne et al. (2001) observed a mean F0 reset for tokens of the Swedish men, 'but', that performed a topic-shift function, similar to the size of the reset one would observe at a topic-shift boundary. The effect of the F0 reset also led to 70% accuracy in distinguishing between discourse and sentential men in a linear classifier model.
Analyzing the French alors, 'then', Didirková et al. (2018) reported that the connective tended to be marked by a reset on the word when introducing a new topic or specification. A reset in pitch was also observed in the French et, 'and', when introducing specification. Focusing on pitch range, both Yang (2006) and Wang (2018) described the way the phonetic parameter manifested various functions of the Mandarin ránhòu. Yang (2006) showed that ránhòu, when signaling a topic shift or returning to a previous topic after an intervening subtopic, tended to have a larger pitch range and more perceptual prominence. In contrast, a use of ránhòu to signal a continuation of the current topic had a narrow pitch range and a more gradual and smoother contour. Yang also added that as an emphasis marker or a floor-negotiating device, ránhòu was marked with an expanded pitch range, suggesting the hearer pay attention to the speaker.
However, no quantitative data were reported. Adopting a quantitative approach, Wang (2018) showed supporting evidence for Yang's findings that the topic-shifting ránhòu had a much larger average pitch range than its other sentential uses. Moreover, the turn-initial ránhòu showed a larger pitch range than the noninitial ránhòu.
Meaningful phonetic variation has also been observed in the study of intensity, commonly referred to as loudness or volume. Rennie et al. (2016) compared the intensity of the English so and that of its adjacent segments. They showed that both the so tokens that introduced either a topic shift or a conclusion of a previous utterance and the so tokens that connected a new piece of information to an old utterance were significantly quieter than their following segment in paired t-tests. In contrast, the so tokens that introduced a conclusion of the current utterance appeared to be louder than its preceding segment, even though the difference was not significant.
The increase in intensity after so, as suggested by Rennie and colleagues, may signal a more important status of the segment following so in the utterance. The decrease in intensity before so, on the other hand, seems to be in line with the tendency for intensity to increase at the start of a new topic and decrease at the end (Brown et al., 1980). Analyzing the Mandarin ránhòu, Wang (2018) reported a similar observation: the tokens marking the closure of the current speaker's turn showed gradually decreasing loudness as well as lengthening. Her observation, however, was qualitative and based on only six tokens in her data.
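The paired comparison of a connective's intensity with that of an adjacent segment, as in the paired t-tests reported by Rennie et al. (2016), can be sketched as follows. This is only a minimal illustration of the test statistic: the function name and the intensity values (in dB) are hypothetical and are not data from any of the studies cited here.

```python
import math


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for the mean pairwise difference first - second."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Unbiased sample variance of the pairwise differences.
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean / math.sqrt(variance / n), n - 1


# Hypothetical mean intensities (dB) of four connective tokens and of the
# segments that immediately follow them.
connective_db = [60.0, 62.0, 58.0, 61.0]
following_db = [63.0, 64.0, 62.0, 63.0]
t_value, dof = paired_t_statistic(connective_db, following_db)
```

A negative t value here corresponds to connective tokens that are quieter than the following segment; whether the difference is significant depends on comparing t against the t distribution with the returned degrees of freedom.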
In connection to the relationship between phonetic encodings and the structure of discourse, attention has been given to temporal features such as word duration. For instance, Horne et al. (2001) observed a significant difference in mean duration between the discourse and sentential use of the Swedish men, where discourse men was longer than sentential men.
Studies on the topic-shift function of the French alors (Didirková et al., 2018) and of the Mandarin ránhòu (Wang, 2018) have similarly reported a longer word duration than that of their respective sentential functions. Yang (2006) also offered insights into the durational correlates of other discourse functions of ránhòu. Yang found that the ránhòu tokens signaling a continuation of the current topic were short in word duration. In contrast, the tokens signaling an emphasis on the distinctiveness of the following content had a much longer word duration.
She claimed that a longer duration attracts the hearer's attention to the current topic, while the lack of prominent duration reflects less of a need to call for the hearer's attention, which typically occurs when the following utterance develops step-by-step within the same topic. Rennie et al. (2016), on the other hand, presented a case where the English so had a shorter mean duration when functioning to grab the hearer's attention in the utterance-initial position. This is also evidenced by the lack of a perceptible pause between so and subsequent speech.
In addition to the aforementioned phonetic evidence, it has also been suggested that high-frequency disyllabic connectives used in Mandarin conversational speech are often produced in a phonetically extremely reduced form (Liu et al., 2016). The duration and position of disyllabic connectives tend to correlate with the degree of word reduction. Therefore, in our later analysis, we will include four types of phonetic correlates: pitch, intensity, duration, and reduction degree.

Context
There has been some research into how contextual cues such as silent pauses and paralinguistic events may aid speakers in structuring discourse. A silent pause typically signals a major prosodic boundary. Horne et al. (2001) observed that 34% of the topic-shift use of the Swedish men was both preceded and followed by a pause, while none of the sentential uses were.
Similarly, Wang (2018) found that during trail-off use, the Mandarin ránhòu was marked with a pitch contour independent of the previous intonation unit. In addition, it was immediately followed by laughter. On silent pause duration, Didirková et al. (2018) revealed that the silent pause preceding the French alors tended to be longer when the connective opened a new topic or introduced specification. Moreover, they noted that discourse uses of alors were almost never followed by a silent pause. Rhee (2020) reported that the filled-pause use of the Korean mak was often realized after a short pause, signaling that floor holding was needed. He further stated that a pause may distinguish the filled-pause use from other discourse functions of mak, such as the speaker expressing a negative stance and intensifying an utterance, as these functions tended to be marked with no pauses before or after mak. Swerts (1998) also discussed silent pauses preceding and following the Dutch filled pauses uh and um. The study showed that almost all tokens of phrase-initial uh and um had a neighboring silent pause.

Data and annotation of Mandarin connectives
In this section, we present the scheme with which we labeled our target connectives and the

Target connectives in Sinica MCDC8
Sinica MCDC8 contains eight free conversations produced by seven male and nine female Mandarin Chinese speakers aged between 16 and 46. The speakers were randomly sampled from the citizens of Taipei City in 2001. Each pair of speakers who were invited to participate in the recording project met each other for the first time. The corpus has approximately eight hours of speech recording with 90K transcribed words/122K syllables. Acoustic properties that will be used for our later analysis were measured based on the signal-aligned syllable boundary information (Tseng, 2019). In the present study, we performed an exploratory analysis to identify our target connectives in the corpus, including bùguǎn, 'no matter'; jiǎrú, 'if'; jíshǐ, 'even if'; jiùshì, 'is precisely'; háishì, 'still'; huòshì, 'or'; huòzhě, 'or'; rúguǒ, 'if'; suīrán, 'even though'; suǒyǐ, 'so'; yàoshì, 'if'; zhǐshì, 'only'; zhǐyào, 'as long as'; ránhòu, 'then'; and qíshí, 'actually'. As part of the phonetic features, we adopted the labels of disyllabic reduction degree from Liu and colleagues (Liu et al., 2016) and selected only connectives that had such annotation in the corpus. Eventually, we obtained a total of 1370 connective tokens, as shown in Table 1.
To present the semantic content and the prosodic organization of spontaneous conversation, Prévot et al. (2015) proposed two types of production units, discourse units (DUs) and prosodic units (PUs). A DU consists of a main predicate and the related complements and adjuncts. A PU is a stretch of speech content separated by perceptible pitch reset, changes in speech rate, and pauses. We adopted the definition of DUs and PUs proposed by Prévot et al. (2015) and used the word position in which a connective occurs relative to the respective DU/PU as our positional features of connectives in our later analysis.

Sentential/discourse labeling
We posited that a connective delivers a sentential use if the meaning of the DU is inevitably changed or becomes incomplete when the connective is removed from the DU in which it occurs. This is illustrated by (1), where the two speakers talked about what they do for work.
After asking Speaker A where her office is and failing to get a satisfying answer, Speaker B asked Speaker A whether she could describe the direction to her office from Nangang District in Taipei, prefaced by rúguǒ in bold. Canonically, rúguǒ indicates hypotheticality, which suggests that the proposition of the DU (nà rúguǒ cóng Nángǎng guòqù 'if going there from Nangang') is a purely hypothetical statement. When rúguǒ is removed from the DU, the semantic meaning of the DU is also affected. This shows that rúguǒ is used sententially in this case.
(1)
If a connective adds a designated discourse interpretation to the DU in which it occurs in relation to the local context, it is considered discourse use. An instance of such connective is shown in (2), which presents a conversation in which Speaker A told Speaker B about her trip to a hot spring resort in Japan. Speaker A described the indoor baths as separated from the outdoor baths and stated that the experience was quite interesting. She was going to share her opinion about Japanese people using the expression wǒ juede, 'I feel/think'. She then abandoned the thought and shifted to talking about liking Japan, which is less directly related to what is being discussed. The transition to a new topic (i.e., her liking Japan) on Line 25 was introduced by qíshí, which is discourse use.
(2)
Horne et al. (2001) mentioned that both a sentential use and a discourse use interpretation of connectives can seem possible. We also noticed that in some cases, the distinction of sentential or discourse use can be ambiguous. For instance, in (3), Speaker A told Speaker B about the harmful effect of formaldehyde and her effort to educate people about it. She mentioned that all she could do is to share the information with people, and it is up to people to do something with the information. She then said jiùshì zhèyàng, literally 'that is it' in English, on Line 11. Here, jiùshì may sententially indicate the preciseness of the equation between zhèyàng 'like this' and the previous content. However, we identified a discourse meaning in which the speaker introduced a conclusion for her previous topic and signaled to the hearer that there is no more to add to the current topic.
(3)

Annotation scheme of discourse functions
Describing the discourse functions of connectives has often proven a challenging task since the interpretation of the functional properties can be quite elusive and often context dependent. The exploration and initial annotation of discourse functions were carried out by the authors.
Auditory information was used to aid the classification wherever necessary. According to previous work, we were able to identify sentential use and eight types of discourse use for our connectives. It was relatively straightforward to adopt the definitions of resuming and concluding topics and disfluencies. We could also identify a more coarse-grained type of function elaboration for the majority of the cases that provide elaboration or clarification on a current topic. However, for functions such as emphasis, downtoning, securing the addressee's attention, and contrast, it was, in fact, truly challenging to operationalize and identify them.
Therefore, we collapsed the above four discourse functions, along with elaboration, into Elaboration, as they all provide more information to a proposition. The exploration of the data also identified cases where connectives signaled repairs in discourse (Tseng, 2006). As such, we added repair to our discourse functions and collapsed it into Disfluency. This led to an annotation scheme for four function categories: Conclusion, Resumption, Elaboration, and Disfluency. For validation, two trained labelers were recruited to verify sentential use and the four function categories. Each labeler annotated half of the dataset independently. The agreement between the authors' and the labelers' annotations reached a Cohen's kappa of 0.92. Although judgment of sentential/discourse use is likely to be considered highly subjective, the agreement over the annotation of connective functions is surprisingly satisfactory.
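For readers unfamiliar with the agreement measure, Cohen's kappa corrects the observed proportion of matching labels for the agreement expected by chance from each annotator's label distribution. The sketch below shows the standard computation; the function name and the toy label sequences are hypothetical and do not reproduce our annotation data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' labels over the same tokens."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: proportion of tokens with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal proportions per label.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Toy example with hypothetical labels for four tokens.
author_labels = ["Sentential", "Sentential", "Elaboration", "Elaboration"]
labeler_labels = ["Sentential", "Sentential", "Elaboration", "Sentential"]
kappa = cohens_kappa(author_labels, labeler_labels)
```

In the toy example the annotators agree on three of four tokens, but half of that agreement is expected by chance, so kappa is lower than the raw agreement rate.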

Annotation examples
As shown previously in (2) and (3), connectives can be used to perform topic shifts in conversation: qíshí in (2) was found to resume a topic, and jiùshì in (3) may conclude a previous topic. Aside from topic shift, some connectives may introduce an elaboration on the current topic, illustrated by ránhòu on Line 6 in (4).
(4)
Intriguingly, we have observed that some connectives can emphasize or downtone the importance of a proposition in discourse, as illustrated in (5) and (6), respectively. In (5), the two speakers were talking about modified cars. Speaker B pointed out a problem on Line 7 in that many people drive fast in their modified cars. He then argued that people should modify their cars only for safety. Prefacing his argument with the fact-introducing marker qíshí (Hsieh and Huang, 2005; Wang et al., 2010), Speaker B was able to suggest the proposition in his argument was factual and consequently emphasize its importance. In contrast, connectives such as zhǐshì can downtone the importance of the proposition in discourse. In (6), Speaker A explained that she was stuck in traffic right before arriving at Academia Sinica, where the recording of the conversation took place. She, however, went on to clarify on Line 18 that the traffic on the way to Academia Sinica was not that bad and that she was late only because she did not estimate the travel time correctly (shíjiān shàng yùgū kěnéng méiyǒu xiǎng yīxià, 'I didn't think about the estimated time to get here.'). Her reason for being late was prefaced by zhǐshì, which limited the validity of the proposition in discourse (Wang et al., 2013). The connective allowed Speaker A to downtone the importance of her reason and strengthen her point that the traffic to Academia Sinica was in fact not bad. Attention-securing is another interlocutor interaction enabled by our connectives. The connective rúguǒ, for example, can be used to secure the attention of the hearer. In (7), Speaker A argued that Western democracy is not the best political system for all countries and that it will be better if a country has the freedom to figure out which system works best for it.
He used China as the example for his argument, saying that although China went through a dark time when Communism was first introduced to the nation, it is enjoying great economic development now. He then recalled a report about Shanghai that he saw on TV on Line 10, prefaced by a rúguǒ-led clause addressing Speaker B (rúguǒ nǐ yǒu kàn dìsìtái, 'if you watch the cable TV').
Instead of probing for a response from Speaker B, evidenced by a lack of wait time for Speaker B to say something, Speaker A wanted to bring Speaker B's attention to his next utterance on the TV report. Connectives can also convey certain interactions between interlocutors. For instance, in (8), the speaker used qíshí on Line 5 to express her disalignment with the previous speaker. Last, connectives can signal disfluencies such as filled pauses and repairs, as illustrated in (9) and (10), respectively. In (9), suǒyǐ functioned as a filled pause on Line 11. It was followed by a short pause, which hints at the speaker's hesitation or word-searching.

Descriptive results of annotated connectives
This section presents the descriptive statistics of our target connectives based on the three major groups of features. Section 4.5.1 shows the positional features, which designate the position of a connective in relation to a DU/PU. Section 4.5.2 presents the phonetic features, including duration, F0, intensity, and reduction degree. Section 4.5.3 describes the contextual features, including the duration of preceding and following paralinguistic events as well as the speech rate of DU/PU in which a connective occurs.

Positional features
The position of the connective is operationalized as the initial, medial, or final position in relation to a DU/PU, plus the case in which a connective itself forms an isolated DU/PU; these are annotated as DU_initial, DU_medial, DU_final, and DU_isolated. Similarly, the positions in a PU are annotated as PU_initial, PU_medial, PU_final, and PU_isolated. As shown in Table 3, connectives of Sentential use do not seem to occur particularly in DU-initial or DU-medial positions, but when connectives deliver discourse functions of Conclusion, Elaboration or Resumption, a DU-initial position is generally preferred. When used in relation to Disfluency, connectives tend to take DU-medial or DU-initial positions. In contrast to DU, positional features related to PU show that the prosodic manifestation of connectives is diverse across discourse functions. In terms of prosodic segmentation, connectives seem more likely to occur in the form of a standalone unit than in the meaning-oriented segmentation of discourse. We will later fit a multinomial logistic regression model to examine whether there is any significant effect in the comparison of DU and PU.

Phonetic features
We considered word duration, pitch, intensity, and reduction degree in our analysis, following previous studies' suggestions of phonetic correlates for the discourse function of connectives (Didirková et al., 2018;Litman, 1987, 1993;Horne et al., 2001;Liu et al., 2016;Rennie et al., 2016;Wang, 2018;Yang, 2006). For each connective token, Rate, Initial_F0, and IntensityMean represent, respectively, the word duration in the form of speech rate (seconds per syllable), the initial F0 value, calculated using the firstPitch function in Praat (Boersma and Weenink, 2022), and the mean of the intensity values, calculated using Praat's meanIntensity function. Figure 1 presents

Contextual features
Previous research has suggested functional differences in the contextual cues occurring around connectives, such as silent pause (Didirková et al., 2018;Horne et al., 2001;Rhee, 2020) and laughter (Wang, 2018). We considered the position and duration of all paralinguistic events, such as pauses, coughs, laughs, etc., occurring around each connective token in terms of Previous_duration and Next_duration. Previous_duration is the duration of a preceding paralinguistic event, and if there is no immediately adjacent paralinguistic event, the absence is marked with a zero. The same definition applies for Next_duration. Figure 2 presents the results.
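The definitions of Previous_duration and Next_duration can be illustrated with a minimal Python sketch. It is not the authors' extraction script: the Interval class, its field names, and the reading of "immediately adjacent" as "the neighboring interval on a time-aligned tier" are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    """One labeled stretch of a time-aligned tier (hypothetical layout)."""
    label: str
    start: float                  # seconds
    end: float                    # seconds
    paralinguistic: bool = False  # True for pauses, coughs, laughs, etc.

def context_durations(tier, index):
    """Return (Previous_duration, Next_duration) for the token at `index`.

    A neighbor contributes its duration only if it is a paralinguistic
    event; a missing or non-paralinguistic neighbor is recorded as zero.
    """
    def duration_of(neighbor_index):
        if 0 <= neighbor_index < len(tier):
            neighbor = tier[neighbor_index]
            if neighbor.paralinguistic:
                return neighbor.end - neighbor.start
        return 0.0

    return duration_of(index - 1), duration_of(index + 1)
```

In this sketch, a connective preceded by a 0.4-second pause and followed directly by another word receives a Previous_duration of 0.4 and a Next_duration of 0.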
Connectives of discourse uses (MeanConclusion: 0.32 sec, MeanDisfluency: 0.21 sec, MeanElaboration: 0.26 sec, MeanResumption: 0.39 sec) seem to be more likely to be accompanied by a preceding paralinguistic event than Sentential use (Mean: 0.10 sec). However, paralinguistic events following the occurrence of a connective do not seem to be as influential as those that precede such occurrences, except for Disfluency use (MeanDisfluency: 0.33 sec). Another contextual property of connectives we considered is the speech rate of the entire DU/PU in which the connective occurs. We posited that the overall speech rate of DU/PU may correlate with the discourse function by reflecting predominant rhythmic patterns. DU_rate and PU_rate are calculated as the DU/PU duration divided by the number of syllables in the DU/PU after removing paralinguistic events and fillers. Figure 3 shows the results of DU_rate and PU_rate.
Connective-occurring DUs seem to be articulated in a slower tempo when used for Disfluency
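The rate calculation for a DU or PU can be sketched as follows. This is a hedged illustration, not the authors' procedure: the dictionary keys (kind, start, end, syllables) are hypothetical, and the sketch assumes that removing paralinguistic events and fillers affects both the duration and the syllable count, which the text does not state explicitly.

```python
def unit_rate(intervals):
    """Seconds per syllable for one DU or PU.

    `intervals` is a list of dicts with the hypothetical keys
    'kind' ('word', 'filler', or 'paralinguistic'), 'start', 'end'
    (seconds), and 'syllables'. Only 'word' intervals are kept.
    """
    speech = [iv for iv in intervals if iv["kind"] == "word"]
    duration = sum(iv["end"] - iv["start"] for iv in speech)
    syllables = sum(iv["syllables"] for iv in speech)
    if syllables == 0:
        return None  # rate is undefined for a unit without syllables
    return duration / syllables
```

Because the measure is seconds per syllable, a larger value corresponds to a slower tempo.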

Analysis of form and function of Mandarin connectives
We have identified a number of tendencies for the three groups of features in connective use. Seemingly, all these features play certain roles in the production of connectives. Our next study fits a multinomial logistic regression model to test whether these features statistically show an inclination for sentential/discourse uses of connectives.

Multinomial logistic regression
Multinomial logistic regression (MLR) makes inferences about category memberships. It describes the probability of a comparison category being chosen over a reference category in a dependent variable as an outcome based on multiple independent variables. In this study, we built regression models to determine which features contribute more strongly than others, negatively or positively, to predicting discourse functions. The independence of irrelevant alternatives (IIA) is crucial in MLR modeling since the probabilities for any pair of categories should be determined without reference to the other categories that might be available. The IIA can be tested by the Hausman specification test (Hausman and McFadden, 1984). We calculated HIIA using the hmftest function from the mlogit package (Croissant, 2020) and obtained an
To build the MLR models, we used the multinom function from the nnet package for R (Venables and Ripley, 2002). For the predictor variables, we considered all the features described in Section 4.3. For the response variable, we considered our Level 1 functions, consisting of five categories: Sentential, Conclusion, Disfluency, Elaboration, and Resumption.
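How an MLR model turns log odds against a reference category into category probabilities can be shown with a small Python sketch. It is an illustration of the general technique rather than the nnet implementation; the function name and the input format are assumptions, and Sentential is used as the reference category because that is the baseline in this study.

```python
import math

def mlr_probabilities(log_odds, reference="Sentential"):
    """Convert log odds relative to a reference category into probabilities.

    `log_odds` maps each comparison category to its linear predictor,
    i.e. log(P(category) / P(reference)) for one token. The reference
    category has an implicit log odds of zero.
    """
    exp_terms = {cat: math.exp(value) for cat, value in log_odds.items()}
    denominator = 1.0 + sum(exp_terms.values())
    probabilities = {cat: term / denominator for cat, term in exp_terms.items()}
    probabilities[reference] = 1.0 / denominator
    return probabilities
```

The ratio between the probabilities of any comparison category and the reference depends only on that category's own log odds, which is the property that the IIA assumption describes.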
We set Sentential as the reference category in the response variable. We calculated four additional values for each outcome pair of the dependent variable using various functions in R: the z test scores, the p values, the odds ratios, and the confidence intervals (CIs). We calculated the p values using z tests. The z test score for a given predictor variable was obtained by dividing the predictor's coefficient by its standard error, and the score was then transformed into a p value. An odds ratio > 1 indicates that the risk of the outcome falling into the comparison category relative to the risk of the outcome falling into the reference category increases as the variable increases. The CI for a given odds ratio informs us of the lower and upper limit of the interval for the odds ratio for the outcome relative to the reference category, given the other predictors are in the model. This is evaluated with a 95% confidence level. For the odds ratios, we used the exp function from the base package to obtain the exponentiated coefficients. For CI, we used the tidy function from the broom package (Robinson et al., 2022).
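The four derived values can be reproduced from a single coefficient and its standard error, as in the following Python sketch. It is not the authors' R code: the function name is hypothetical, the p value is assumed to be two-sided, and the interval is assumed to be a Wald interval computed on the log scale and then exponentiated, which may differ from what the tidy call returned.

```python
import math

def wald_summary(coef, se, z_crit=1.959964):
    """z score, two-sided p value, odds ratio, and 95% CI for one coefficient.

    `z_crit` is the standard normal quantile for a 95% interval.
    """
    z = coef / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    odds_ratio = math.exp(coef)
    ci_low = math.exp(coef - z_crit * se)
    ci_high = math.exp(coef + z_crit * se)
    return {"z": z, "p": p, "odds_ratio": odds_ratio, "ci": (ci_low, ci_high)}
```

A coefficient of zero yields an odds ratio of 1, and a 95% interval that excludes 1 corresponds to p < 0.05 under these assumptions.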
In addition to goodness-of-fit, we also performed prediction with the models to see how well our considered features can predict discourse functions. We adopted a 70-30 split (70% for the training dataset and 30% for the test dataset) for the data and used the test dataset for prediction. We indicated the model performance with accuracy, which is calculated by the sum of true positives and true negatives over the number of all tokens.
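The evaluation setup can be sketched in Python as follows. Two points are assumptions rather than details given in the text: the split is done by random shuffling with a fixed seed (the text does not say whether the split was random or stratified), and accuracy is computed as the share of test tokens whose predicted category matches the annotated one, which is one common way to apply the measure to five categories.

```python
import random

def train_test_split(tokens, train_fraction=0.7, seed=0):
    """Shuffle a copy of `tokens` and cut it into training and test parts."""
    shuffled = list(tokens)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def accuracy(gold, predicted):
    """Proportion of tokens whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)
```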

Overview of the models
Since the gradient variables were each measured by different units, we first performed z score standardization using the scale function in R (Becker et al., 1988). Since our goal is to compare Sentential use tokens, which typically occur in the initial position of a DU, we defined DU_initial and PU_initial as the reference levels for positions in DU/PU. As high-frequency disyllabic words are often contracted or merged (Tseng, 2005), we defined SYM as the reference level for reduction degree. We did this by using the relevel function for R (R Core Team, 2021).
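The z score standardization step can be illustrated with a short Python sketch. It mirrors the default behavior of R's scale function (centering on the mean and dividing by the sample standard deviation) but is not the authors' code, and the function name is hypothetical.

```python
import statistics

def standardize(values):
    """Return z scores: (value - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # uses the n - 1 denominator
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(value - mean) / sd for value in values]
```

After this step, variables measured in seconds, Hz, and dB are all expressed in standard deviation units, so "per point of increase" in the model refers to one standard deviation of the original measure.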
To explore how each group of features fits our data, we built separate models, models [1] to [4], and a complex model of all features, model [5]. We looked for the lowest residual deviance and AIC scores returned by the multinom function for each model to find the best-fitting one. We then examined the coefficients associated with the predictor variables in the best-fitting model for different outcome pairs of the dependent variable.
Table 5: Goodness-of-fit of the Models

Variable performance
The statistical details for each feature of model [5] are summarized in Table 6 in terms of the regression coefficients, the standard errors, the z test scores and p values, the odds ratios, and the confidence intervals. The implausibly large standard errors, z test scores, odds ratios, and confidence intervals for the independent variables DU_final, DU_isolated, and MSD may be caused by the presence of empty and small cells in Table 3 and Table 4. It is generally assumed that for independent variables in multinomial logistic regression the cell frequencies should be greater than 1.

Phonetic variables
We then investigated the effects of the phonetic features of connectives on the prediction of discourse functions in our model. The coefficients of the outcome for Conclusion or Disfluency use are expected to decrease relative to those of Sentential use when connectives are produced in canonical form, and the coefficients reached significance. That is, when a disyllabic connective is pronounced with more phonetic details, it is more likely to be used sententially. However, this is not the case for Elaboration or Resumption use. The coefficients of the outcomes for all four discourse functions are expected to increase relative to that for Sentential use per point increase in word-initial F0 values, and the coefficients all reached significance. This significant result is clear evidence that the sentential and discourse differences in the use of connectives are reflected in the phonetic representation of the connectives. For word duration, the coefficient of the outcome for Resumption use is expected to increase relative to that for Sentential use per point, and those for Conclusion, Disfluency, or Elaboration use showed a tendency to decrease. None of these coefficients, however, was significant. Last, the coefficients of the outcome for Conclusion, Elaboration, or Resumption use showed a tendency to increase relative to that for Sentential use per point of increase in intensity, but again without significance. In contrast, the coefficient of the outcome for Disfluency use is expected to significantly decrease relative to that for Sentential use per point of increase in intensity.

Contextual variables
Our final analysis investigated the effects of the contextual cues around the connective on the prediction of discourse functions. Our model showed that the coefficient of the outcome for any of the four discourse functions would be expected to increase relative to that for Sentential use per point of increase for the duration of paralinguistic events preceding connectives. All coefficients, except that for Disfluency use, were significant. The coefficients of the outcome for Conclusion, Disfluency, or Elaboration use showed a tendency to decrease relative to that for Sentential use per point of increase for the duration of paralinguistic events following connectives, but without significance. The coefficient of the outcome for Resumption use, in contrast, showed a tendency to increase relative to that for Sentential use per point of increase for the duration of paralinguistic events following connectives, also without significance. For the overall speech rate of DU, the coefficient of the outcome for Conclusion use is expected to decrease relative to that for Sentential use per point of increase. For PU, the coefficient of the outcome for Disfluency use is expected to increase relative to that for Sentential use per point of increase. Both reached significance.

Lexico-semantic property matters
Making use of all connective tokens in the corpus, we have observed a number of tendencies between the form and the discourse function of Mandarin connectives. However, we would also like to mention that individual differences nevertheless exist among connectives, as the lexico-semantic meaning of connectives inevitably affects the form and the function of connectives to various degrees. We illustrate this point by taking ránhòu as an example, as its phonetic patterns clearly differ from the general patterns we identified for all connective tokens. This is shown in Table 7 and Figures 4 and 5. Although ránhòu is similar to other Mandarin connectives in the sense that these connectives all canonically indicate a semantic relationship between two adjacent clauses, discourse uses of ránhòu occur predominantly in the DU-initial position, while the distribution of discourse uses of all connective tokens is mostly DU-initial or DU-medial. Furthermore, discourse uses of ránhòu tend to occur more often in standalone PUs than discourse uses of all connective tokens, which have a more even distribution across PU positions. These tendencies seem to correspond to the observations in Wang (2018), where ránhòu exclusively occurs in the turn-initial position when signaling topic shifts. In tokens of ránhòu that close the current turn and invite the hearer's response, Wang also found lengthening of the connective and occurrence in only standalone intonation units. While more data and a more sophisticated taxonomy of the discourse functions of connectives are needed to better understand the degrees to which the lexico-semantic meaning of connectives may affect the form and the function of connectives, these tendencies suggest that different lexico-semantic properties may render different configurations of positional, phonetic, and contextual encodings in connectives.
In this paper, we have proposed an applicable scheme that annotates the Sentential, Conclusion, Disfluency, Elaboration, and Resumption uses of Mandarin connectives. Using statistical modeling, we revealed an array of positional, phonetic, and contextual encodings of connectives for the associated discourse functions. This interaction between form and function has not been empirically examined in previous research on connectives.
Concerning the positional manifestation of connectives in conversation, tokens that introduce a new topic unit or return to a previous topic most often occur at the beginning of a prosodic phrase or constitute a standalone prosodic phrase (Hirschberg and Litman, 1993;Horne et al., 2001). We have identified differences in positional manifestation in prosodically oriented PUs and semantically oriented DUs. We found that Disfluency use of connectives is oriented more in terms of prosodically determined production units, while Conclusion, Elaboration, and Resumption uses of connectives are oriented more in terms of semantically determined production units. Disfluency use is more likely to occur in the PU-final position and in standalone PUs than Sentential use. Conclusion and Resumption uses are more likely than Sentential use to occur in the initial position of DUs. Elaboration use has similar patterns to those of Conclusion and Resumption uses, except that Elaboration use is more likely than Sentential use to occur in standalone DUs. This may be due to the many subtypes of Elaboration tokens of connectives in our corpus. A larger dataset may provide further insights into this issue.
In contrast to the other discourse functions, Disfluency use is more likely than Sentential use to occur in DU-medial positions or form standalone DUs. Our results confirm that connectives that indicate a topic shift in our study, such as Conclusion and Resumption uses, may be better observed in terms of a meaning-oriented segmentation of discourse, as reported in previous research (Rennie et al., 2016;Wang, 2018), rather than in prosodically segmented units.
Lengthened word duration often appears to introduce a new topic or to return to a previous topic (Didirková et al., 2018;Horne et al., 2001;Wang, 2018) and can be used for the automatic detection of discourse markers (Zufferey and Popescu-Belis, 2004). Although our result did not reach significance, it shows a tendency for a lengthened duration in Resumption use and a tendency for a shortened duration in Conclusion, Disfluency, and Elaboration uses relative to Sentential use. While we report that prosodic patterns of connectives may be determined by the associated discourse functions, we require a larger set of spoken connectives to statistically verify the tendencies in the future. For pitch-related features, we found that all four discourse functions tend to have a higher initial F0 value than Sentential use. If a subject were to increase the initial F0 value, Resumption and Disfluency uses are expected to have the largest increase, followed by a medium increase in Conclusion use and a much smaller increase in Elaboration use. This tendency corresponds to previous findings suggesting that when the topic shifts in conversation, the average onset F0 is higher than for other types of topic boundaries such as topic continuation, elaboration, and speech-act continuation and that elaborating utterances are characterized by the lowest onset F0 among all types of topic boundaries (Nakajima and Allen, 1993). On the other hand, the observation of increased initial F0 value in Disfluency use may be accounted for by the fact that phrase-initial filled pauses have a higher mean pitch than phrase-medial tokens (Swerts, 1998) and that error repairs tend to be marked by increased intonational prominence on the correcting information (Howell and Young, 1991;O'Shaughnessy, 1992). There is also a pitch reset in restarting Mandarin repairs, and the initial F0 in the repaired items is higher than that of its counterpart (Tseng, 2006). 
Finally, the mean intensity values seem not to be a salient phonetic factor in the context of our connective study. Nonetheless, we observed that Disfluency use tends to have lower mean intensity values than Sentential use. This may be related to the prosodic characteristics of filled pauses described in the literature, where filled pauses are characterized by low intensity as well as low, flat F0 with reduced articulation (Cole et al., 2005).
We have also observed some interesting interactions between the contextual properties and the functions of connectives. We found that Conclusion, Elaboration, and Resumption uses are more likely to have a longer duration of paralinguistic events preceding the connective than Sentential use. This observation is not only in line with the finding that a relatively long pause is relevant for the automatic detection of discourse markers (Zufferey and Popescu-Belis, 2004) but also lends empirical support to the prosodic characteristics of connectives that introduce a new topic or a specification (Didirková et al., 2018). We, however, did not find any significant effect for the duration of the paralinguistic events preceding connectives in the distinction of Disfluency and Sentential use of connectives, even though a clear durational difference can be observed in Figure 2. Furthermore, we did not find any significant effect for the duration of the paralinguistic events following connectives in any of the distinctions of the sentential/discourse uses of connectives. These findings seem to contrast with the observations that speakers tend to produce a pause before repair to indicate that the forward flow of speech is being interrupted (Howell and Young, 1991) and that discourse uses of connectives are more often followed by a pause than sentential uses of connectives (Gao and Tao, 2021). The discrepancies may be attributed to two factors. First, the multiple features in our multinomial logistic regression model may have overlapping predictive powers. The effects of the paralinguistic event-based features in predicting discourse uses may be partially, if not fully, captured by other features, resulting in non-significant coefficients for the paralinguistic event-based features in the model.
Second, paralinguistic event-based features may suffer from the problem of data sparseness. While they had similar durational means in our data, only 246 connective tokens (out of a total of 1370 connective tokens) had a Next_duration value larger than 0, and only 518 connective tokens had a Previous_duration value larger than 0. Sparse data could result in a lack of generalization performance. A larger dataset may allow for a more informative look at the association between paralinguistic events and discourse functions. Finally, we found that Conclusion use is more likely to occur in DUs with a faster speech rate than Sentential use, while Disfluency use is more likely to occur in PUs with a slower speech rate. This is partially consistent with the positional distribution of connectives in our data, where Disfluency use was oriented more toward prosody-related properties and Conclusion use was more sensitive to semantically determined production units of discourse.
This study has several limitations. One important issue relates to the fine-grained categories of our taxonomy of discourse functions. Although we collapsed categories that share similar functional properties, such as Emphasis and Downtoning, for the analysis, it is worth acknowledging that these categories also have significant implications for speaker-hearer interactions in discourse and require validation from additional empirical data. As our model can be extended to analyze higher-level functional categories, a fine-grained taxonomy validated by empirical data may give us a better picture of the phonetic orientations for the associated discourse functions and ensure the applicability of our proposed annotation scheme to various speech data. Another issue concerns the effect of the lexico-semantic meaning of a connective on the form of the connective. We mentioned that the intrinsic lexico-semantic property of connectives poses differing degrees of variability to their phonetic orientations, such as in the case of ránhòu. It would be interesting to see how the semantic configuration of a connective may contribute to the overall phonetic forms of the connective. For future studies, a large set of well-annotated speech data and a sophisticated taxonomy of connectives are needed to achieve a deepened understanding of the use of connectives in conversation.
In summary, in this study, we systematically investigated many dimensions of the uses of Mandarin connectives in a large Chinese conversational speech corpus, including their discourse functions and production. Incorporating findings from a linguistic modeling perspective, we have further highlighted the connection between the discourse function and the phonetic form of connectives. We believe that the proposed methodology of integrating discourse function annotation and phonetic analyses can shed more light on the way discourse connectives contribute to the dynamic organization of discourse in conversational speech.