Assessing coherence through linguistic connectives: Analysis of speech in patients with schizophrenia-spectrum disorders



Introduction
Disorganized speech is a core feature of schizophrenia-spectrum disorders (SSD) (American Psychiatric Association, 2013) that has been increasingly assessed using semantic space models (Corcoran et al., 2020; Corcoran and Cecchi, 2020; Hitczenko et al., 2021). Such computational models create n dimensions, each standing for an abstract feature of word meaning. The represented meaning (i.e., vector) of a given word can thus be located within the semantic space of n dimensions, and it is posited that words with similar meaning are found close to each other within a given semantic space (Landauer et al., 1998; Mikolov et al., 2013b). Using these models, it has been shown that patients with SSD can be distinguished from healthy controls with accuracies between 70 % and 93 % (Elvevåg et al., 2007; Iter et al., 2018; Just et al., 2020; Tang et al., 2021; Voppel et al., 2021), while predicting psychosis onset in at-high-risk individuals has accuracies ranging from 72 % to 100 % (Bedi et al., 2015; Corcoran et al., 2018; Rezaii et al., 2019).
Disorganization in speech is considered to signal a reduction in the underlying semantic coherence of a given message (Corcoran and Cecchi, 2020; Hitczenko et al., 2021). To be attained, coherence requires thematic continuity and grammatical connectivity (Givón, 2020). While thematic continuity is reflected in the maintenance of semantic content, grammatical connectivity refers to the use of explicit markers to hierarchically organize the sequence of the content. Syntactically, grammatical connectivity is most clearly instantiated by linguistic connectives, which relate two or more words, clauses, or sentences to each other (Maat and Sanders, 2006; Sanders and Maat, 2006; van der Vliet and Redeker, 2014). Importantly, connectives establish different types of explicit coherence relations in discourse, such as comparative (e.g., this flower is red, while that other is white), contingent (e.g., I have to replace this piece because it is damaged), expansive (e.g., besides being mammals, gorillas are primates), and temporal (e.g., we will go outside after the rain stops) (Bourgonje et al., 2018; Stede et al., 2019). Thus, in fine-tuning semantic space models to better quantify disorganized speech, it could be valuable to separately assess thematic continuity and grammatical connectivity.
General semantic content has been the main focus of interest in previous studies using semantic space models to quantify coherence. Specifically, in speech of patients with SSD, unusual general semantic content has often been assessed across entire interviews or conversations. Typically, the measures for analysis are obtained by averaging series of semantic distances (i.e., cosine similarities) across all words or sentences uttered by patients, one after the other (Hitczenko et al., 2021). This form of assessment is reasonable considering that semantic space models perform better if they are built upon sentences rather than upon speech samples from word-association or verbal-fluency tasks (de Boer et al., 2018). However, this procedure makes it difficult to separately quantify the coherence that relates to syntactic markers of connectivity (i.e., connectives).
While the use of connectives has not yet been separately assessed in semantic space models, previous studies have examined their occurrence in speech. Patients with SSD have been found to use fewer connectives of differentiation than control participants (Just et al., 2020), and fewer causal, contrastive, and logical connectives than adults with a diagnosis of HIV+ (Willits et al., 2018). In contrast, another study showed that untreated first-episode psychosis patients with high scores in conceptual disorganization (PANSS Item P2) overall used more connectives than control participants (Mackinley et al., 2021). These inconsistencies in results currently limit our knowledge about the frequency and coherence with which different types of connectives are used by patients with SSD compared to control participants. Moreover, even though semantic space models have been shown to be reliable tools to assess disorganized speech in patients with SSD, no previous research has specifically focused on grammatical connectivity. Considering this, in the present study we first evaluated whether patients with SSD and control participants use different types of connectives in similar proportions. Second, by calculating cosine similarity between connectives and their surrounding words (i.e., connectives-related similarity), we assessed whether connectives and their surrounding words can be used as linguistic loci to detect unusual coherence in speech of patients with SSD. Third, we tested how automatic classification driven by connectives-related similarity compares to another driven by non-connectives similarity, and how accurately connectives-related similarity and proportions per type of connective together could distinguish patients with SSD from control participants.

Participants
Fifty individuals with a schizophrenia-spectrum disorder and fifty healthy control participants, all native Dutch speakers, took part in this study. These participants had been previously investigated in Voppel et al. (2021). Their inclusion took place at the University Medical Center Utrecht. Patients' diagnoses were established by a trained physician, and confirmed using either the Comprehensive Assessment of Symptoms and History (CASH) (Andreasen et al., 1992) or the Mini-International Neuropsychiatric Interview (Sheehan et al., 1998). Patients' severity of symptoms was assessed with the Positive and Negative Syndrome Scale (PANSS) (Kay et al., 1987). Control participants were included if they had neither a current psychiatric disorder nor a history of one. All participants gave written informed consent before any measurements were obtained.

Speech sampling
Speech was elicited using a semi-structured interview, comprising 60 open-ended questions, from which a subset was presented in a semi-randomized order across all participants. The questions were designed to elicit spontaneous speech but prevent excessive emotional arousal, with topics such as life experiences, current daily habits, hobbies, and hypothetical situations, avoiding health-related and psychopathology-related topics. To prevent participants from adjusting their own speech, they were informed about the research aim only after concluding the interview. All interviews were conducted by trained researchers. The elicited verbal samples were audio-recorded and later transcribed following the CHILDES protocol (MacWhinney, 2000). Transcribers were blind to participants' group. Procedures were approved by the ethical committee of the University Medical Center Utrecht.

Data preprocessing
Using the CHILDES transcripts, individual plain text files were derived for each participant. Speech of the interviewers, punctuation marks, metadata, headers, special characters, and markers of events (e.g., &=laughs) were all excluded from these files. In the resulting files, all words were set to lowercase, and grammatical contractions (e.g., t'is, standing for het is, "it is") were retained. Fillers (e.g., ehm) and repetitions have been shown to add noise to semantic similarity calculations (Iter et al., 2018). However, they were also retained in the transcripts used for analysis for three reasons. First, it is currently unknown whether fillers and repetitions might influence connectives-related similarity. Second, despite recent attempts to control for "inadequate repetitions" of speech (Just et al., 2020), there is no standard procedure for avoiding this bias. Third, fillers (Tang et al., 2021) and repetitions (Andreasen, 1979; Hong et al., 2015; Maher, 1972) have been shown to be important to distinguish patients with SSD from control participants.
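The preprocessing steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name, the speaker code "*PAR:", and the simplified view of a CHAT-style transcript (headers starting with "@", dependent tiers starting with "%", one utterance per line) are assumptions made for illustration only.

```python
import re

def preprocess_chat(transcript: str, speaker: str = "*PAR:") -> list[str]:
    """Return lowercase word tokens from one speaker's lines (hypothetical helper)."""
    tokens = []
    for line in transcript.splitlines():
        # Keep only the target speaker's main tier; this drops @headers,
        # %dependent tiers, and the interviewer's turns.
        if not line.startswith(speaker):
            continue
        text = line[len(speaker):]
        # Remove event markers such as "&=laughs".
        text = re.sub(r"&=\S+", " ", text)
        # Drop punctuation and special characters, keeping apostrophes
        # so that contractions such as "t'is" are retained.
        text = re.sub(r"[^\w\s']", " ", text)
        # Fillers and repetitions are deliberately left in place.
        tokens.extend(text.lower().split())
    return tokens

sample = "\n".join([
    "@Begin",
    "*INT:\thoe gaat het ?",
    "*PAR:\tEhm t'is goed &=laughs .",
    "%com:\ta comment tier",
    "*PAR:\tJa , ja !",
    "@End",
])
print(preprocess_chat(sample))
```

Multi-line utterances and other CHAT conventions are ignored here; a real transcript would need a more complete parser.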

Selection and occurrence of connectives
Based on Bourgonje et al. (2018), 188 different Dutch connectives were selected for this study. These connectives were originally divided into four broad categories. We created a fifth category (i.e., multiclass) for all connectives that were listed in more than one category; such connectives were removed from their initial lists and relocated to this new one. Connectives ultimately belonged to only one of the following categories: comparison (n = 37), contingency (n = 48), expansion (n = 44), multiclass (n = 23), and temporality (n = 36) (see supplementary materials: 1. List of connectives in Dutch).
For each preprocessed transcript, all occurrences of the 188 different Dutch connectives were automatically extracted using the R "quanteda" package (Benoit et al., 2018). Subsequently, all the extracted connectives were automatically given the label of the category they belonged to. For each occurrence, along with the connective, the previous and the following three words were retained, considering them as part of the surrounding context of the connective, resulting in a fixed seven-word window. This length was chosen for two main reasons: shorter word-windows are less suited to reveal differences in cosine similarity between groups (Elvevåg et al., 2007), and larger word-windows were found to be less informative features for classification tasks (Voppel et al., 2021). All cases in which there were fewer words in the surrounding context of the connective were also preserved (i.e., connectives occurring at the end of an interview), leading a subset of instances to have fewer than seven words.
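The relocation of multiply listed connectives into a single multiclass category can be sketched as follows. This is an illustrative Python sketch only; the function name and the toy lexicon are hypothetical, and the category memberships shown are not taken from the connective list used in the study.

```python
from collections import defaultdict

def assign_categories(lexicon: dict[str, list[str]]) -> dict[str, str]:
    """Map each connective to exactly one category (hypothetical helper).

    Connectives listed under more than one category are moved to "multiclass".
    """
    seen = defaultdict(set)
    for category, connectives in lexicon.items():
        for connective in connectives:
            seen[connective].add(category)
    return {
        connective: next(iter(cats)) if len(cats) == 1 else "multiclass"
        for connective, cats in seen.items()
    }

# Toy lexicon for illustration; memberships are assumptions.
toy_lexicon = {
    "comparison": ["maar", "terwijl"],
    "contingency": ["omdat"],
    "expansion": ["en"],
    "temporality": ["daarna", "terwijl"],
}
print(assign_categories(toy_lexicon))
```

As a consistency check on the reported category sizes, 37 + 48 + 44 + 23 + 36 = 188, which matches the total number of connectives.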
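The extraction of a connective together with its three preceding and three following words can be sketched in Python as below. The study used the R "quanteda" package; this sketch does not reproduce that package's behaviour, handles single-word connectives only, and uses a hypothetical function name and toy connective set.

```python
def connective_windows(tokens, connectives, half_width=3):
    """Yield (index, connective, left_context, right_context) per occurrence.

    Contexts are truncated, not padded, near the start or end of the token
    list, so some windows contain fewer than 2 * half_width + 1 words.
    """
    for i, token in enumerate(tokens):
        if token in connectives:
            left = tokens[max(0, i - half_width):i]
            right = tokens[i + 1:i + 1 + half_width]
            yield i, token, left, right

tokens = "ik ga naar huis en dan eet ik omdat".split()
for window in connective_windows(tokens, {"en", "omdat"}):
    print(window)
```

In this toy input the final connective has no following words, so its window holds four words instead of seven, which mirrors the truncated instances mentioned above.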

Semantic space model
A semantic space with 300 dimensions was modeled using the skipgram method of the word2vec learning algorithm (Mikolov et al., 2013b). It was trained on more than five million words from The Netherlands' transcripts collection of the Corpus Gesproken Nederlands (CGN) (van Eerten, 2007), using the R "word2vec" package (Wijffels, 2020). In this model, each dimension might be taken to represent an abstract feature of word meaning, and the meaning of each word (i.e., its word embedding) is the vector indicating the position of the word relative to the 300 semantic dimensions of the model. Using the skipgram method, the word2vec algorithm computes each word embedding in a few steps. For each word, first a random embedding is created. Next, using all instances of the word and its surrounding words as constraint, the random embedding is iteratively changed to resemble the embeddings of its surrounding words more, and the embeddings of words which do not appear nearby less (Mikolov et al., 2013a, 2013b). Finally, each word is assigned a fixed and unique embedding, which can then be used to measure the semantic (i.e., cosine) similarity between embeddings (Mikolov et al., 2013b).

Computation of connectives-related and non-connectives cosine similarities
For each seven-word chunk containing a connective, the connectives-related cosine similarity was operationalized as the cosine similarity between the embedding of the connective and the as-a-whole averaged embedding of the three previous and three following words. All segments of the transcripts no longer containing connectives (henceforth, free-of-connectives segments) were also split into chunks of seven words. For each free-of-connectives segment, the non-connectives similarity was operationalized as the cosine similarity between the embedding of the fourth word and the as-a-whole averaged embedding of the three previous and three following words. Thus, the same procedure was used to obtain the two different types of cosine similarities.
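The operationalization above (cosine similarity between the embedding of the central word and the averaged embedding of its surrounding words) can be written as a small Python sketch. The function names and the two-dimensional toy embeddings are hypothetical; the study used 300-dimensional word2vec embeddings, and the sketch assumes every word in a chunk has an embedding.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Element-wise average of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def centre_context_similarity(chunk, centre, embeddings):
    """Similarity between the word at `centre` and the mean of all other words.

    For a connective chunk, `centre` is the connective's position; for a
    free-of-connectives chunk of seven words it is index 3 (the fourth word).
    """
    context = [embeddings[w] for i, w in enumerate(chunk) if i != centre]
    return cosine(embeddings[chunk[centre]], mean_vector(context))

# Toy embeddings (assumed values, for illustration only).
toy = {"en": [1.0, 0.0], "a": [1.0, 0.0], "b": [0.0, 1.0]}
chunk = ["a", "a", "a", "en", "b", "b", "b"]
print(centre_context_similarity(chunk, 3, toy))
```

Because the same function serves both chunk types, the sketch reflects the point that one procedure yields the two kinds of similarity.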
Connectives-related and non-connectives similarities were first calculated separately for all the chunks of words per participant. Then, six different measures of the cosine similarities were independently obtained per participant: maximum, mean, median, minimum, range, and variance. Finally, these six different measures of cosine similarity were averaged per group for each type of cosine similarity (see Fig. 1). In all cases, cosine similarities could range from −1 to 1, with −1 standing for the lowest possible similarity, and 1 for the highest possible similarity.

Fig. 1. Steps (A-G) taken to calculate the connectives-related and the non-connectives cosine similarity. Note that the word "en" is a connective, while the word "twee" is not.
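The six per-participant summary measures can be sketched in Python as follows. The function name is hypothetical, and the use of the sample variance (n − 1 denominator, undefined for a single value) is an assumption; the text does not state which variance estimator was used.

```python
import statistics

def summarise(similarities):
    """Six summary measures of one participant's cosine similarities."""
    if not similarities:
        raise ValueError("at least one similarity value is required")
    highest, lowest = max(similarities), min(similarities)
    return {
        "maximum": highest,
        "mean": statistics.fmean(similarities),
        "median": statistics.median(similarities),
        "minimum": lowest,
        "range": highest - lowest,
        # Sample variance needs at least two values; None marks it as missing.
        "variance": statistics.variance(similarities) if len(similarities) > 1 else None,
    }

print(summarise([0.2, 0.4, 0.6]))
```

Under this assumption, a participant with a single occurrence of a connective type would have a missing variance for that type.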

Statistics
Groups were compared with regard to demographic continuous variables through independent one-way analysis of variance (ANOVA), and nominal variables through Chi-square tests without continuity correction.
To assess differences between groups in the proportion of types of connectives relative to all words used, we carried out generalized linear mixed-effects logistic regression models using the glmer function from the R "lme4" package (Bates et al., 2015), with proportion as dependent variable. Following Baayen (2008) and Winter (2020), group (patients with SSD vs control participants) and type of connective (comparison vs contingency vs expansion vs multiclass vs temporality) were considered as fixed-effects factors, and participants as random-effects factors (allowing by-participant varying intercepts and varying slopes). Implementing a forward-testing approach, we carried out stepwise model comparison between a series of independent regression models in order to arrive at the model that best fitted the data. Likelihood ratio tests were performed to assess whether there were significant differences between each pair of models being compared (Baayen, 2008).
To assess group differences for the connectives-related similarity, we used non-parametric multivariate analysis of variance (MANOVA), followed by post-hoc Wilcoxon rank sum tests with Holm correction.
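The Holm correction applied to the post-hoc tests is a step-down procedure: the p-values are sorted in ascending order, the i-th smallest (counting from zero) is multiplied by (m − i), and the adjusted values are forced to be non-decreasing and capped at 1. A minimal Python sketch, with a hypothetical function name and made-up p-values, is given below; it is an illustration of the standard procedure, not the code used in the study.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply by the number of hypotheses not yet considered, cap at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity across the sorted sequence.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

print(holm_adjust([0.01, 0.04, 0.03]))
```

In this toy input, the largest raw p-value (0.04) receives the same adjusted value as the second smallest, because adjusted values may not decrease along the sorted sequence.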
For all correlations, we used the Spearman's rank non-parametric test, correcting for multiple comparisons when necessary. Statistical results with (adjusted) p-values < .05 were considered to be significant. All analyses were done in RStudio version 1.4.1103 (RStudio Team, 2019) running R version 4.1.0 (R Core Team, 2020).

Classification tasks
For reliable results, in all classification tasks the models were trained using 10-fold cross-validation, repeated ten times. This means that, for each iteration, the learning algorithm used nine tenths of the data for training and one tenth for testing. Accuracy, sensitivity, specificity, and area under the curve (AUC) for receiver operating characteristic (ROC) were obtained in order to assess the performance of each classifier. All tasks were conducted using the R "caret" package (Kuhn, 2021).
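The resampling scheme and three of the performance measures can be illustrated with a short Python sketch. It does not reproduce the "caret" implementation: the function names are hypothetical, the folds are not stratified by group, and the labels in the example are made up. The metrics function assumes that both classes occur in the true labels.

```python
import random

def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (train_indices, test_indices) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [indices[f::k] for f in range(k)]
        for f in range(k):
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield train, folds[f]

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positives among actual positives
        "specificity": tn / (tn + fp),  # true negatives among actual negatives
    }

splits = list(repeated_kfold(100))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
print(binary_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]))
```

With 100 participants, each of the 100 train/test splits uses 90 participants for training and 10 for testing. With two groups of equal size, accuracy equals the mean of sensitivity and specificity.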

Connectives' vs non-connectives' features.
To assess whether connectives-related and non-connectives similarities might yield similar classification results, we first obtained and evaluated the performance of a control-classifier. Afterwards, we assessed how much improvement in performance this control-classifier could gain by independently adding either connectives' features or non-connectives similarity to it.
The control-classifier was built by training a random forest algorithm, using the mean, minimum and variance of general semantic similarity from sliding-windows between 5 and 10 words. This algorithm has been proven to be one of the best in performing binary classification (Fernández-Delgado et al., 2014). Mean, minimum, variance, and the 5-10 range of windows were chosen as parameters based on previous results showing that they were highly informative for classification (Voppel et al., 2021).
For the comparison between the connectives' features and the non-connectives similarity, number of features was controlled for (see Table 1). To rule out that amount of data influenced the results, a set of connectives-related similarity controlling for this was also calculated (see Table 1). These procedures to control for amount of data and number of features were exclusively done for this series of classification tasks.

Performance of connectives' features alone.
Using connectives-related similarity per type of connective either alone or along with proportions of use per type of connective, random forest and support vector machine with polynomial kernel algorithms were trained to perform a binary classification between patients with SSD and control participants. For these classification tasks, the six measures of connectives-related similarity were calculated independently for each of the five types of connectives (see Table 1).

Demographics and speech sample
The majority of patients in our sample had a diagnosis of psychosis not otherwise specified (46 %), followed by schizophrenia (38 %), schizoaffective (14 %) and schizophreniform disorder (2 %). There were no significant differences between groups in age (p = .98) and sex (p = .23). Patients with SSD had significantly fewer years of education than control participants (p = .001). In patients with SSD, the mean dose of antipsychotic medication as measured in chlorpromazine equivalence was 226.1 mg. Thirty-two patients used tight-binding antipsychotics, sixteen patients used loose-binding medication, and two patients were not receiving antipsychotic medication (see Table 2).
General characteristics of the participants' speech sample and use of connectives are presented in Table 3. Since patients with SSD had significantly fewer years of education than control participants, possible correlations between years of education and basic features of the speech sample (i.e., tokens and types) were assessed. Neither number of running words (tokens) nor number of different word forms (types) correlated with years of education (both rho(ρ) < 0.10, both p > .05).

Proportion of connectives
Following stepwise model comparison with a forward-testing approach, the model that best fitted the data included an interaction between group and type of connective, as well as by-participant varying intercepts and varying slopes for type of connective per participant (see Table 4). Relative to connectives of comparison, patients with SSD had a lower probability of using connectives of contingency (p < .001), multiclass (p < .001), and temporality (p < .001), and a higher probability of using connectives of expansion (p < .001). Similarly, control participants had a lower probability of using connectives of contingency (p < .001) and temporality (p < .001), and a higher probability of using connectives of expansion (p < .001). When comparing the groups, patients with SSD had a lower probability of using connectives of contingency (p = .008) and multiclass (p < .001) than control participants (see Table 5 and Fig. 2). The structure of the random-effects factors is shown in Table 6.
Considering patients with SSD had significantly fewer years of education, it was assessed whether this could have confounded the abovementioned results. Independent two-tailed bivariate correlations were conducted, and false positive results were controlled using Holm correction. Results showed no significant correlations (all rho(ρ) < 0.21, all adj-p > .05).

Table footnote a: Across all participants, there were more connectives-chunks than free-of-connectives chunks. This was due to partial overlap of some connectives-chunks when two connectives were used too close to each other. Accordingly, for each participant, a random subsampling of the connectives-chunks was carried out.

Connectives-related similarity
Multivariate analysis of variance showed that there were significant differences between groups in connectives-related similarity, F(10.3, 1018.3) = 5.2, p < .001. Post-hoc analyses showed that patients with SSD had higher minimum similarity of temporality connectives (adj-p < .001), as well as narrower range (adj-p = .002) and lower maximum similarity of expansion connectives (adj-p = .005) than control participants. Additionally, compared to controls, patients had narrower range (adj-p = .04) and higher minimum similarity of multiclass connectives (adj-p = .04) (see Table 7). Maximum similarity of expansion connectives positively correlated with years of education (rho(ρ) = 0.30, adj-p = .01), while the other four connectives-related similarities did not (all rho(ρ) ≤ 0.16, all adj-p > .05). In performing these analyses, a missing value in the variance of contingency connectives in one patient, and a missing value in the variance of temporality connectives in another patient, were substituted with zeros. Running the same analyses with the exclusion of these two patients did not change the results of these variables. For a full overview of the results with the exclusion of these two patients, see supplementary materials: 2. Additional analyses on connectives-related similarity.

Connectives' vs non-connectives' features
The control classifier (RF-c) yielded 83.5 % accuracy. Adding non-connectives similarity to the classification (RF-non-conn) gave an accuracy of 84.9 %. Matched in number of features and amount of data, the classifier that instead included connectives-related similarity (RF-conn-I) yielded 85 % accuracy. The combination of the sliding-window measures and the connectives-related similarity matched in features and data (RF-conn-I) yielded the highest sensitivity (81.2 %). The combination of the sliding-window measures and the connectives' features that were significantly different between groups (RF-conn-V) yielded the highest specificity (93.6 %) (see Table 8).

Performance of connectives' features alone
Using connectives-related similarity per type of connective alone, the best classifier (RF-I) yielded 79.4 % accuracy, 75 % sensitivity and 83.8 % specificity. Combining these features with the proportions of use per type of connective, the best classifier (SVM-II) yielded 85 % accuracy, 83.8 % sensitivity and 86.2 % specificity (see Table 9).

Discussion
In this study, we analyzed linguistic coherence by comparing the relative use of different types of connectives and connectives-related similarity between patients with SSD and control participants. In parallel, we assessed how much connectives' features might improve a control-classifier, followed by an evaluation of the usefulness of connectives' features to achieve accurate results in automatically distinguishing patients with SSD from control participants.
Patients with SSD used significantly fewer contingency and multiclass connectives, while their use of the other types of connectives was not different from that of control participants. Although years of education differed between groups, it did not seem to affect these results. Regarding connectives-related similarity, patients with SSD had higher minimum similarity in both multiclass and temporality connectives, narrower range in both expansion and multiclass connectives, and lower maximum similarity in expansion connectives.
In the classification tasks comparing connectives' features and non-connectives similarity, both types of measures yielded similar overall performance in distinguishing patients with SSD from control participants. In the second series of classification tasks, combining connectives-related similarity per connective type with proportions of use per type of connective, the best classifier yielded 85 % accuracy.

Proportion of connectives
We found significant differences between groups in their use of contingency connectives (subsuming cause, condition, and purpose connectives). This is partially in line with previous studies showing that connectives of cause are used relatively less by patients with SSD (Willits et al., 2018), but opposite results have also been found (Just et al., 2020). Aligning with previous research, we found no significant differences between patients with SSD and control participants in their use of expansion (also referred to as additive connectives) and temporality connectives (Willits et al., 2018). Our results also showed no significant differences between groups in their use of comparison connectives (including concession, contrast, and similarity connectives). This is partially inconsistent with previous studies reporting that patients with SSD use fewer contrastive and differentiation connectives than control participants (Just et al., 2020; Willits et al., 2018).
No previous studies have paid specific attention to polysemic connectives, such as the ones included in our multiclass category. In our study, the proportion of use of multiclass connectives was significantly smaller for patients with SSD than for control participants. Polysemic words are processed faster than unambiguous words (Eddington and Tokowicz, 2015; Klepousniotou and Baum, 2007). It is possible that this fast cognitive processing of polysemic connectives was different between patients with SSD and control participants in our sample, reflecting itself in the smaller proportion of polysemic connectives used by the patients. Yet, some variables that were not controlled for might have influenced our results. For instance, semantic activation of polysemic words is influenced by word frequency and context (Rice et al., 2019), and pragmatic inferences play a role in this as well (Carston, 2021). In parallel, it has been argued that connections between the word-forms and meaning of words in the mental lexicon are weaker in patients with SSD than in control participants (Kuperberg et al., 2019). Whether any of these factors relate to our results on the use of these connectives remains to be determined. Overall, our results suggest that speech of patients with SSD is characterized by a relative reduction in the use of contingency connectives (i.e., markers of cause, condition, and purpose) and multiclass connectives (i.e., markers that can establish more than one explicit type of semantic relation between clauses and/or sentences).

Note: the first row stands for the initial random model. Each subsequent row shows how the goodness of fit increased when the factor in the row was added to the model that included all preceding factors.

Note: estimates are "log odds" (i.e., logits). Positive estimates reflect an increase in probability and negative ones reflect a decrease. For a general guide to mixed-effects models in linguistics and their interpretation, see Baayen (2008) and Winter (2020).

Connectives-related similarity
Previous research has shown that, compared to control participants, patients with SSD reach lower scores in semantic similarity (Elvevåg et al., 2007; Iter et al., 2018; Just et al., 2019). There is consistency between those findings and our results showing that patients with SSD had lower scores in three out of the five connectives-related similarity measures that significantly differed between groups.
In detail, our results showed that patients with SSD had narrower range in similarity of expansion connectives. Expansion connectives establish additive relations with either positive (e.g., books and notebooks) or negative polarity (e.g., books or notebooks) (Evers-Vermeul and Sanders, 2009; Sanders et al., 1992). The narrower range in similarity of expansion connectives might indicate that there is less semantic variation in the words, clauses and sentences added together by patients with SSD. Patients also had a lower maximum similarity of expansion connectives. This might mean that, compared to control participants, patients with SSD added together words, clauses and/or sentences that shared fewer semantic features. Of note, maximum similarity of expansion connectives positively correlated with years of education. The results of maximum similarity of expansion connectives could therefore be a reflection of education level, rather than a difference between patients with SSD and control participants.
Intriguingly, in our study, patients with SSD showed higher minimum similarity of temporality connectives. With few exceptions (Panicheva and Litvinova, 2019), this is opposite to the majority of previous reports showing that patients with SSD often have lower semantic similarity scores than control participants (Elvevåg et al., 2007; Iter et al., 2018; Just et al., 2019). Temporality connectives establish ordered relations between series of events (Evers-Vermeul and Sanders, 2009; Sanders et al., 1992). Patients with SSD use temporality connectives as core linguistic devices to achieve coherence in narrative discourse (Saavedra, 2010). Interestingly, cognitively well-functioning patients with SSD can achieve temporal coherence similar to that of control participants (Holm et al., 2016). In our sample of patients with SSD, their total PANSS score (see Table 2) indicates that, cognitively, they were well-functioning (Leucht et al., 2005). Thus, our results show that patients with SSD can use temporality connectives as coherently as control participants during semi-structured interviews, suggesting that their use of temporality connectives might be related to cognitive well-functioning.
In addition, patients with SSD had narrower range and higher minimum similarity of multiclass connectives. The narrower range might mean that patients had less semantic variation in the words making up the preceding and following context of multiclass connectives. Accordingly, the higher minimum similarity of multiclass connectives would reflect the low end of such narrower range of semantic variation.

Connectives' vs non-connectives' features
Among all classifiers, connectives' features and non-connectives cosine similarity yielded similar accuracies in distinguishing patients with SSD from control participants. When controlling for amount of data and number of predictors, general connectives-related similarity (RF-conn-I) seemed to increase sensitivity to classify patients with SSD. However, it is likely that this was due to the random sub-sampling procedure, because general connectives-related similarity (RF-conn-II) no longer increased sensitivity when the similarity was calculated based on all connectives-chunks. In contrast, the classifier using the connectives' features that were significantly different between groups (RF-conn-V) yielded 6 % more specificity than the control classifier (RF-c), suggesting that connectives' features that were significantly different between groups are useful to correctly classify true negatives. Overall, our results suggest that connectives' features and non-connectives similarity can reach similar results in distinguishing patients with SSD from control participants.

Performance of connectives' features alone
SVM-II yielded 85 % accuracy, 83.8 % sensitivity and 86.2 % specificity. These percentages are within the accuracy range reported in previous studies (for reviews, see Corcoran et al., 2020; Corcoran and Cecchi, 2020; Hitczenko et al., 2021). Accuracy of 85 % had been previously obtained by our group using a full sliding-window general-semantic-similarity classifier (Voppel et al., 2021). SVM-II included connectives' features alone, and it was trained based on less speech input (i.e., only connectives and their surrounding words). This suggests that connectives and their surrounding words are linguistic loci that might concentrate important patterns to detect atypical coherence related to SSD.

Limitations and future directions
We acknowledge our study has limitations. Our proportion-of-connectives' results could not be straightforwardly compared to previous findings due to differences in the control group (healthy participants in our study and HIV+ in Willits et al., 2018) and in the analysis technique that was employed (mixed-effects logistic regression models in our study and principal component analysis in Mackinley et al., 2021). Also, the number of categories of connectives varied across studies (ranging from two to five), as well as the number of connectives per category and the number of types of connectives inside each main category. Similarly, there were inconsistencies in annotation schemes for connectives. For instance, in contrast to previous studies (Just et al., 2020; Mackinley et al., 2021; Willits et al., 2018), the annotation scheme that we used (Bourgonje et al., 2018; Stede et al., 2019) did not include a separate category for logical connectives, following the line of reasoning that there are no logical connectives, but rather abstract logical operators that then can have linguistic correlates in different types of connectives (Sanders et al., 1992). Furthermore, fillers and repetitions were not removed from the transcripts used for our analyses, even though they are known to influence cosine similarity calculations (Iter et al., 2018).

Table 9
Results of using connectives-related cosine similarities either alone or along with proportion of connectives in distinguishing patients from controls.

In replicating or expanding our results, future studies on the use of connectives in speech of patients with SSD should take into account a series of methodological challenges. The first is a consistent use of connectives categories across studies, which would aid knowledge accumulation. Second, our theory-based decision to use a seven-word window alone for our analyses decreased the Type I error rate of our findings. However, it remains to be determined whether this is the most appropriate window for the assessment of cosine similarity between connectives and their surrounding words. This was beyond the scope of the current study, but future research may specifically address the role of window size by directly comparing a range of different window sizes. In relation to this, it would be necessary to analyze which procedure for calculating connectives-related similarity is the most reliable, valid and theoretically sound. More recent computational semantic representations could be used to obtain the word embeddings for analyses, such as Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019), possibly in combination with time-series analyses of semantic coherence (Xu et al., 2022). Also, mixed designs (e.g., Holm et al., 2016; Saavedra, 2010) might provide valuable details overlooked by quantitative approaches alone, increasing our comprehension of how thematic continuity and syntactic connectivity (in)dependently build up (in)coherence in patients with SSD.
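As an illustration of how a direct comparison of window sizes could be set up, the sketch below computes the mean cosine similarity between a connective and its neighbouring words for several window sizes. It is a simplified sketch under stated assumptions, not the procedure used in this study: `window` is taken here as the number of words on each side of the connective, and the function names and two-dimensional toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def connective_similarity(tokens, index, embeddings, window):
    """Mean cosine similarity between the connective at `index` and its neighbours.

    Assumption: `window` counts words on EACH side of the connective.
    Words without an embedding are skipped; None is returned if nothing is comparable.
    """
    center = embeddings.get(tokens[index])
    if center is None:
        return None
    start = max(0, index - window)
    end = min(len(tokens), index + window + 1)
    sims = [
        cosine(center, embeddings[tokens[i]])
        for i in range(start, end)
        if i != index and tokens[i] in embeddings
    ]
    return sum(sims) / len(sims) if sims else None


# Hypothetical 2-D toy embeddings; real analyses would use trained word vectors.
TOY_EMBEDDINGS = {
    "street": (-1.0, 0.0),
    "rain": (1.0, 0.0),
    "because": (1.0, 1.0),
    "wet": (0.0, 1.0),
}
tokens = ["street", "rain", "because", "wet"]

# Compare the same connective ("because", position 2) across window sizes.
by_window = {w: connective_similarity(tokens, 2, TOY_EMBEDDINGS, w) for w in (1, 2, 3)}
```

In this toy example, widening the window from one to two words pulls in a less related word and lowers the mean similarity, which shows why the choice of window size could affect the resulting measure.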
Additionally, we need to understand how the use of connectives might be influenced by speech elicitation techniques and cognitive factors. For instance, speech elicitation techniques (e.g., re-telling a story or reading a text out loud) have been shown to influence linguistic outcomes (Kapantzoglou et al., 2017; Niebuhr and Michaud, 2015), and some types of connectives are more cognitively demanding to use than others (Evers-Vermeul and Sanders, 2009; Zufferey and Gygax, 2020). For these reasons, in future studies it would be informative to assess whether the use of connectives differs among speech elicitation techniques, and whether this might also depend on the varying cognitive demands of different connectives. This emphasizes the importance of exploring possible relations between patterns of connectives' use and cognitive outcomes in patients with SSD.
Also, future research should examine the generalizability of our findings on the use of connectives to other languages and across generations of speakers. There is evidence that patterns of word use are consistent across different linguistic families (Calude and Pagel, 2011). However, word order related to grammatical connectivity varies across languages (Lehmann, 2011), and word order changes throughout the history of languages (Gell-Mann and Ruhlen, 2011; Maurits and Griffiths, 2014). For instance, Subject-Object-Verb (SOV) is currently the canonical word order of a subordinated declarative sentence introduced by a connective in Dutch (Jordens, 1988; Koster, 1975), while English has a fixed SVO structure (Comrie, 1981) and Spanish can have either of these (López Meirama, 2006). These syntactic structures might be different for future generations of Dutch, English or Spanish speakers. Thus, both cross-linguistic and historical-grammar factors await exploration to further our understanding of the use of connectives as signifying patterns of speech (dis)organization in patients with SSD.

Conclusions
Connectives' features are informative and explainable variables that can be used to reliably assess disorganized speech in patients with SSD. The combination of this method with other linguistic components is a promising avenue to further improve accuracy in categorizing individuals with SSD and control participants. Such fine-tuned automatic analyses of speech samples will help to reach the ultimate aim of advancing clinical practice.

Fig. 2. Mean proportion of use for each type of connective per group.

Table 1
Details of the different features used for the classification tasks.
H. Corona-Hernández et al.

Table 2
Demographic characteristics of participants.
n = sample size, SD = standard deviation. a Data available for all control participants, but only for 49 patients. b Data available for 44 patients only: two patients were not taking antipsychotic medication, and, for four patients, there was no information about dose equivalence.

Table 3
Characteristics of the speech produced by the participants per group.
n = sample size, SD = standard deviation.

Table 4
Stepwise procedure followed to obtain the logistic model that best fitted the data.

Table 5
Fixed-effects factors in the model that best fitted the data on proportions of connectives.

Table 6
Random-effects parameters in the best model fitted to proportion of connectives.

Table 7
Differences in connectives-related similarity measures between groups.

Table 8
Comparison of classification performance between connectives' features and non-connectives similarity.