Evidence for the factor structure of formal thought disorder: A systematic review



Introduction
Formal thought disorder (FTD) is a broad umbrella term describing a range of disturbances in thinking, speech and communication (Andreasen, 1986; Kircher et al., 2014; Liddle et al., 2002; Solovay et al., 1986). FTD symptoms are indicative of schizophrenia (Roche et al., 2015a), as well as other psychiatric and developmental disorders, including bipolar disorders (Keyes et al., 2013), depression (Kircher et al., 2014), personality disorders (Kircher et al., 2014) and autism (Solomon et al., 2008). FTD has been associated with increased clinical severity in psychosis and with poorer outcomes, including reduced occupational (Racenstein et al., 1999) and social functioning and quality of life (Bowie and Harvey, 2008; Tan et al., 2014), more frequent hospitalisations and lower employment (Wilcox et al., 2012), and a poorer prognosis overall (Roche et al., 2016).
FTD has long been recognised as non-unitary (Cuesta and Peralta, 1999; Roche et al., 2015b; Tan and Rossell, 2019b), with clinicians noting heterogeneous presentations across individuals. This heterogeneity is also reflected in the scales used to measure these clinical phenomena (Andreasen, 1986). Within this literature, the dichotomy of positive and negative symptoms used to classify psychosis has similarly been applied to FTD. Negative FTD refers to underproductive or dysfluent speech, usually evident as a reduced amount of speech or speech content below what would be considered contextually appropriate (Andreasen, 1984a; Kircher et al., 2014). Positive FTD refers to impaired conveyance of information due to inappropriate or bizarre qualities of the speech produced, and excludes diminished fluency or productivity (Andreasen, 1984b; Radanovic et al., 2013). As with psychosis more broadly, these distinctions emerged as theoretical suppositions, and the dimensional composition of FTD remains unclear.
One empirical approach to understanding the clinical features of FTD is factor analysis (FA), which reduces multiple related measurements to a smaller number of latent dimensions. Various studies have aimed to identify the latent structure of FTD using exploratory and confirmatory FA (EFA and CFA), as well as principal component analysis (PCA) (Andreasen, 1986; Barrera et al., 2008; Cuesta and Peralta, 1999; Liddle et al., 2002; Roche et al., 2015a). Notably, a recent Delphi study showed very little agreement among experts regarding the number and nature of factors underpinning FTD (Zamperoni et al., unpublished data, 2023). This lack of consensus might reflect methodological differences across empirical studies, which could produce varying results. Sample-specific differences in illness stage, illness severity and comorbidities, as well as in the FTD measures used, would plausibly affect the dimensions found, although this literature has not yet been thoroughly reviewed.
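To make the dimension-reduction idea concrete, the following minimal Python sketch (using NumPy) simulates hypothetical ratings on six items generated by two latent dimensions, extracts principal components from the item correlation matrix, and applies Horn's parallel analysis, one common factor-retention criterion, to decide how many components to keep. The data, function names and parameters (200 random datasets, 95th percentile) are illustrative assumptions, not the procedure of any reviewed study.

```python
import numpy as np

def pca_eigen(data: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Eigenvalues (descending) and unrotated component loadings of the item correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)            # items are columns, participants are rows
    eigvals, eigvecs = np.linalg.eigh(corr)           # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))  # scale each component by sqrt(eigenvalue)
    return eigvals, loadings

def parallel_analysis(data: np.ndarray, n_iter: int = 200, percentile: float = 95, seed: int = 0) -> int:
    """Horn's parallel analysis: count leading components whose eigenvalues exceed those of random data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed, _ = pca_eigen(data)
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        random_eigs[i], _ = pca_eigen(rng.standard_normal((n, p)))
    threshold = np.percentile(random_eigs, percentile, axis=0)
    n_keep = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break                                     # stop at the first component that fails
        n_keep += 1
    return n_keep

# Hypothetical ratings: 300 participants, six items driven by two latent dimensions
rng = np.random.default_rng(1)
latent = rng.standard_normal((300, 2))
pattern = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
items = latent @ pattern.T + 0.6 * rng.standard_normal((300, 6))

eigvals, loadings = pca_eigen(items)
n_components = parallel_analysis(items)
```

With real rating data, the retained components would then typically be rotated before interpretation.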
Determining the structure of FTD is important for understanding its prognostic and treatment correlates; for example, some manifestations might be more indicative of poorer outcomes than others (Roche et al., 2016). The current study aims to investigate the dimensions of FTD across the measures and statistical techniques employed, and across illness stages. A review of FA studies is timely to determine whether an overarching model can be identified that empirically defines the key constructs of FTD and the associations between individual clinical symptoms and each dimension.

Search strategy and inclusion criteria
This review was conducted following the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA; Page et al., 2021) guidelines. As per the review protocol (Zamperoni et al., 2023), a comprehensive literature search was conducted in PsycINFO, PubMed and Web of Science on January 17th, 2023, using the following search terms: ("Thought Disorder" OR "Formal Thought Disorder" OR "language" OR "speech") AND ("factor analysis" OR dimension OR "Principal Component Analysis") AND ("psychosis" OR schizophrenia OR Bipolar OR "First Episode"). One reviewer (GZ) conducted the title/abstract screening, and two reviewers (GZ and PJS) conducted the subsequent full-text assessment. Disagreements were resolved by discussion with a third reviewer (EJT) until consensus was reached. Reference lists of included papers were scanned for additional relevant work.
Studies were included if they met the following criteria: 1. the article was written in English; 2. the article was published in a peer-reviewed journal; 3. the article was an original empirical investigation (meta-analyses, reviews and case studies were excluded); 4. the article used a valid dimension reduction technique, including exploratory and/or confirmatory FA; 5. the participant sample consisted of adults aged ≥ 18 years, or had a mean age ≥ 18 years; 6. the participant sample included individuals diagnosed with a psychotic disorder according to clearly defined criteria, such as any version of the Diagnostic and Statistical Manual (DSM), the International Classification of Diseases (ICD) and/or the Research Diagnostic Criteria (RDC). Psychotic disorders included schizophrenia spectrum disorders and bipolar disorders, across multiple illness stages. Other non-psychotic diagnoses, such as major depressive disorder (MDD), were included if psychotic features were confirmed to be present; 7. the aims of the analysis were to explore dimensions within the construct of FTD using a multi-item FTD measure. Studies that used broad symptom scales or relied on global FTD measures were excluded, as were analyses that included measures of other symptoms or syndromes. Inclusion criteria remained predominantly as per the protocol; the wording of three criteria was changed slightly from the protocol without altering the original scope of inclusion (see Supplementary Table 1).

Data extraction
Data were extracted by a single reviewer (GZ) in conjunction with three other reviewers (PJS, EJT, SLR). Three types of data were extracted: sample demographic and clinical information (sample size; age; sex; years of education; current psychiatric diagnosis; current medications; FTD presentation), study methodology (type of FA used; method of rotation; adequacy of data for FA; fit indices; method used to determine the number of factors; percentage of variance explained; measures used; method of eliciting speech), and the dimensions identified (number of factors; factor labels; factor composition).

Quality assessment
Study quality was rated by a single reviewer (GZ) in conjunction with all other authors (PJS, EJT, SLR, DM), covering both the assessment of FTD and the FA completed. The quality of FTD assessment was rated against four criteria: 1. Did the authors account for the low prevalence of FTD items in analyses? 2. Was the training of assessors recorded? 3. Was inter-rater reliability adequate (≥ 0.70; Chaturvedi and Shweta, 2015)? 4. Was a standardised method of eliciting speech employed? Standardised methods for eliciting speech included asking participants to talk about presented images within a set timeframe; stimuli that have been used include the Thematic Apperception Test (Murray, 1943) and Rorschach's inkblots (Rorschach, 1942). Various authors recommend such standardised methods to ensure consistent opportunities to display FTD across individuals (Liddle et al., 2002). Unstandardised methods include unstructured and semi-structured interviews, in which the overall amount of speech produced depends on the nature of the initial responses provided. One point was awarded for each criterion met, and the points were summed and expressed as a quality rating percentage for each study's assessment of FTD. The use of validated measures was also an important aspect of quality but was not rated, as it was already a criterion for study inclusion.
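The scoring rule can be written as a short function. In this hypothetical Python sketch, each criterion is recorded as met (True), not met (False) or not applicable (None); dropping inapplicable criteria before averaging also reproduces the FA rule described in the next paragraph, where points are divided by the number of relevant criteria. The example ratings are invented for illustration and do not correspond to any included study.

```python
from typing import Optional, Sequence

def quality_percentage(criteria: Sequence[Optional[bool]]) -> float:
    """Percentage of applicable criteria met; None marks a criterion that does not apply."""
    applicable = [met for met in criteria if met is not None]
    if not applicable:
        raise ValueError("no applicable criteria")
    return 100.0 * sum(applicable) / len(applicable)

# Hypothetical FTD-assessment ratings for one study (criteria 1-4):
# low prevalence not addressed, training not reported, reliability adequate, speech unstandardised
ftd_quality = quality_percentage([False, False, True, False])
# Hypothetical CFA analysis: the factor-retention criterion (PCA/EFA only) is marked not applicable
fa_quality = quality_percentage([True, False, None, True, False])
```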
FA quality was rated against five criteria: 1. Did the sample size meet best practice¹ requirements (Kyriazos, 2018; Worthington and Whittaker, 2006)? 2. Was the factorability of the data verified via Bartlett's test of sphericity or the Kaiser-Meyer-Olkin (KMO) measure? 3. For PCA/EFA, was a criterion specified for factor retention (e.g. parallel analysis)? 4. Were fit indices reported? 5. Were reliability coefficients for factors reported, and did they reach ≥ 0.70 (Watkins, 2018)? As not all criteria applied to both CFA and EFA/PCA, one point was awarded for each criterion met and the total was divided by the number of applicable criteria, giving a mean score expressed as a percentage.
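As an illustration of the factorability checks in criterion 2, the sketch below computes Bartlett's test of sphericity (which tests whether the item correlation matrix differs from an identity matrix) and the overall KMO measure of sampling adequacy with NumPy. It is a minimal sketch on simulated, hypothetical data; it returns only the chi-square statistic and degrees of freedom, and a p-value could be obtained from a chi-square survival function such as `scipy.stats.chi2.sf(stat, df)`. Commonly cited KMO thresholds (e.g. ≥ 0.6) vary between sources.

```python
import numpy as np

def bartlett_sphericity(data: np.ndarray) -> tuple[float, int]:
    """Bartlett's test of sphericity: chi-square statistic and degrees of freedom."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)               # log|R|, more stable than np.log(np.linalg.det(R))
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    return float(stat), p * (p - 1) // 2

def kmo(data: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)           # partial correlations, controlling for all other items
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    return float(r2 / (r2 + q2))

# Hypothetical data: six items sharing one latent dimension versus pure noise
rng = np.random.default_rng(2)
latent = rng.standard_normal((300, 1))
items = 0.7 * latent + np.sqrt(1 - 0.7 ** 2) * rng.standard_normal((300, 6))
noise = rng.standard_normal((300, 6))

stat_items, df = bartlett_sphericity(items)
stat_noise, _ = bartlett_sphericity(noise)
kmo_items, kmo_noise = kmo(items), kmo(noise)
```

Correlated items yield a large Bartlett statistic and a high KMO; uncorrelated noise yields a statistic near its degrees of freedom and a KMO near 0.5.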

Study selection
The initial search returned 1171 records after the removal of duplicates. Of these, 1051 were excluded after title/abstract screening. Full texts were evaluated for the remaining 120 articles, of which 13 satisfied all inclusion criteria. Hand-searching the reference lists of relevant records identified 12 additional studies, three of which met the inclusion criteria, giving a final total of 16 articles (see Fig. 1). Inter-rater reliability for full-text review was almost perfect (Cohen's κ = 0.82; Landis and Koch, 1977).
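Cohen's κ corrects raw agreement between two raters for the agreement expected by chance. The Python sketch below computes κ from a square agreement table and maps it onto the Landis and Koch (1977) benchmarks; the include/exclude counts are hypothetical and are not the review's screening data.

```python
def cohens_kappa(table: list[list[int]]) -> float:
    """Cohen's kappa from a square agreement table (rows: rater A, columns: rater B)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)

def landis_koch_label(kappa: float) -> str:
    """Verbal label from the Landis and Koch (1977) benchmarks."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical decisions: [[both include, A include/B exclude], [A exclude/B include, both exclude]]
example = [[20, 5], [10, 65]]
kappa = cohens_kappa(example)
```

Here observed agreement is 0.85 and chance agreement 0.60, so κ = 0.625 ("substantial"); a κ of 0.82, as reported above, falls in the "almost perfect" band.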

Study characteristics
Study characteristics are presented in Table 1. A total of 3071 patients with a psychotic disorder were included, with mean sample ages ranging from 22 to 51 years. The sample was characterised predominantly by individuals with a schizophrenia spectrum disorder (n = 2411; 79 %); of these, 2117 (88 %) had a diagnosis of schizophrenia. A further 152 (5 %) participants were diagnosed with bipolar disorder, 146 (5 %) with major depressive disorder with psychotic features, and 362 (12 %) with 'other' diagnoses (see Table 1 for other diagnoses). The sample comprised 1684 (59 %) males and 1155 (41 %) females, with one study (6 %) not reporting on sex. Most samples were recruited from in-patient settings (n = 8; 50 %) or combined in-patient/out-patient settings (n = 7; 44 %), with only one study (6 %) recruiting solely out-patients. Where reported, mean years of education ranged from 8 to 13 years. Two studies (13 %) included participants experiencing first episode psychosis (FEP), six (38 %) included mixed samples of acute and chronic (and/or post-acute and/or clinically stable) patients, two (13 %) comprised chronic patients and three (19 %) comprised acute patients, while the remaining three studies (19 %) did not specify illness stage. Aside from studies that either assessed FEP (n = 2; 13 %) or did not report medication data (n = 3; 19 %), most participants were taking some form of antipsychotic medication. Various measures were used to determine the prevalence and severity of FTD within each sample, and various scoring approaches were adopted for each measure. Consequently, it was not possible to characterise the amount of FTD within or across samples, although most people in each sample reportedly demonstrated FTD, however conceptualised.
¹ Best practice requirements for EFA included meeting a minimum sample size (n = 100) and then evaluating the need for more data based on factor loadings and communalities (Worthington and Whittaker, 2006).
Note (Table 1). std. = standardised; unstd. = unstandardised; SZA = schizoaffective disorder; SZ = schizophrenia; SZP = schizophreniform disorder; NOS = not otherwise specified; FEP = first episode psychosis; GFI = goodness-of-fit index; RMSR = root mean square residual; NS = not specified. Mixed refers to recruitment from both in-patient and out-patient settings. Percentages listed for sex were rounded to the nearest whole number. Age is reported as mean years. Sz "core" = chronic and deteriorated illness with no past or present affective disorder; Sz "noncore" = non-affective, chronic psychosis with positive symptoms, often with prolonged hospitalisations.
Table 1 also includes information on each FA. Some studies reported multiple factor solutions or multiple CFA models with goodness-of-fit statistics; where appropriate, we therefore synthesised data across all solutions reported, giving a total of 39 FAs across 16 studies.
CFA accounted for the most analyses: the three CFA studies compared a total of 20 models. The eleven PCA studies reported a total of 14 solutions, of which 11 (79 %) used varimax rotation, two (14 %) used oblique rotation and one (7 %) used no rotation. The three EFA studies reported a total of five solutions, of which three (60 %) used principal axis factoring with promax rotation, one (20 %) used maximum likelihood with oblimin rotation and one (20 %) used promax rotation.
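Varimax, the rotation used in most PCA solutions, is an orthogonal rotation that maximises the variance of the squared loadings within each factor, pushing each item towards a high loading on one factor and near-zero loadings on the others. The Python sketch below implements Kaiser's pairwise planar-rotation form of raw varimax (without Kaiser row normalisation, which many statistical packages apply by default); it is an illustration, not the routine used by any reviewed study. The example obscures a hypothetical two-factor simple structure with a 30-degree rotation and checks that varimax recovers it.

```python
import numpy as np
from itertools import combinations

def varimax(loadings: np.ndarray, max_sweeps: int = 100, tol: float = 1e-8):
    """Raw varimax via Kaiser's pairwise rotations; returns (rotated, rotation) with rotated = loadings @ rotation."""
    rotated = np.array(loadings, dtype=float)
    p, k = rotated.shape
    rotation = np.eye(k)
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for j, l in combinations(range(k), 2):
            x, y = rotated[:, j], rotated[:, l]
            u, v = x ** 2 - y ** 2, 2 * x * y
            a, b = u.sum(), v.sum()
            c, d = (u ** 2 - v ** 2).sum(), 2 * (u * v).sum()
            # Angle that maximises the varimax criterion for this pair of factors
            phi = 0.25 * np.arctan2(d - 2 * a * b / p, c - (a ** 2 - b ** 2) / p)
            plane = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
            rotated[:, [j, l]] = rotated[:, [j, l]] @ plane
            rotation[:, [j, l]] = rotation[:, [j, l]] @ plane
            largest_angle = max(largest_angle, abs(phi))
        if largest_angle < tol:
            break                                     # no pair needs further rotation
    return rotated, rotation

# Hypothetical simple structure: items 1-3 load on factor 1, items 4-6 on factor 2
true = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
angle = np.deg2rad(30)
spin = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
unrotated = true @ spin                               # simple structure obscured by a 30-degree rotation
rotated, rotation = varimax(unrotated)
```

Because the rotation is orthogonal, each item's communality (row sum of squared loadings) is unchanged; only the distribution of loadings across factors changes.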
The Thought, Language and Communication scale (TLC; Andreasen, 1986) was the most popular method for rating speech, used in 21 analyses (54 %): the whole 18-item measure (n = 13; 62 %), selected items from the measure (n = 5; 24 %), TLC items combined with items from the Scale for the Assessment of Positive Symptoms (SAPS; Andreasen, 1984b) and the Scale for the Assessment of Negative Symptoms (SANS; Andreasen, 1984a) (n = 1; 5 %), or a translated version of the measure (n = 2; 10 %). Of the remaining analyses, twelve (31 %) used either all 12 FTD items of the SAPS/SANS (n = 7; 58 %) or selected items from them (n = 5; 42 %), while four studies (10 %) each introduced a new measure of speech: the Thought and Language Index (TLI; Liddle et al., 2002), the Schizophrenic Communication Disorder Scale (SCD; Bazin et al., 2005), the Clinical Language Disorder Rating Scale (CLANG; Chen et al., 1996) and the Communication Disturbances Index (CDI; Gordinier and Docherty, 2001). One study (5 %) introduced two new measures (Barrera et al., 2008): a self-report measure (FTD-patient scale) and an observer-rated measure (FTD-carer scale).
All but two studies (13 %) specified their method of eliciting speech, with nine (56 %) using unstandardised (e.g. unstructured or semi-structured) and five (31 %) using standardised techniques. Seven studies (44 %) specified the amount of time each participant was given to speak during FTD assessment, with Andreasen's 45-minute TLC structured clinical interview being the most common (n = 3; 19 %). Overall, speaking time during assessment ranged from 8 to 90 min across studies.

Factor models
Across the 20 CFA models investigated, six (30 %) were considered to have a "good" fit (GFI ≥ 0.95)². These were most commonly three-factor models (n = 3; 50 % of all analyses testing a three-factor model), followed by two-factor models (n = 2; 29 % of all analyses testing a two-factor model) and a four-factor model (n = 1; 100 % of analyses testing a four-factor model). One three-factor model explained the most variance in the most parsimonious way of all models across the studies, including the two- and four-factor models compared in the same analysis (GFI = 0.99, RMSR = 0.07; Roche et al., 2015b). Studies investigating two-factor structures generally analysed fewer items than studies that found more factors. Overall, poor fit was observed for both the null model (i.e. testing whether FTD has no underlying structure; GFI < 0.5) and the unidimensional model (GFI < 0.9). Taken together, these results suggest that FTD is best conceptualised as having an underlying structure that is most likely multidimensional, with a three-factor structure having the strongest support.
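For readers less familiar with these fit indices, the Python sketch below computes the GFI in its maximum-likelihood form, 1 - tr[(Σ⁻¹S - I)²] / tr[(Σ⁻¹S)²], where S is the observed and Σ the model-implied covariance matrix, together with a root mean square residual over the p(p + 1)/2 unique matrix elements. These formula choices are assumptions: included studies may have used other estimators (e.g. an unweighted least squares GFI) or RMSR variants based on off-diagonal or standardised residuals. The matrices are hypothetical and show that an independence (null) model, which ignores all shared variance, fits poorly.

```python
import numpy as np

def gfi_ml(sample_cov: np.ndarray, implied_cov: np.ndarray) -> float:
    """Maximum-likelihood GFI: 1 - tr[(Sigma^-1 S - I)^2] / tr[(Sigma^-1 S)^2]."""
    m = np.linalg.solve(implied_cov, sample_cov)      # Sigma^-1 S without forming the inverse explicitly
    resid = m - np.eye(m.shape[0])
    return float(1.0 - np.trace(resid @ resid) / np.trace(m @ m))

def rmsr(sample_cov: np.ndarray, implied_cov: np.ndarray) -> float:
    """Root mean square residual over the unique elements (lower triangle including the diagonal)."""
    rows, cols = np.tril_indices(sample_cov.shape[0])
    resid = (sample_cov - implied_cov)[rows, cols]
    return float(np.sqrt(np.mean(resid ** 2)))

# Hypothetical "observed" correlations generated by a two-factor structure
lam = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
psi = np.diag(1 - (lam ** 2).sum(axis=1))             # unique variances
S = lam @ lam.T + psi
two_factor = lam @ lam.T + psi                         # correctly specified model
null_model = np.diag(np.diag(S))                       # independence model: no shared variance

gfi_two, gfi_null = gfi_ml(S, two_factor), gfi_ml(S, null_model)
rmsr_two, rmsr_null = rmsr(S, two_factor), rmsr(S, null_model)
```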

Three-factor sample characteristics and model item composition
Across analyses that investigated a three-factor model of FTD (n = 13), samples (total N = 3735) consisted predominantly of males (55 %) with a diagnosis of schizophrenia, in either acute or chronic stages of illness, or with FEP. Patients were either taking antipsychotic medication or were relatively medication-free. Speech was most commonly rated using either the TLC (Andreasen, 1986) or the SAPS and SANS (Andreasen, 1984a, 1984b), with between 8 and 18 items included across analyses, and was elicited using either standardised or unstandardised methods. Where specified, mean years of education ranged from 8 to 13 years. Mean age ranged from 22 to 46 years (see Supplementary Text 1.1 for each model's sample characteristics).
In terms of factor and item composition, results were somewhat consistent for the first two factors. A factor most often labelled disorganisation emerged most consistently, appearing in all but one of the relevant three-factor models (PCA/EFA models, and CFA models with GFI > 0.95; n = 9; 90 %), most commonly as the first factor (n = 6; 67 %). Items on this factor generally described speech qualities that appeared confusing to the listener relative to the context of use, including speech lacking order or systematisation. Where reported, the highest item loadings (> 0.8) on this factor were for confused references, structural unclarities, loss of goal, derailment, tangentiality and incoherence. Items loading less strongly included poverty of speech, pressured speech, illogicality, perseveration, blocking, ambiguous word meanings, looseness, peculiar word use, peculiar sentence construction, peculiar logic, clanging, word approximations, neologisms and echolalia. Notably, clanging, echolalia and neologisms generally loaded much lower than other items across analyses. Similarly, in CFA models with GFI > 0.95, this factor comprised derailment, tangentiality, circumstantiality, incoherence, illogicality, distractible speech and pressure of speech. Within relevant analyses, this factor tended to account for more of the total variance than the other two factors combined.
² Although GFI is not the only index of overall CFA model fit, it was synthesised here because it was the most frequently reported index across analyses. GFI classifications are based on: Hu, L.T., Bentler, P.M., 1999. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal 6(1), 1-55.
A factor commonly labelled negative or poverty also emerged consistently, appearing in almost all relevant models (PCA/EFA models, and CFA models with GFI > 0.95; n = 8; 80 %), most commonly as the second factor (n = 5; 63 %). Items generally described speech qualities that conveyed a lack of meaning or communicative efficacy. The highest item loadings (> 0.8) on this factor were for poverty of speech, poverty of content of speech, weakening of goal, missing information references, increased latency of response and blocking. Items loading less strongly included tangentiality, pressure of speech, circumstantiality, echolalia, self-referential speech and wrong word references. Similarly, in CFA models with GFI > 0.95, this factor comprised poverty of speech, poverty of content of speech, increased latency of response and blocking.
There was less consistency in the composition of the third factor across models. In EFA/PCA analyses, it was variously labelled linguistic control, verbal productivity, regulation of speech, speech peculiarities, incoherence or verbosity. Only clanging, distractibility and incoherence loaded highly (> 0.8), on the verbosity, regulation of speech and incoherence factors, respectively. The third factor likewise varied across CFA models with GFI > 0.95 that included both a disorganisation and a negative/poverty factor: it was labelled either incoherence or verbosity, with items including pressure of speech, clanging, incoherence and illogicality. Notably, verbosity was the only third factor reproduced across types of FA and samples, appearing in two studies of FEP samples, both of which included pressure of speech and clanging on this factor.

Quality assessment
The quality assessment for each study is outlined in Supplementary Table 2. One study was evaluated twice because it used both PCA and CFA, giving a total of 17 analyses evaluated against the quality criteria. The proportion of applicable quality criteria met ranged from 0 % to 75 %. Most studies met 50 % of the criteria for the assessment of FTD, typically failing to account for the low prevalence of some FTD items in their analyses (for example, by removing items with low variability), even though many studies reported low prevalence for various items. Most studies employed unstandardised methods to elicit speech and did not report whether interviewers were trained in speech assessment, although most reported adequate inter-rater reliability for the assessment of FTD. For the quality of the FA itself, most studies met either 50 % or 75 % of the relevant criteria. Most studies had sample sizes sufficient for FA and, where relevant, specified appropriate factor retention criteria. While all relevant CFA studies reported fit indices, most studies reported neither the factorability of the data nor adequate reliability coefficients for factor summed scores.

Discussion
This review systematically evaluated the findings of 16 studies examining the factor structure of FTD. The reviewed studies suggested that FTD is multidimensional: several studies failed to find convincing evidence of unidimensionality, and the evidence was strongest for a three-factor model. Compared with the other models investigated, the three-factor model appeared robust across types of FA, measures used, stages of illness, methods of eliciting speech and medication status.

Three-factor structure
The first factor appeared to capture speech operating without order or systematisation, encompassing symptoms including confused references, structural unclarities, loss of goal, derailment, tangentiality and incoherence; it was most often labelled disorganisation. The second factor appeared to capture speech lacking communicative efficacy, encompassing symptoms including poverty of speech, weakening of goal, missing information references, increased latency of response and blocking; it was most often labelled negative or poverty. The third factor was less consistent across studies, with multiple item compositions and labels reported. At times it appeared to represent the verbosity of speech, encompassing symptoms including pressure of speech and clanging; other studies variously found it to represent linguistic control, incoherence or regulation of speech, among other labels.
The variability observed for the third factor might be attributable to methodological differences between studies, including the measure used and the items included in the analysis, although we did not find any specific trends. Future studies should explore the factor structure of FTD across patient groups, stages and severities of illness, and statistical methods to clarify this third factor. Using the same scale across studies would also help to separate the effects of differences in measurement from genuine differences in the factor solutions found.

Dimensionality of FTD
The lack of empirical support for unidimensionality is consistent with decades of clinical wisdom, including the work of those who subsequently used this knowledge to develop specific FTD symptom measures (Andreasen, 1986), and with broader FTD research (Docherty et al., 2011; Minor et al., 2016; Roche et al., 2016; Tan and Rossell, 2017; Tan and Rossell, 2019a). Current expert consensus, surveyed using the Delphi technique, also supports the view that FTD is dimensional at a clinical practice level (Zamperoni et al., unpublished data, 2023). Despite this, much of the research on FTD to date has relied on unidimensional measures using global summary scores (Bambini et al., 2020; Kerns and Berenbaum, 2002; Kiefer et al., 2009; Roche et al., 2015b; Tan and Rossell, 2014; Tan et al., 2014). Some studies have instead investigated FTD by individual symptoms (Docherty et al., 2011; Tan and Rossell, 2017; Tan and Rossell, 2019a) or by dimensions (Holshausen et al., 2014), finding evidence of mechanisms that were not detected using unidimensional measures. This has contributed to heterogeneity in aetiological and mechanistic FTD research, which still has limited clinical interpretability (Roche et al., 2015a). This may explain why FTD is regarded as a poorly understood clinical construct among professionals in psychiatry more broadly (Zamperoni et al., unpublished data, 2023). Indeed, no consensus clinical guidelines exist for the assessment and diagnosis of FTD. Acknowledging the dimensionality of FTD is crucial for clinicians to better grasp the manifest pathology and its prognostic utility among their patients. For researchers, it enhances symptom characterisation and improves the sensitivity of mechanistic and prognostic investigations. Our finding of a three-factor structure can provide a framework for better understanding FTD phenomenology, which could lead to improved assessment.
A two-factor model has commonly been applied to FTD and has established utility in both mechanistic and clinical research (Bora et al., 2019; Minor et al., 2016; Ott et al., 2002; Sumner et al., 2018). Nevertheless, no study within our review supported this dichotomy exclusively (Harvey et al., 1992). A two-factor model was the second most commonly identified FTD structure; however, its labels and item compositions were generally not reproducible across analyses or methodological variables (e.g. measures other than items from the SAPS and SANS; Andreasen, 1984a, 1984b), and the two-factor models that were reproducible either did not have acceptable GFIs or included too few items to permit the extraction of more than two factors (Roche et al., 2015b).

Quality of evidence
The quality analysis of the included studies revealed several limitations. First, studies generally did not report whether the low prevalence of FTD items was considered in analyses, and many reported low scores, particularly for echolalia, clanging and neologisms. Because FA relies on variation in scores for each item, items without adequate variation should generally not be included: they do not differentiate sufficiently between respondents and can distort factor loadings and reliability (Watkins, 2018). Notably, only a few studies, and fewer still among those finding a three-factor structure, had problems with cross-loadings or low factor loadings, suggesting that this probably did not affect the results substantially. Additionally, the number of factors found did not vary systematically with the rotation method used. Future studies should consider how the prevalence of FTD items, as well as the choice of rotation method, can affect the factors identified. This is a difficult challenge and highlights the value of big-data approaches, which increase the chance of sampling rare FTD manifestations, allow a variety of rotation methods to be compared and provide the sample sizes that FA requires.
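One simple way to account for low item prevalence before FA is to screen items on how often they are rated as present. The Python sketch below drops items rated present (score > 0) in fewer than a chosen proportion of participants, or with zero variance. The 10 % threshold, the item names and the ratings are hypothetical illustrations, not a standard recommended by the reviewed studies or by Watkins (2018).

```python
import numpy as np

def screen_items(ratings: dict[str, list[int]], min_prevalence: float = 0.10) -> list[str]:
    """Keep items rated present in at least min_prevalence of participants and with non-zero variance."""
    kept = []
    for item, scores in ratings.items():
        x = np.asarray(scores, dtype=float)
        prevalence = np.mean(x > 0)                   # share of participants showing the item at all
        if prevalence >= min_prevalence and x.var() > 0:
            kept.append(item)
    return kept

# Hypothetical 0-4 severity ratings for 20 participants
ratings = {
    "derailment": [0, 1, 2, 0, 3, 1, 0, 2, 4, 1, 0, 1, 2, 0, 3, 1, 0, 2, 1, 0],
    "clanging": [0] * 19 + [1],                       # rated present in 1 of 20 participants (5 %)
    "echolalia": [0] * 20,                            # never rated present
}
retained = screen_items(ratings)
```

Pooling data across studies, as in the big-data approaches mentioned above, would raise the number of participants showing rare items and so allow more items to pass such a screen.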
Variation in the methods used to elicit speech across studies can result in different amounts of expressed FTD and may produce inconsistent factor solutions. It has been suggested that unstandardised methods of eliciting speech allow more opportunities for FTD to occur and that, within these, the topic of conversation may be influential (Liddle et al., 2002; Solovay et al., 1986). For example, emotionally laden content could affect speech production more than neutral content (Docherty et al., 1998). There are also differences between methods of eliciting free speech, such as patient monologues in picture description tasks (Liddle et al., 2002), personal narratives (Willits et al., 2018), free conversations (Tan et al., 2021) or structured clinical interviews (de Sousa et al., 2015; Kay et al., 1987). Future studies should use standardised methods to reduce inconsistency in the factor solutions found. Other authors have previously called for an internationally agreed, standardised speech elicitation measure to improve consistency (Tan and Rossell, 2019b).
A broader issue relates to longstanding disagreement among clinicians and researchers about the concept of FTD itself; indeed, various terms have been used to describe roughly similar phenomena (Rule, 2005). Consistency of FTD training has also been highlighted as a primary issue among experts in the field (Zamperoni et al., unpublished data, 2023). In the current review, FTD assessment training was not consistently reported among the included studies, although adequate reliability often was. The lack of a universal FTD training and diagnostic system potentially contributes to further heterogeneity and hampers consensus in the field.

Caveats and future research
Included studies were generally limited to schizophrenia samples. When studies did include other diagnoses, these usually made up a small percentage of the sample, and data were usually combined across diagnoses within analyses, which limited the ability to compare dimensions. Hence, despite our aim of investigating the transdiagnostic factor structure of FTD in psychosis, this structure remains largely unknown. Given that past data suggest FTD presents differently across patient groups (Andreasen and Grove, 1986; Roche et al., 2015a; Yalincetin et al., 2017), population differences may have affected the dimensions found in the present review. Nevertheless, our results are the first attempt to synthesise the factor structure of FTD transdiagnostically, and they provide preliminary evidence for a three-factor structure within bipolar disorder and depression, as well as schizophrenia. Three-factor models have also been observed in FA studies of self-reported FTD measures in non-clinical samples (Barrera et al., 2015; Sumner et al., 2022). Future research should explore the factor structure of FTD in various patient groups, particularly bipolar disorder and depression, to confirm this preliminary evidence. Additionally, separating analyses by diagnosis and matching for variables such as type and duration of medication use and stage of illness would permit examination of consistencies or differences in latent dimensions across patient profiles. Findings could then inform future mechanistic research and the development of targeted treatment interventions. Previous linguistic analysis of speech output within these patient groups (Lott et al., 2002) suggests that factor structures may vary. Recent advances in quantitative methods for speech analysis have driven an increase in studies of speech and psychosis over the past decade (see Tan et al., 2023). Such techniques offer the potential for greater objectivity, sensitivity and reliability that current clinical rating scales may lack, though these methods are still relatively new and their prognostic and clinical utility remains to be established.
Included studies were limited to predominantly English-speaking samples. This could limit the cross-cultural generalisability of the findings, given that language is rooted in culture and FTD classification can be biased by the sociolinguistic and cultural background of the examiners themselves (Palaniyappan, 2021). That said, almost half of the studies supporting a three-factor model did not sample English speakers exclusively, and included speakers of other languages such as Greek, Korean, Spanish and Turkish, offering preliminary evidence that a three-factor FTD structure is robust across languages. This included factor consistency across the English, Greek and Korean versions of the TLC. More FTD FAs across cultures are required to establish consistencies or differences here.
Overall, our findings demonstrated the prominence of a three-factor structure of FTD, with consistent evidence of a disorganisation dimension and a negative dimension. Future research should clarify the exact nature of the third factor across patient groups and adopt more rigorous methodologies to establish a stable and consistent concept of FTD. Clarifying the third factor will support the development of a consensus, standardised measurement system, which does not currently exist but is desired by clinicians and researchers (Zamperoni et al., unpublished data, 2023). Better measurement will in turn permit the development of more personalised treatment models based on individual patient presentations; no such models have yet been devised and empirically tested for FTD specifically.
SLR (GNT1154651) is in receipt of a Senior Research Fellowship from the National Health and Medical Research Council of Australia (NHMRC). GZ is supported by an Australian Government Research Training Program Scholarship. None of the funding sources had any role in the study design, the collection, analysis or interpretation of the data, the writing of the manuscript, or the decision to submit the paper for publication.
CRediT authorship contribution statement
Georgia Zamperoni: Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review & editing. Eric J. Tan: Conceptualization, Methodology, Writing – review

Table 1
Summary statistics of studies and factor analyses synthesised in the present review.