Selection bias in clinical studies of first-episode psychosis: A follow-up study

Objectives: Selection bias is a concern in studies on psychotic disorders because of high dropout rates and numerous eligibility criteria for inclusion. We studied how representative the first-episode psychosis study sample in the Turku Early Psychosis Study (TEPS) was. Methods: We screened 3772 consecutive admissions to the clinical psychiatric services of Turku Psychiatry, Finland, between October 2011 and June 2016. A total of 193 subjects had first-episode psychosis and were suitable for TEPS. Of these 193 subjects, 101 participated (PA) and 92 did not participate (NPA) in TEPS because of refusal or contact problems. We used patient register data retrospectively to study whether the NPA and PA groups differed in clinical outcomes during a 1-year follow-up. Results: In the overall sample, the NPA group had a significantly higher rate of discontinuation of clinical treatment than the PA group (48.9 % vs 29.7 %, p = 0.01). In the hospital-treated subsample, chi-square tests did not indicate statistically significant differences between the NPA and PA groups in the rates of involuntary care (69.7 % vs 62.7 %, p = 0.34), coercive measures (36.0 % vs 22.7 %, p = 0.06), or readmissions during the follow-up (41.5 % vs 33.8 %, p = 0.31). Conclusion: The differences in clinical outcomes and treatment characteristics between the non-participating and participating groups were relatively modest. The results do not support a major sample selection bias that would complicate the interpretation of results in this first-episode psychosis study.


Introduction
Selection bias is a major concern when interpreting scientific study results, because it determines how well the study sample represents the real-life clinical setting (Schulz and Grimes, 2002; Tripepi et al., 2010). Selection bias and sample representativeness are important factors to consider when results are used to guide clinical practice. Selection bias is especially challenging in studies on patients with psychosis and schizophrenia, not only because of high dropout rates during the studies but also because of dropouts at different stages before recruitment (Hofer et al., 2000, 2017; Rybin et al., 2015). It has previously been suggested that consenting and nonconsenting processes during recruitment can lead to biased samples in schizophrenia studies, but the implications of this bias for sample representativeness remain unclear (Spohn and Fitzpatrick, 1980; Bowen and Barnes, 1994).
Previously, selection bias and sample validity in psychosis and schizophrenia research have been evaluated in pharmacological and clinical studies by comparing socioeconomic and illness-related variables between participating and nonparticipating groups. The results range from minor, clinically insignificant differences to differences indicating a more severe illness course in the nonparticipating group (Bowen and Barnes, 1994; Rabinowitz et al., 2003; Woods et al., 2000; Lally et al., 2018; Golay et al., 2018; Haapea et al., 2007; Riedel et al., 2005). These inconclusive results leave open the question of how well psychosis and schizophrenia study samples represent the whole patient group. Only one study, by Golay et al. (2018), has examined selection bias specifically among early psychosis subjects in a clinical study. It concluded that the participating and nonparticipating groups did not differ significantly from each other, but that the results needed to be replicated (Golay et al., 2018).
In our study, we tested the hypothesis that patient samples in clinical first-episode studies are biased, in that nonparticipating subjects would differ clinically from the participating group and would thus represent a more severe phenotype. We used the Turku Early Psychosis Study (TEPS) as an example. TEPS is a clinical multidimensional follow-up study on the etiology and treatment of psychotic disorders, conducted between October 2011 and December 2018 in a single clinical organization (Salokangas et al., 2021; Armio et al., 2020). We studied selected clinical outcomes and treatment characteristics of participating (consenting) and nonparticipating (nonconsenting) subjects in TEPS. Nonparticipants were defined as first-episode psychosis (FEP) patients who were potentially eligible for the study (intent to study) but did not participate because of refusal or contact problems. The study included a 1-year record-based clinical follow-up provided by the clinical organization.

Study groups and ethical assessment
TEPS was approved by the Ethics Committee of the Hospital District of Southwest Finland in 2011 (approval number TEPS ETMK 64/180/2011). A detailed description of TEPS is given in Salokangas et al. (2021). Briefly, TEPS is a clinical outcome study on early psychosis that includes biomarker and brain MRI components. Subjects were recruited from a single organization, Turku Psychiatry (in May 2017, Turku Psychiatry was merged with the Turku University Hospital organization). In 2017, TEPS applied for permission for a retrospective register study to explore the representativeness of the final TEPS sample, since the TEPS results are used to develop clinical care. The Ethics Committee of the Hospital District of Southwest Finland guided the process, and the Psychiatry division at Turku University Hospital (Hospital District of Southwest Finland) granted this permission (study number T200/2017). The application was also reviewed by the Office of the Data Protection Ombudsman in Finland. This part of the study explored register-level data on subjects who, according to the original clinician questionnaire, were eligible to participate (intent-to-study group) but did not take part in TEPS (NPA group). The NPA subjects were not contacted in any way, and the NPA group data were subsequently anonymized.
All patient admissions to the units responsible for treating FEP (three inpatient units, one day hospital unit, and six outpatient units) were evaluated between October 2011 and December 2018. All patients treated in Turku Psychiatry were evaluated in the same way, and the group represented a naturalistic, typical clinical sample. Eligibility for the study was evaluated by the clinical team using a short questionnaire and an unstructured best-estimate assessment by the clinician (assessment as usual, Supplement 1).
According to the eligibility questionnaires completed by the clinical teams, there were 3772 admissions to Turku Psychiatry between October 2011 and June 2016 (this period was chosen because Turku Psychiatry underwent an organizational merger in 2017 that might have changed some treatment protocols). All questionnaires were reviewed by the research coordinator, who identified potentially eligible subjects. Subjects were potentially eligible if, according to the clinical team, they were experiencing a psychotic episode or were at clinical high risk for psychosis and had never had a psychotic episode before (Supplement 1). Subjects aged 18 to 50 years were suitable for the study. Of these 3772 admissions, 417 subjects formed the intent-to-study group. Of these 417 subjects, 218 were potentially eligible for the study and willing to participate, and 199 were similarly eligible but were not willing to participate or could not be reached (Fig. 1).
Participating group (PA group): 218 consenting subjects went through initial screening to determine whether they fulfilled the study's inclusion criteria. TEPS exclusion criteria were known intellectual disability, previous head trauma, neurological brain disorder, and alcohol or substance dependence. Of the 218 subjects, 37 either did not fulfill the inclusion criteria or dropped out before entering the study procedure, leaving 181 subjects who attended the study (Fig. 1). All 181 attending subjects were assessed with structured clinical interviews, including the Structured Clinical Interview for DSM (SCID) (First et al., 2015) and the Structured Interview for Prodromal Syndromes (SIPS) (Miller et al., 1999). According to these interviews, the group included 101 FEP subjects, 40 subjects at clinical high risk for psychosis, and 40 subjects classified as a separate risk patient group with an estimated clinical psychosis risk who did not fulfill the SIPS/SOPS criteria (patient controls). The 101 FEP subjects formed the participating group (Fig. 1). Data on the psychosis-risk and patient-control subjects are not reported here.
Nonparticipating group (NPA group): the 199 subjects were not contacted in any way. Their anonymized clinical treatment characteristics were explored in the electronic medical records as a register study. Of the 199 subjects, 65 had exclusion criteria according to the electronic medical records or had insufficient identification data in the eligibility questionnaire (Fig. 1); of these 65, 10 fulfilled the ICD-10 criteria for substance dependence during our follow-up period. Of the remaining 134 subjects, 92 were categorized as FEP subjects and 42 as clinical high-risk psychosis subjects according to the eligibility questionnaire completed by the clinical team (Supplement 1) (Fig. 1).
Alcohol or substance dependence was an exclusion criterion in our study. Based on this criterion, there were 4 subjects in the PA group and 10 subjects in the NPA group who had a register-based diagnosis of alcohol or substance dependence.
In the end, we had 101 FEP subjects, whom we categorized as the participating (PA) group, and 92 first-episode subjects, whom we categorized as the nonparticipating (NPA) group (Fig. 1).
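The participant flow described above involves several subtraction and classification steps. As a quick consistency check, the following Python sketch re-derives the group sizes from the counts stated in the text; the variable names are illustrative and are not part of the TEPS data set.

```python
# Counts copied from the Study groups section; names are illustrative only.
intent_to_study = 417
consented = 218
not_reached_or_refused = 199

# PA branch: screening dropouts, then structured-interview classification
pa_screening_dropouts = 37
pa_attended = consented - pa_screening_dropouts            # 181
pa_fep, pa_chr, pa_patient_controls = 101, 40, 40

# NPA branch: register-based exclusions, then questionnaire classification
npa_excluded_or_unidentifiable = 65
npa_remaining = not_reached_or_refused - npa_excluded_or_unidentifiable  # 134
npa_fep, npa_chr = 92, 42

# The reported subgroups should add up at every step
assert consented + not_reached_or_refused == intent_to_study
assert pa_attended == pa_fep + pa_chr + pa_patient_controls
assert npa_remaining == npa_fep + npa_chr

print(f"PA FEP: {pa_fep}, NPA FEP: {npa_fep}, total FEP: {pa_fep + npa_fep}")
print(f"Share of intent-to-study group lost before recruitment: "
      f"{not_reached_or_refused / intent_to_study:.1%}")
```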

Clinical variables
We hypothesized, as a form of selection bias, that the NPA group would represent a more severe phenotype than the PA group in terms of the available clinical outcomes. As independent variables, we chose clinical measures over the 1-year follow-up that we expected to characterize such differences between the two groups and that are also important in clinical care. These variables were the length of the first hospital treatment, whether the treatment was involuntary or voluntary, and whether coercive measures were used during the hospital treatment. Isolation, involuntary medication, and limb restraints were counted as coercive measures. Involuntary medication was taken to refer to antipsychotic medication. We also collected information on the subjects' discontinuation of clinical treatment during the follow-up. Treatment was discontinued for a variety of reasons: moving to another place of residence, nonadherence to treatment, or ending the treatment by mutual agreement with the clinician. We also collected information on recurrent hospitalization during the follow-up. We hypothesized that the nonparticipating group would show more frequent discontinuation of clinical care, more frequent discontinuation without agreement with the clinician, longer hospital treatment, a higher rate of coercive measures, and a higher rate of rehospitalization. Data on gender, age, marital status, level of education, and recruitment setting were also collected and used as independent baseline variables. The dependent variable was group status (participating or nonparticipating).
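The hospital-treatment variables above are simple derived codes. The following Python sketch shows one way to derive them from a per-episode record, using the three length-of-stay categories reported in the results (0-30, 31-90, and over 90 days) and counting isolation, involuntary medication, and limb restraints as coercive measures. The record layout and field names are hypothetical; the actual register extraction in TEPS is not described at this level of detail.

```python
from dataclasses import dataclass


@dataclass
class HospitalEpisode:
    # Hypothetical record layout; not the TEPS register schema.
    length_days: int
    involuntary: bool
    isolation: bool
    involuntary_medication: bool
    limb_restraints: bool


def length_category(days: int) -> str:
    """Map a length of stay to the three categories used in the analysis."""
    if days <= 30:
        return "0-30"
    if days <= 90:
        return "31-90"
    return ">90"


def any_coercive_measure(ep: HospitalEpisode) -> bool:
    """Isolation, involuntary medication, and limb restraints all count."""
    return ep.isolation or ep.involuntary_medication or ep.limb_restraints


def code_episode(ep: HospitalEpisode) -> dict:
    """Derive the categorical variables compared between the groups."""
    return {
        "length_category": length_category(ep.length_days),
        "involuntary_care": ep.involuntary,
        "coercive_measures": any_coercive_measure(ep),
    }


example = HospitalEpisode(length_days=45, involuntary=True, isolation=False,
                          involuntary_medication=True, limb_restraints=False)
print(code_episode(example))
```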

Statistical analysis
The data were analyzed using SPSS software (version 25.0 for Windows). p-values below 0.05 were considered statistically significant.

Baseline characteristics
Categorical baseline variables were compared between the PA and NPA groups using the chi-square test, and continuously distributed variables using Student's t-test.

Treatment discontinuation
Differences between the PA and NPA groups in discontinuation of clinical treatment and in discontinuation without agreement with the clinician were tested using the chi-square test.
Study participation (group status) was then modeled as the dependent variable in binary logistic regression. Coefficients were estimated, confidence intervals were calculated from the standard errors, and statistical significance was tested with the Wald test.
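To make the estimation steps concrete, the following pure-Python sketch fits a logistic regression with one binary predictor by Newton-Raphson, takes standard errors from the inverse information matrix, and computes a Wald test and 95 % confidence interval. It is not the SPSS procedure used in the study; the function name is hypothetical, and the individual records are reconstructed from the whole-sample discontinuation counts given in the results (NPA 45 of 92, PA 30 of 101), with group status (NPA = 1) as the dependent variable.

```python
import math


def fit_logistic_one_predictor(x, y, iterations=25):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1 * x.

    Returns ((b0, b1), (se0, se1)). Standard errors come from the inverse
    information matrix of the last iteration; after convergence the final
    Newton step is negligible, so this equals the value at the estimate.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} * score (2x2 inverse written out)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se0 = math.sqrt(h11 / det)
    se1 = math.sqrt(h00 / det)
    return (b0, b1), (se0, se1)


# Reconstructed records as (x, y): x = 1 if treatment was discontinued, y = 1 if NPA
records = ([(1, 1)] * 45 + [(0, 1)] * 47      # NPA: 45 discontinued, 47 did not
           + [(1, 0)] * 30 + [(0, 0)] * 71)   # PA: 30 discontinued, 71 did not
x = [r[0] for r in records]
y = [r[1] for r in records]

(b0, b1), (se0, se1) = fit_logistic_one_predictor(x, y)
z = b1 / se1                                    # Wald statistic
p_wald = math.erfc(abs(z) / math.sqrt(2))       # two-sided normal p-value
ci_low, ci_high = math.exp(b1 - 1.96 * se1), math.exp(b1 + 1.96 * se1)
print(f"OR {math.exp(b1):.2f} (95 % CI {ci_low:.2f}-{ci_high:.2f}), Wald p = {p_wald:.3f}")
```

With a single binary predictor, the fitted odds ratio equals the cross-product ratio of the 2×2 table, (45 × 71)/(47 × 30) ≈ 2.27, and the interval is close to the reported 1.26-4.09; a difference in the last digit can arise from rounding or software details. A model with several predictors, as in the hospital subsample, would normally be fitted with a statistics package rather than hand-written code.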

Hospital treatment
Differences in clinical variables between the PA and NPA groups were tested using chi-square tests.
Study participation was modeled with binary logistic regression, entering as independent variables those with p < 0.2 in the univariate analyses (an arbitrary threshold). The regression was carried out for hospitalized patients. Coefficients were estimated, confidence intervals were calculated from the standard errors, and statistical significance was tested with the Wald test.
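The variable-entry rule described above (univariate p < 0.2) can be sketched as a simple filter. The p-values below are those reported for the hospital-treated subsample in the results; treating involuntary medication as part of the coercive-measures variable follows the description of the final model. The function and variable names are illustrative.

```python
# Univariate p-values for the hospital-treated subsample, as reported
# in the results section.
univariate_p = {
    "length_of_first_hospital_stay": 0.04,
    "involuntary_care": 0.34,
    "coercive_measures": 0.06,
    "involuntary_medication": 0.14,
    "discontinuation_of_treatment": 0.06,
}

ENTRY_THRESHOLD = 0.2  # the arbitrary cut-off stated in the methods

# Involuntary medication is one component of the coercive-measures variable,
# so it is represented by that variable rather than entered separately
# (assumption based on the model description).
FOLDED_INTO = {"involuntary_medication": "coercive_measures"}


def select_predictors(p_values, threshold=ENTRY_THRESHOLD, folded=FOLDED_INTO):
    """Return the variables that pass univariate screening (p < threshold)."""
    selected = []
    for name, p in p_values.items():
        if p >= threshold:
            continue  # fails the entry criterion
        if name in folded:
            continue  # already captured by its parent variable
        selected.append(name)
    return selected


print(select_predictors(univariate_p))
```

Under these assumptions, the filter keeps the length of the first hospital stay, coercive measures, and discontinuation of treatment, and drops involuntary care (p = 0.34), which matches the variables entered in the reported model.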

Baseline characteristics
Sex, age, marital status, and level of education did not differ statistically significantly between the PA and NPA groups (Table 1).
Most subjects were recruited while in hospital treatment: 75 (74.3 %) of the 101 subjects in the PA group and 89 (96.7 %) of the 92 subjects in the NPA group. This difference was statistically significant (p < 0.01). Among subjects recruited in hospital, the PA and NPA groups did not differ statistically significantly in sex, age, marital status, or level of education (Table 2).

Treatment discontinuation in the whole sample
Chi-square tests indicated that discontinuation of clinical treatment within 1 year was significantly more common in the NPA group (48.9 %, n = 45) than in the PA group (29.7 %, n = 30) (p < 0.01). Among subjects who discontinued, discontinuation of clinical treatment without agreement with the clinician occurred in 13.3 % (n = 4) of the PA group and 33.3 % (n = 15) of the NPA group (p = 0.051) (Table 3).
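As a check on the whole-sample results, the following Python sketch computes a Pearson chi-square test for a 2×2 table using only the standard library; for one degree of freedom, the upper-tail probability equals erfc(sqrt(χ²/2)). The counts come from the text. Whether a continuity correction was applied in the original analysis is not stated, so the uncorrected statistic is an assumption here; it gives p ≈ 0.006 for overall discontinuation (reported as p < 0.01) and p ≈ 0.051 for discontinuation without agreement, in line with the reported values.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test without continuity correction.

    Table layout (rows = groups, columns = outcome):
        [[a, b],   group 1: event, no event
         [c, d]]   group 2: event, no event
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Discontinuation within 1 year: NPA 45 of 92, PA 30 of 101
stat_all, p_all = chi_square_2x2(45, 92 - 45, 30, 101 - 30)
# Discontinuation without agreement, among those who discontinued:
# NPA 15 of 45, PA 4 of 30
stat_noagr, p_noagr = chi_square_2x2(15, 45 - 15, 4, 30 - 4)

print(f"overall discontinuation: chi2 = {stat_all:.2f}, p = {p_all:.3f}")
print(f"without agreement:       chi2 = {stat_noagr:.2f}, p = {p_noagr:.3f}")
```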
Binary logistic regression was carried out only for discontinuation of clinical treatment, since discontinuation without agreement is a subset of the discontinuation-of-treatment variable. The regression indicated that patients in the NPA group were more likely to discontinue clinical treatment (OR 2.27, 95 % CI 1.26-4.09).
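An odds ratio is not the same as a ratio of proportions: when the outcome is common, as discontinuation was here, the OR is noticeably larger than the risk ratio. The following Python arithmetic, using the whole-sample counts from the text, shows both measures side by side; it is an illustration, not an additional analysis from the study.

```python
# Whole-sample discontinuation counts from the text
npa_events, npa_n = 45, 92
pa_events, pa_n = 30, 101

risk_npa = npa_events / npa_n                  # proportion discontinuing, NPA
risk_pa = pa_events / pa_n                     # proportion discontinuing, PA
odds_npa = npa_events / (npa_n - npa_events)   # odds of discontinuing, NPA
odds_pa = pa_events / (pa_n - pa_events)       # odds of discontinuing, PA

print(f"risk ratio = {risk_npa / risk_pa:.2f}")  # prints 1.65
print(f"odds ratio = {odds_npa / odds_pa:.2f}")  # prints 2.27
```

The proportion who discontinued was about 1.65 times higher in the NPA group, while the odds were about 2.27 times higher; the latter is the quantity the regression reports.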

Treatment characteristics in the hospital-treated subsample
Hospital treatment variables were examined only in subjects who were asked to take part in the study during their first hospital treatment for FEP, i.e., 75 subjects in the PA group and 89 subjects in the NPA group. The length of the hospital treatment was divided into three categories: 0-30 days, 31-90 days, and over 90 days. The length of the first hospital treatment differed statistically significantly between the PA and NPA groups (p = 0.04) (Table 2). The majority of the PA group (61.3 %) spent 31-90 days in the hospital, whereas the NPA group tended toward shorter stays, being split between 0-30 days (42.7 %) and 31-90 days (41.6 %).
The hospital treatment was involuntary in 62.7 % (n = 47) of the PA group, compared with 69.7 % (n = 62) of the NPA group (p = 0.34). Coercive measures during treatment were more common in the NPA group (36.0 %, n = 32) than in the PA group (22.7 %, n = 17), but the difference was not statistically significant (p = 0.06). Involuntary medication, one form of coercive measure and an indication of antipsychotic use, occurred in 17.3 % (n = 13) of the PA group and 27.0 % (n = 24) of the NPA group (p = 0.14). Discontinuation of clinical treatment was more common in the NPA group (49.4 %, n = 44) than in the PA group (34.7 %, n = 26), but the difference was not statistically significant (p = 0.06) (Table 4).
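The hospital-subsample comparisons above can be re-derived from the reported counts. This Python sketch uses the algebraically equivalent shortcut formula for a 2×2 Pearson chi-square, N(ad − bc)² divided by the product of the row and column totals, again without continuity correction (an assumption about the original analysis). Group sizes are the 89 NPA and 75 PA subjects recruited during hospital treatment.

```python
import math


def pearson_2x2(events_1, n_1, events_2, n_2):
    """Pearson chi-square (no continuity correction) comparing two proportions.

    Uses the 2x2 shortcut N(ad - bc)^2 / (row1 * row2 * col1 * col2).
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    a, b = events_1, n_1 - events_1
    c, d = events_2, n_2 - events_2
    n = n_1 + n_2
    statistic = n * (a * d - b * c) ** 2 / (n_1 * n_2 * (a + c) * (b + d))
    return statistic, math.erfc(math.sqrt(statistic / 2))


N_NPA, N_PA = 89, 75  # hospital-treated subsample sizes
# (NPA events, PA events) for each outcome, as given in the text
outcomes = {
    "involuntary care": (62, 47),
    "coercive measures": (32, 17),
    "involuntary medication": (24, 13),
    "discontinuation of treatment": (44, 26),
}

for name, (npa_events, pa_events) in outcomes.items():
    statistic, p = pearson_2x2(npa_events, N_NPA, pa_events, N_PA)
    print(f"{name}: NPA {npa_events / N_NPA:.1%} vs PA {pa_events / N_PA:.1%}, "
          f"chi2 = {statistic:.2f}, p = {p:.2f}")
```

Rounded to two decimals, the p-values agree with those reported (0.34, 0.06, 0.14, and 0.06), and the percentages match the text.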
Recurrent hospital treatment was analyzed only in subjects who stayed in treatment for the whole 1-year follow-up, or who discontinued clinical treatment before the end of follow-up but had a recurrent hospital treatment before discontinuation, i.e., 74 subjects in the PA group and 53 in the NPA group. A recurrent hospital treatment period occurred in 33.8 % (n = 17) of the PA group and 41.5 % (n = 22) of the NPA group during the 1-year follow-up. The difference was not statistically significant (p = 0.31).
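The inclusion rule for the readmission analysis is easy to misread, so the following Python sketch states it as a predicate. It assumes a hypothetical per-patient summary holding the day of treatment discontinuation and the day of the first readmission, counted from the index admission; the toy cohort is invented for illustration and does not correspond to TEPS data.

```python
from dataclasses import dataclass
from typing import List, Optional

FOLLOW_UP_DAYS = 365


@dataclass
class FollowUp:
    """Hypothetical per-patient summary; days are counted from the index admission."""
    discontinued_day: Optional[int]  # None if still in treatment after 1 year
    readmission_day: Optional[int]   # first readmission, None if none


def in_readmission_analysis(f: FollowUp) -> bool:
    """Included if in treatment for the whole year, or if a readmission
    occurred before clinical treatment was discontinued."""
    if f.discontinued_day is None or f.discontinued_day >= FOLLOW_UP_DAYS:
        return True
    return f.readmission_day is not None and f.readmission_day < f.discontinued_day


def readmitted_within_follow_up(f: FollowUp) -> bool:
    return f.readmission_day is not None and f.readmission_day < FOLLOW_UP_DAYS


def readmission_rate(cohort: List[FollowUp]) -> float:
    """Share readmitted among the subjects who meet the inclusion rule."""
    included = [f for f in cohort if in_readmission_analysis(f)]
    return sum(readmitted_within_follow_up(f) for f in included) / len(included)


# Toy cohort for illustration only
cohort = [
    FollowUp(discontinued_day=None, readmission_day=None),  # stayed, not readmitted
    FollowUp(discontinued_day=None, readmission_day=200),   # stayed, readmitted
    FollowUp(discontinued_day=120, readmission_day=60),     # readmitted, then discontinued
    FollowUp(discontinued_day=90, readmission_day=None),    # discontinued first: excluded
]
included = [f for f in cohort if in_readmission_analysis(f)]
print(f"{len(included)} of {len(cohort)} included; "
      f"readmission rate {readmission_rate(cohort):.1%}")
```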
The binary logistic regression model included as independent variables the length of hospital treatment, coercive measures, and discontinuation of treatment; involuntary medication was included in the coercive-measures variable. For membership in the NPA group, the model showed an OR of 2.18 for coercive measures (95 % CI 1.04-4.57, p = 0.04) and an OR of 2.61 for a hospital stay of 0-30 days (95 % CI 1.23-5.50, p = 0.01). Discontinuation of clinical treatment did not reach statistical significance in this model (OR 1.47, 95 % CI 0.75-2.90; p = 0.27) (Table 5).

Discussion
Sample selection must be considered when we evaluate how well research results generalize to everyday clinical practice (Schulz and Grimes, 2002). In this study, we tested the hypothesis of sample selection bias in a clinical first-episode study by comparing clinical outcomes between participating and nonparticipating subjects. Notably, nearly half (47.7 %, 199 of 417 subjects) of the intent-to-study group were lost before recruitment because of refusal or contact problems. Similar results have been reported in previous studies of sample bias (Hofer et al., 2000; Robinson et al., 1996).
In the whole sample, NPA group subjects had a higher risk of discontinuing their clinical treatment. Nonadherence to treatment predicts a more severe illness course and a worse prognosis, whereas early interventions, antipsychotic medication, and therapeutic treatments favorably affect long-term outcomes, e.g., by decreasing the rates of hospitalization, symptom severity, and mortality among patients with psychosis (Correll et al., 2018; Albert and Weibell, 2019; Tiihonen et al., 2018; Álvarez-Jiménez et al., 2011; Posselt et al., 2021). Treatment nonadherence, in turn, is related to many patient- and illness-related factors, such as lack of insight, poor engagement with medication in the early phase, and more positive symptoms (Leclerc et al., 2015; Doyle et al., 2014). Regression analysis indicated that the odds of discontinuing treatment in this study were more than twice as high in the NPA group (OR 2.27). Such an increase in risk is likely to be clinically significant.
The differences in hospital treatment characteristics between the PA and NPA groups were not statistically significant, although we cannot exclude a higher rate of coercive measures in the NPA group. Previous studies have linked coercive measures to more severe symptoms at the beginning of treatment (Kalisova et al., 2014; Fiorillo et al., 2012; Luciano et al., 2014). The length of hospital treatment, on the other hand, has no clear-cut link to illness-related factors. Some studies report that a longer hospital stay indicates more severe symptomatology and poorer treatment adherence (Capdevielle et al., 2013; Hopko et al., 2001), whereas others indicate that a shorter stay may lead to more frequent rehospitalization than a longer stay (Appleby et al., 1996; Lin et al., 2006). In our study, NPA group subjects, who received more coercive measures, tended to have shorter hospital stays; this might be because they are less engaged in treatment overall and remain in hospital only as long as necessary. Overall, contrary to our starting assumption, the length of hospital stay after FEP may not be a useful proxy in FEP outcome studies. The factors explaining variability in the length of hospital treatment for psychosis clearly need further research. Taken together, the analysis of hospital treatment characteristics and of discontinuation of subsequent treatment in this subsample does not support any major differences in the clinical phenotypes of the PA and NPA groups. However, these treatment parameters clearly showed high variance. FEP outcome studies should therefore carefully characterize such factors and their consequences in order to evaluate the representativeness of the study sample.
Another putative source of bias in FEP studies is alcohol and substance dependence, which should be taken into account when designing studies on psychotic disorders. Dual diagnosis is becoming more common among patients with psychotic disorders, but dual-diagnosis patients are usually excluded from studies, which is a clear concern for the representativeness of the study group (Hunt et al., 2018). In the PA group, 4 subjects were excluded because of alcohol or substance dependence, whereas the corresponding number in the NPA group was 10, indicating that dual-diagnosis patients are also prone to drop out in the early phases of a study. Further studies on dual-diagnosis patients are needed because the overall management of psychotic disorders is, on average, more complex in those with comorbid substance dependence or abuse (Green, 2005; Doyle et al., 2014; Abdel-Baki et al., 2012).

Limitations
The PA group underwent careful initial screening, and some subjects dropped out because they did not fulfill the inclusion criteria (37 of 218 subjects). The same initial screening was not possible for the NPA group because their information was collected retrospectively from the patient records only. Accordingly, it was not possible to conduct structured interviews in the NPA group, so it may have included subjects who would not have been classified as FEP or high-risk psychosis by structured interviews. The NPA FEP group may therefore also include less severe disorders, which may affect our results to some degree. It is also possible that participating in a structured clinical study represents a positive intervention per se and influenced the outcomes in the PA group. However, the study protocol was relatively light and not markedly different from the routine protocol for a new patient with psychosis in our clinic, making any major effect unlikely. We also acknowledge that the follow-up time was relatively short, i.e., only 1 year. A longer follow-up might have been more adequate for capturing recurrent hospitalization, treatment adherence, and overall outcome in the longer term (Tiihonen et al., 2018).

Conclusion
In conclusion, our study showed only relatively modest differences in clinical characteristics and early treatment parameters between the participating and nonparticipating groups. These results support the clinical representativeness of this FEP study. In addition, treatment studies on FEP should make rigorous efforts to characterize all patients in the intent-to-study group to ensure clinical validity. Moreover, studies on dual-diagnosis patients with psychosis are needed to address the real-life treatment challenges in this group of patients.