The prognosis of schizophrenia: A systematic review and meta-analysis with meta-regression of 20-year follow-up studies

Objective: The aim was to examine the general outcome of schizophrenia after 20 years or more. Methods: Using the PRISMA guidelines, we conducted a systematic review and meta-analysis with meta-regression on long-term follow-up studies of schizophrenia up until April 21, 2021. We included prospective studies with at least 20 years of follow-up on patients with a diagnosis of schizophrenia; the studies had to include face-to-face clinical evaluation. We examined outcome in three nested groups: ‘recovery’, ‘good or better’ (also including ‘recovery’), and ‘moderate or better’ (also including ‘recovery’ and ‘good or better’). We used random-effects meta-analysis and meta-regression to examine mean estimates and possible moderators. Results: We identified 1089 records, which were screened by two independent researchers. Fourteen prospective studies (1991 patients) published between 1978 and 2020 were found eligible. The studies used a range of different scales and definitions for outcome, and some used the same definitions for different outcomes. To compare outcome across studies, we designed and applied a unified template for outcome definitions and cut-offs, based on recommendations from earlier studies. Our meta-analysis found that 24.2 % had ‘recovered’ (n = 246, CI: 20.3–28.0 %), 35.5 % had a ‘good or better’ outcome (n = 766, CI: 26.0–45.0 %), and 59.7 % had a ‘moderate or better’ outcome (n = 1139, CI: 49.3–70.1 %). Conclusions: The results help debunk the myth that schizophrenia inevitably has a deteriorating course. Recovery is certainly possible. Schizophrenia nevertheless remains a severe and complex mental disorder, exhibiting a limited change in prognosis despite more than 100 years of research and efforts to improve treatment.


Introduction
Schizophrenia has traditionally been associated with concepts of progression, relapse, and chronicity. The source of this view is usually ascribed to Kraepelin, who described dementia praecox as a disease with a chronic course and poor prognosis (Kraepelin, 1899). Yet he also described that remission and recovery from schizophrenia were possible. He further described heterogeneity as a characteristic of both the clinical presentation and the outcome of the disorder (Kendler, 2021). This view has been corroborated by longitudinal studies on schizophrenia, which have found that a proportion of patients achieve a good outcome or recovery (McGlashan and Carpenter, 1988). However, substantial heterogeneity exists both between studies and within study groups. This results in noticeable variation in assessments of both the course and the outcome of schizophrenia. Methodological issues such as sample characteristics, diagnostic methods, follow-up assessments, and outcome measures have been highlighted as sources of this heterogeneity (Davidson and McGlashan, 1997; Jobe and Harrow, 2005).
In recent decades, there has been a growing interest in recovery and remission in schizophrenia (Andreasen et al., 2005; Huxley et al., 2021; Jääskeläinen et al., 2013; Lally et al., 2017; Warner, 2009). This has sparked efforts to clarify what recovery means. For example, Jääskeläinen et al. (2013) define recovery as both clinical and social remission, requiring that either clinical or social remission has lasted for at least two years. This definition is largely consistent with earlier proposed definitions (see e.g., Bleuler, 1978; Gross and Huber, 1986). The Remission in Schizophrenia Working Group (RSWG) has proposed a standardized definition of clinical remission. This is an improvement in core symptoms to a mild or lower intensity level, at which they no longer interfere significantly with behavior (Andreasen et al., 2005). Warner describes how the social environment has a profound effect on the outcome of psychoses. He describes social remission as economic and residential independence with low social disruption (Huxley et al., 2021). Jääskeläinen et al. (2013) conducted a meta-analysis of follow-up studies of schizophrenia of varying durations with details on recovery. They found a mean recovery rate of 16.4 %, with no gender difference and no change over time or between different follow-up periods. They did, however, find a higher recovery rate in low- and middle-income countries. Another meta-analysis, by Hegarty et al. (1994), examined studies with a follow-up length of at least one year and found that 40.2 % had a good outcome after an average of 5.6 years.
Several reviews and meta-analyses on the outcome of schizophrenia have been published (Ajnakina et al., 2020, 2021; Angst, 1988; Carpenter and Kirkpatrick, 1988; Huxley et al., 2021; Jobe and Harrow, 2005; Lang et al., 2013; McGlashan, 1988; Volavka and Vevera, 2018). However, they usually include studies with markedly different follow-up periods. Alternatively, they include schizophrenia within a broader group of psychotic disorders, which have a better long-term outcome than schizophrenia (Peritogiannis et al., 2020). Based on the available literature, it is therefore not possible to draw conclusions about the very long-term outcome of schizophrenia. So far, no systematic review or meta-analysis has focused exclusively on the very long-term outcome of schizophrenia.
The aim of our study was to shed light on this issue by exploring the very long-term outcome of schizophrenia. We therefore conducted the first systematic review and meta-analysis with meta-regression of all prospective follow-up studies on schizophrenia spanning at least 20 years.
Finally, reference lists from systematic reviews, book chapters, and articles were screened for additional articles.
The protocol was registered on PROSPERO (CRD42021252124). The following inclusion criteria were used for study eligibility:
I. Peer-reviewed prospective cohort studies in the English language. Studies may contain retrospective parts, but they may not be based solely on retrospective reports.
II. Follow-up period of at least 20 years.
III. Baseline and follow-up assessments must include face-to-face clinical evaluation and not be restricted to questionnaires or register-based data.
IV. The study must provide information on the applied diagnostic method. Studies with diagnoses made solely retrospectively were excluded.
V. Diagnosis of schizophrenia, schizoaffective disorder, schizophreniform disorder, or non-affective psychosis. Studies without differentiation between types of psychosis, e.g., first-episode psychosis (FEP) cohorts, were excluded.
VI. Clearly defined outcome measures, with follow-up data on general, psychopathological, or social functioning outcome.
VII. Minimum follow-up sample size of 10 patients.
VIII. Sample not restricted to children.
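The eligibility criteria above can be read as a checklist. The Python sketch below encodes them as a screening function over a hypothetical study record. The `StudyRecord` fields and the `failed_criteria` function are illustrative assumptions, not the authors' extraction form. The actual screening was performed by two human reviewers.

```python
from __future__ import annotations
from dataclasses import dataclass


# Hypothetical study record: field names are illustrative, not the authors' form.
@dataclass
class StudyRecord:
    peer_reviewed: bool
    english: bool
    prospective: bool                     # not based solely on retrospective reports
    follow_up_years: float
    face_to_face_baseline_and_follow_up: bool
    diagnostic_method_reported: bool
    diagnosis_solely_retrospective: bool
    psychosis_types_differentiated: bool  # False for undifferentiated cohorts, e.g., FEP
    outcome_clearly_defined: bool
    follow_up_n: int
    children_only: bool


def failed_criteria(s: StudyRecord) -> list[str]:
    """Return the inclusion criteria (I-VIII) a study fails; an empty list means eligible."""
    failed = []
    if not (s.peer_reviewed and s.english and s.prospective):
        failed.append("I")
    if s.follow_up_years < 20:
        failed.append("II")
    if not s.face_to_face_baseline_and_follow_up:
        failed.append("III")
    if not s.diagnostic_method_reported or s.diagnosis_solely_retrospective:
        failed.append("IV")
    if not s.psychosis_types_differentiated:
        failed.append("V")
    if not s.outcome_clearly_defined:
        failed.append("VI")
    if s.follow_up_n < 10:
        failed.append("VII")
    if s.children_only:
        failed.append("VIII")
    return failed


# A hypothetical study that meets everything except the 20-year follow-up.
example = StudyRecord(True, True, True, 15, True, True, False, True, True, 120, False)
print(failed_criteria(example))  # ['II']
```

Returning the list of failed criteria, rather than a single yes/no, mirrors how exclusion reasons are usually tabulated for a PRISMA flowchart.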
Two of the authors (I.M. and R.H.) independently screened the titles and abstracts, after agreeing on the inclusion and exclusion criteria with authors J.N., A.U.P., and M.G.H. In cases of doubt, studies were included for full-text assessment. The reference lists of the retrieved articles were also reviewed to identify further potentially relevant studies. Disagreements regarding inclusion were resolved at a meeting of all authors.

Data extraction
The following data were extracted: first author, year of publication, study location, study design, setting (inpatient, outpatient, mixed), number of follow-up assessments, and duration of follow-up. We also extracted sample type (first-admission or mixed), sample size at baseline and at last follow-up, gender, age at follow-up, diagnosis and diagnostic method, outcome measures, mortality, and loss to follow-up. If more than one diagnostic evaluation was made, this was reported in the table, and the earliest diagnosis was used in the analysis. We extracted clinical, social, and overall outcomes at the latest follow-up point (20+ years) as outcome data. The data were either extracted from the articles' text or converted from the tables and figures provided. Ambiguities were resolved by consensus among the authors.

Definition of outcomes
Some studies used heterogeneous and sometimes even inconsistent outcome definitions or cut-offs. To compare outcomes across studies, we constructed a unified template with standardized definitions for the different outcomes: recovery, good, moderate, and poor outcome. As far as possible, our definitions were based both on definitions in the original studies and on recommendations from other studies (see Table 1) (Andreasen et al., 2005; Bleuler, 1978; Gross and Huber, 1986; Harrison et al., 2001; Helgason, 1990; Huxley et al., 2021; Jääskeläinen et al., 2013; Kua et al., 2003; Newman et al., 2012; Ogawa et al., 1987; Sartorius et al., 1996; Warner, 2009). We then superimposed these outcome definitions onto the reviewed studies. A similar approach was applied in the meta-analyses by Hegarty et al. (1994) and Jääskeläinen et al. (2013).
We defined recovery as both clinical and social remission, one of which must have lasted for at least two years (Jääskeläinen et al., 2013). Good outcome was defined as the presence of clinical and/or social remission. This is in line with prior definitions by Bleuler (1978), Gross and Huber (1986), and Hegarty et al. (1994).

For clinical remission, we used the definition constructed by the RSWG: "no or mild symptoms" (Andreasen et al., 2005). Equivalently, clinical remission could be indicated by a score of mild or less on any of the following scales:
- the Positive and Negative Syndrome Scale (PANSS);
- the Scale for Assessment of Positive Symptoms (SAPS);
- the Scale for Assessment of Negative Symptoms (SANS);
- the Brief Psychiatric Rating Scale (BPRS).

Social remission was defined as good or very good social function. It can be evaluated from a range of outcome measures: full employment, a normal social life, being self-supporting, or scores indicating good or very good social function on the WHO Disability Assessment Schedule (WHODAS) or the Social and Occupational Functioning Assessment Scale (SOFAS) (Helgason, 1990; Kua et al., 2003; Ogawa et al., 1987; Warner, 2009). The Global Assessment Scale and Global Assessment of Functioning (GAS/GAF) were also included, with a cut-off of > 60 (Endicott et al., 1976; Harrison et al., 2001; Sartorius et al., 1996).

Moderate outcome was defined as intermediate or moderate symptoms and/or moderately impaired social function, e.g., part-time employment. Poor outcome was defined as severe or unstable symptoms and/or poor social function, e.g., living in isolation or being hospitalized (Bleuler, 1978; Gross and Huber, 1986; Helgason, 1990; Kua et al., 2003; Ogawa et al., 1987). Cut-offs on the GAF/GAS, WHODAS, and SOFAS scales are given in Table 1. Newman et al. (2012) evaluated "loss of productive time", which is also listed with cut-offs in Table 1. We included neither treatment requirements nor a time criterion for good, moderate, or poor outcome in the definitions, because these are rarely stated. For more details, see Table 1.

All general, clinical, and social outcomes from the included studies were extracted, together with the definitions and cut-offs the studies used. Outcomes were reported as means (%). Each outcome was evaluated according to Table 1. The authors then decided by consensus whether to include, exclude, or adjust the outcome to meet our standardization. The decisions are reported in Table 3. If a study reported more than one result for an outcome category, a mean was calculated for the meta-analysis.

Table 1
Recovery (general outcome): both clinical and social remission (note a), one of these for at least 2 years (Jääskeläinen et al., 2013). Examples:
- "Complete remission": a monophasic or polyphasic illness course with ≥ 3 years of complete recovery after the psychotic phase, and full employment (Gross and Huber, 1986).
- Recovered "end state" (≥ 5 years): full employment, reassumed social roles, and currently no psychotic symptoms except for some eccentricity or symptom residues (Bleuler, 1978).
- Return to previous social functioning, independent social life, a normal family life, and no positive symptoms for the last 5 years (Ogawa et al., 1987).
Poor social, clinical, or general outcome: defined as one or more of the following: …
GAF: Global Assessment of Functioning, Symptoms/Disability (American Psychiatric Association, 1994). GAS: Global Assessment Scale, evaluation over 1 month (Endicott et al., 1976). WHODAS: WHO Disability Assessment Schedule (World Health Organisation (WHO), 1988). SOFAS: The Social and Occupational Functioning Assessment Scale, subscale of the Global Assessment of Functioning (Goldman et al., 1992). RSWG: Remission in Schizophrenia Working Group (Andreasen et al., 2005). PANSS: Positive and Negative Syndrome Scale (Kay et al., 1987). SAPS: Scale for Assessment of Positive Symptoms (Andreasen and Olsen, 1982). SANS: Scale for Assessment of Negative Symptoms (Andreasen, 1982). BPRS: The Brief Psychiatric Rating Scale (Andreasen et al., 2005; Overall and Gorham, 1962).
a Criteria for clinical and social remission are the same as for good outcome, but recovery requires both clinical and social remission.
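The unified template described above can be sketched as a precedence rule that assigns the best category whose criteria are met. This Python sketch rests on several assumptions:
- symptom severity and social function have already been coded from each study's own scales onto simple ordinal levels;
- the level names and the `classify_outcome` function are hypothetical;
- mixed cases (e.g., clinical remission with poor social function) resolve upward, following the "and/or" wording of the good-outcome definition.

This is not the authors' procedure, which relied on consensus decisions per study.

```python
from __future__ import annotations
from enum import IntEnum


class Outcome(IntEnum):
    POOR = 0
    MODERATE = 1
    GOOD = 2
    RECOVERY = 3


def clinical_remission(symptoms: str) -> bool:
    # RSWG-style criterion: no or mild symptoms.
    return symptoms in ("none", "mild")


def social_remission(social: str) -> bool:
    # Good or very good social function.
    return social in ("good", "very good")


def gaf_indicates_social_remission(score: float) -> bool:
    # GAS/GAF cut-off of > 60 used in the template.
    return score > 60


def classify_outcome(symptoms: str, social: str, longest_remission_years: float) -> Outcome:
    """Assign a template category from hypothetical ordinal codings.

    symptoms: 'none' | 'mild' | 'moderate' | 'severe'
    social: 'very good' | 'good' | 'moderate' | 'poor'
    longest_remission_years: duration of the longer-lasting of the two remissions
    """
    clin, soc = clinical_remission(symptoms), social_remission(social)
    if clin and soc and longest_remission_years >= 2:
        return Outcome.RECOVERY
    if clin or soc:  # clinical and/or social remission
        return Outcome.GOOD
    if symptoms == "moderate" or social == "moderate":
        return Outcome.MODERATE
    return Outcome.POOR


print(classify_outcome("mild", "good", 3).name)  # RECOVERY
print(classify_outcome("mild", "good", 1).name)  # GOOD
```

Using an ordered enum makes the nesting explicit: a patient at `GOOD` or above also counts as `MODERATE` or better.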
In the meta-analysis, we examined the long-term outcome using three variables:
- 'recovery';
- 'good or better' (also including 'recovery');
- 'moderate or better' (also including 'recovery' and 'good or better').

Independent meta-analyses were performed on each of the three variables, with each variable dichotomized as yes or no. The three variables are nested. Patients in recovery are also included in 'good or better' and 'moderate or better', and patients with a 'good or better' outcome are also included in 'moderate or better'. Poor outcome is defined as the inverse of the variable 'moderate or better' (see Fig. 1 for details).

Fig. 2. Identification of studies via databases and registers.
- Records identified: 1079 in total. These comprise 1065 from databases (PubMed, n = 239; PsycInfo, n = 315; EMBASE, n = 511) and 14 added from references.
- Exclusion categories shown include "Conference abstract or commentary" (n = 15) and "Ambiguity in methods or outcome not relevant" (n = 11).
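Because the three variables are nested, each study's dichotomized proportions can be derived from counts in four mutually exclusive categories. The sketch below uses hypothetical per-study counts; the function name and the example numbers are illustrative only.

```python
from __future__ import annotations


def nested_proportions(recovery: int, good: int, moderate: int, poor: int) -> dict[str, float]:
    """Turn counts in mutually exclusive categories into the nested proportions.

    Here 'good' counts patients with good outcome who are NOT recovered,
    and 'moderate' counts patients with moderate outcome only.
    """
    n = recovery + good + moderate + poor
    if n == 0:
        raise ValueError("no patients")
    good_or_better = recovery + good
    moderate_or_better = good_or_better + moderate
    return {
        "recovery": recovery / n,
        "good_or_better": good_or_better / n,
        "moderate_or_better": moderate_or_better / n,
        # Poor outcome is the complement of 'moderate or better'.
        "poor": (n - moderate_or_better) / n,
    }


props = nested_proportions(recovery=10, good=15, moderate=20, poor=55)
print(props)  # {'recovery': 0.1, 'good_or_better': 0.25, 'moderate_or_better': 0.45, 'poor': 0.55}
```

By construction, recovery ≤ good or better ≤ moderate or better. In the pooled results, the three variables can still come from different subsets of studies, so the pooled estimates need not add up the same way.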

Quality assessment
Quality assessment of risk of bias of all selected studies was performed using the Newcastle-Ottawa Scale (NOS) for nonrandomized studies (Wells et al., 2008). They were rated according to three main aspects: selection, comparability, and exposure.

Meta-analysis
We performed a series of meta-analyses using random-effects models for proportions with inverse-variance weighting. We used random-effects models because studies differed in diagnostic procedures, cohort recruitment, demographics, treatment availability, and other aspects. It was therefore reasonable to assume a high degree of heterogeneity between studies. Previous systematic reviews and meta-analyses of similar research questions have indeed shown a high degree of heterogeneity (Boonstra et al., 2012; Hegarty et al., 1994; Jääskeläinen et al., 2013).

Heterogeneity was described with I², a recommended transformation of Cochran's Q. Values of I² range from 0 % to 100 % and reflect the proportion of the total variation across studies that is due to heterogeneity rather than chance. Values of 25 %, 50 %, and 75 % indicate low, moderate, and high heterogeneity, respectively.

As a sensitivity analysis, we performed an influence analysis for each of the three main analyses. Each analysis was re-run multiple times, each time excluding one of the studies. This allowed us to assess whether the results were robust to the exclusion of any particular study (Borenstein, 2009).
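The pooling and influence analysis described above can be sketched in plain Python. This is an illustrative re-implementation, not the authors' code, and it makes two assumptions:
- untransformed proportions with binomial variance p(1 - p)/n;
- the DerSimonian-Laird estimator of the between-study variance tau².

The metafor package used in the study defaults to REML for `rma()`. The function names are hypothetical.

```python
from __future__ import annotations
import math

Z_95 = 1.959963984540054  # two-sided 95 % standard-normal quantile


def pool_proportions(events: list[int], totals: list[int]) -> dict[str, float]:
    """Random-effects pooled proportion (DerSimonian-Laird tau², inverse-variance weights)."""
    k = len(events)
    p = [e / n for e, n in zip(events, totals)]
    if any(pi in (0.0, 1.0) for pi in p):
        raise ValueError("raw-proportion variance is zero when p = 0 or 1")
    v = [pi * (1 - pi) / n for pi, n in zip(p, totals)]      # within-study variances
    w = [1 / vi for vi in v]                                   # fixed-effect weights
    sw = sum(w)
    p_fe = sum(wi * pi for wi, pi in zip(w, p)) / sw
    q = sum(wi * (pi - p_fe) ** 2 for wi, pi in zip(w, p))    # Cochran's Q
    df = k - 1
    if k > 1:
        c = sw - sum(wi ** 2 for wi in w) / sw
        tau2 = max(0.0, (q - df) / c)
    else:
        tau2 = 0.0
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0       # I² in percent
    w_re = [1 / (vi + tau2) for vi in v]                       # random-effects weights
    est = sum(wi * pi for wi, pi in zip(w_re, p)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"estimate": est, "ci_low": est - Z_95 * se, "ci_high": est + Z_95 * se,
            "tau2": tau2, "i2": i2, "q": q}


def leave_one_out(events: list[int], totals: list[int]) -> list[float]:
    """Influence analysis: pooled estimate with each study omitted in turn."""
    return [pool_proportions(events[:i] + events[i + 1:], totals[:i] + totals[i + 1:])["estimate"]
            for i in range(len(events))]


events, totals = [30, 45, 12, 60], [100, 150, 80, 200]  # hypothetical counts
res = pool_proportions(events, totals)
shifts = [abs(e - res["estimate"]) for e in leave_one_out(events, totals)]
print(f"pooled {res['estimate']:.3f}, I² {res['i2']:.1f} %, max leave-one-out shift {max(shifts):.3f}")
```

The largest leave-one-out shift plays the role of the "maximum change" figures reported in the influence analyses.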

Meta-regression-analysis
We also explored possible sources of heterogeneity, both by examining methodology and by performing meta-regression analyses on possible moderators. These moderators have been suggested or examined in earlier studies as contributors to heterogeneity in outcomes between studies, e.g., changes in outcome over time or differences in outcome by diagnostic methodology (Harrison et al., 2001; Hegarty et al., 1994; Jääskeläinen et al., 2013; Menezes et al., 2006).
We used the same analysis design as in the main analyses, adding each of the variables as a moderator in the model. We thus fitted 15 meta-regression models (five moderator variables for each of the three outcome measures). When a meta-regression model showed a statistically significant effect of the moderator variable at the p < 0.05 level, subgroup analyses were performed to obtain proportion estimates for presentation. Influence analyses were also performed to ascertain that the results were not contingent on a particular study. Sometimes more than one moderator variable showed a statistically significant association with a particular outcome measure. In those cases, all the statistically significant moderator variables were entered into a joint meta-regression model to assess which associations remained substantial after this adjustment.
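A single-moderator meta-regression of this kind can be sketched as a weighted least-squares fit with an added residual between-study variance. The sketch below is a stand-in, not the authors' code. It estimates the residual tau² with the method-of-moments (DerSimonian-Laird-type) estimator, whereas metafor's `rma()` defaults to REML, and it uses a z-test for the slope. With a 0/1 moderator (e.g., operational vs. pre-operational criteria), the intercept estimates the reference group. The slope then estimates the between-group difference that is reported in percentage points.

```python
from __future__ import annotations
import math


def _wls(y: list[float], x: list[float], w: list[float]):
    """Weighted least squares for y = b0 + b1*x; returns b0, b1 and (X'WX)^-1."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx ** 2
    if det == 0:
        raise ValueError("moderator has no variation")
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    inv = [[swxx / det, -swx / det], [-swx / det, sw / det]]
    return b0, b1, inv


def meta_regression(y: list[float], v: list[float], x: list[float]) -> dict[str, float]:
    """Mixed-effects meta-regression y_i = b0 + b1*x_i + u_i + e_i with Var(e_i) = v_i."""
    k = len(y)
    w = [1 / vi for vi in v]
    b0, b1, inv = _wls(y, x, w)
    # Residual heterogeneity statistic Q_E from the fixed-effect fit.
    q_e = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, yi, xi in zip(w, y, x))
    # tr[(X'WX)^-1 X'W^2 X] enters the method-of-moments denominator.
    w2 = [wi * wi for wi in w]
    s1 = sum(wi * xi for wi, xi in zip(w2, x))
    a = [[sum(w2), s1], [s1, sum(wi * xi * xi for wi, xi in zip(w2, x))]]
    trace = sum(inv[i][j] * a[j][i] for i in range(2) for j in range(2))
    denom = sum(w) - trace
    tau2 = max(0.0, (q_e - (k - 2)) / denom) if denom > 0 else 0.0
    # Refit with random-effects weights 1 / (v_i + tau2).
    b0, b1, inv = _wls(y, x, [1 / (vi + tau2) for vi in v])
    se_b1 = math.sqrt(inv[1][1])
    z = b1 / se_b1
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "z": z,
            "p": math.erfc(abs(z) / math.sqrt(2)),  # two-sided normal p-value
            "tau2": tau2, "q_e": q_e}


# Hypothetical proportions, variances and a 0/1 moderator.
fit = meta_regression([0.2, 0.6, 0.1, 0.5], [0.001] * 4, [0, 0, 1, 1])
print(f"b1 = {fit['b1']:.3f}, p = {fit['p']:.3f}, tau² = {fit['tau2']:.4f}")
```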
The analysis of change over time examined the studies according to enrollment year, which was entered in the analyses as a continuous variable. For mean age at follow-up, the studies were divided into two groups (age ≤ 50, age > 50), based on the pooled mean age being 52.17. The effect of geographical location was examined by dividing the studies into three groups: Europe, North America, and Asia.

We also compared studies using pre-operational diagnostic criteria with studies using operational diagnostic criteria:
- pre-operational criteria: E. Bleuler's criteria, M. Bleuler's criteria, ICD-8 (International Classification of Diseases, 8th edition), ICD-9, and DSM-II (Diagnostic and Statistical Manual of Mental Disorders, 2nd edition);
- operational criteria: ICD-10, DSM-III, and DSM-IV.
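The moderator coding described above can be sketched as follows. The mapping of diagnostic systems follows the lists in the text. Everything else is an assumption made for illustration:
- the function names;
- the choice of Europe as the reference category for the three-level region variable;
- the inclusion of DSM-III-R, based on the Discussion's "DSM-III and subsequent versions".

Enrollment year needs no recoding because it enters the model as a continuous variable.

```python
from __future__ import annotations

PRE_OPERATIONAL = {"E. Bleuler", "M. Bleuler", "ICD-8", "ICD-9", "DSM-II"}
# DSM-III-R is an assumption; the Methods list only ICD-10, DSM-III and DSM-IV.
OPERATIONAL = {"ICD-10", "DSM-III", "DSM-III-R", "DSM-IV"}


def diagnostic_dummy(system: str) -> int:
    """1 = operational criteria, 0 = pre-operational criteria."""
    if system in OPERATIONAL:
        return 1
    if system in PRE_OPERATIONAL:
        return 0
    raise KeyError(f"unclassified diagnostic system: {system}")


def age_dummy(mean_age_at_follow_up: float) -> int:
    """1 = sample mean age > 50 years, 0 = mean age <= 50 years."""
    return int(mean_age_at_follow_up > 50)


def region_dummies(region: str) -> dict[str, int]:
    """Treatment coding of the three regions, with Europe as the (assumed) reference."""
    if region not in ("Europe", "North America", "Asia"):
        raise KeyError(f"unknown region: {region}")
    return {"north_america": int(region == "North America"), "asia": int(region == "Asia")}


print(diagnostic_dummy("DSM-III"), age_dummy(52.2), region_dummies("Asia"))
# 1 1 {'north_america': 0, 'asia': 1}
```

Raising an error for an unlisted diagnostic system, rather than defaulting to one group, forces each study's classification to be decided explicitly.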
All statistical analyses were performed using R 4.0.2 (R Core Team, 2013) and the metafor package (Viechtbauer, 2010).

Attrition
In order to address the possible effects of the high attrition rates in the included studies, we performed a series of meta-regressions for each of the outcomes using attrition rate as an independent variable in the models.
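The attrition rate used as the independent variable here is simply the share of the baseline sample not assessed at the latest follow-up. A minimal sketch follows; the function name is hypothetical. As a check, the pooled totals reported under Study characteristics (4163 participants at baseline, 1991 at the latest follow-up) reproduce the stated 52.17 %.

```python
def attrition_percent(n_baseline: int, n_follow_up: int) -> float:
    """Percentage of the baseline sample not assessed at the latest follow-up."""
    if n_baseline <= 0 or not 0 <= n_follow_up <= n_baseline:
        raise ValueError("invalid sample sizes")
    return 100 * (n_baseline - n_follow_up) / n_baseline


print(round(attrition_percent(4163, 1991), 2))  # 52.17
```

Applied per study, the same function would give the moderator values for the attrition meta-regressions.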

Study characteristics
14 studies published between 1978 and 2020 were eligible for inclusion (see Fig. 2 for flowchart; studies excluded after full-text assessment are presented in the Supplementary Material). Study characteristics of the included studies are presented in Table 2.
The studies included 4163 participants at baseline and 1991 participants at the latest follow-up point, corresponding to an attrition of 52.17 %. The mean age of the 1991 participants (48.5 % female) was 52 years. Seven studies were conducted in Europe, three in Asia, and four in North America. All participants were diagnosed with schizophrenia. Four studies compared schizophrenia as a subgroup with other psychoses, schizoaffective disorder, or a non-psychotic group (Grossman et al., 2008; Kotov et al., 2017; Marneros et al., 1989; Opjordsmoen, 1986). Eight studies used pre-operational diagnostic criteria, and six studies used operational diagnostic criteria. The study by Harding et al. (1987) enrolled a selected, chronic group of patients with schizophrenia receiving specialized rehabilitation. The Chicago follow-up study recruited participants from a private hospital (Grossman et al., 2008). Six studies included only first-admission or first-contact patients. Mortality rates ranged from 6 % (Cechnicki et al., 2020) to 75 % (Ciompi, 1980). The latter study focused on late-phase schizophrenia and had an average age at follow-up of 75 years.

Table 3
MA: Meta-Analysis. GAS: Global Assessment Scale, evaluation over 1 month (Endicott et al., 1976). ESAS: Eguma's Social Adjustment Scale (Eguma, 1962). … (Levenstein et al., 1966). SCS: Strauss-Carpenter Scale (Strauss and Carpenter, 1972). SOFAS: The Social and Occupational Functioning Assessment Scale is a subscale of the Global Assessment of Functioning (GAF). BPRS: The Brief Psychiatric Rating Scale (Overall and Gorham, 1962).
a Our evaluation for inclusion in our meta-analysis: all outcomes were evaluated with respect to the definitions and cut-offs in Table 1. Results that did not meet our definitions were adjusted to fit them; if this was not possible, the results were not included. If a study reported more than one result for an outcome category, a mean was calculated in the comparative analysis.
b Social Function Class; means were calculated from Fig. 2.

Fig. 3.
Forest plots for the nested outcome groups of 'recovery', 'good or better', and 'moderate or better'.

I.-M. Molstrom et al.

Outcome
Our assessment of the outcomes for each study is presented in Table 3, which also lists the studies' own definitions and the scales used to measure and define outcomes. Studies used different outcome definitions and cut-offs, and sometimes the same definition was used for different outcomes. For example, Cechnicki et al. (2018) used the same definition for recovery (GAS > 60) that Kua et al. (2003), Opjordsmoen (1991), and Harding et al. (1987) used for good outcome.

Quality assessment
Risk of bias for each study and quality assessment are presented in Table 1 of the Supplementary material. Two studies were considered of good quality, receiving a rating of 7 (Grossman et al., 2008;Kotov et al., 2017), whereas the 12 remaining studies were considered of fair quality, receiving ratings of 5 or 6.

Meta-analysis
Four studies, comprising a total of 1027 study participants, included a comparable 'recovery' outcome measure, which was reported in 24.2 % (95 % confidence interval: 20.3-28.0 %) of the participants. 'Good or better' outcome was assessed in all 14 studies, comprising a total of 1991 study participants, and was reported in 35.5 % of the participants (95 % CI: 26.0-45.0 %). 'Moderate or better' outcome was assessed in 11 studies, comprising 1805 study participants, and was reported in 59.7 % of the participants (95 % CI: 49.3-70.1 %). Conversely, this means that 40.3 % of the participants had a poor outcome. The results of the three main analyses are summarized in forest plots (Fig. 3). As expected, the estimates were highly heterogeneous ('good or better': I² = 98.0 %; 'moderate or better': I² = 95.3 %). The exception was 'recovery' (I² = 12.6 %), possibly because only four studies were included. The data included in the meta-analysis are provided in Supplementary Table 4.

Meta-regression-analysis
The moderator variables 'enrollment year' and 'age group' did not show any statistically significant associations with any of the outcome variables. The moderator variable 'geographical location' did not show statistically significant associations with the outcome variables 'recovery' or 'good or better' (Table 4). The moderator variable 'diagnostic method' did not show a significant association with 'recovery'. However, studies using operational diagnostic criteria reported a 'good or better' outcome significantly less often than studies using pre-operational diagnostic criteria (23.7 % vs. 44.0 %, mean difference 20.3 percentage points, 95 % CI: 4.2-36.5, p = 0.014). This was also the case for a 'moderate or better' outcome (43.0 % with operational vs. 69.3 % with pre-operational criteria, mean difference 26.3 percentage points, 95 % CI: 11.8-41.2, p < 0.001) (Table 4).
Participants in North American studies were less likely to have a 'moderate or better' outcome than participants in European studies (36.4 % vs. 63.4 %, mean difference 27.0 percentage points, 95 % CI: 4.4-49.7, p = 0.019). In an influence analysis, the mean estimate changed by at most 10.5 percentage points. In a further sensitivity analysis, we fitted a meta-regression model on 'moderate or better' outcome that included both the diagnostic and the geographic groupings. The results were similar to those from the single-moderator models, indicating independent effects of the diagnostic and geographic groupings. An influence analysis of this meta-regression model showed no notable deviations. This further supports the robustness of the effects of diagnostic and geographic grouping on 'moderate or better' outcome.

Table 4
Meta-regression analysis for the nested outcome groups of 'recovery', 'good or better', and 'moderate or better'.

Discussion
This is the first systematic review and meta-analysis of all prospective follow-up studies on schizophrenia spanning 20 years or more. We found that 24.2 % of patients with schizophrenia had 'recovered' and 35.5 % had a 'good or better' outcome (which also includes 'recovered'). Further, 59.7 % had a 'moderate or better' outcome (which also includes 'good or better' and 'recovery'). This means that 40.3 % had a poor outcome. The attrition analysis indicates that outcomes might be worse than these figures suggest. Nevertheless, we conclude that a deteriorating course of illness, poor outcome, and inability to recover are not defining features of schizophrenia. While our results show a mix of outcomes, the outcome of schizophrenia is generally worse than that of other psychotic disorders. Examples include:
- psychotic mood disorders (Harrison et al., 2001; Kotov et al., 2017);
- substance-induced psychosis (Harrison et al., 2001; Kotov et al., 2017);
- other psychoses (Harrison et al., 2001; Harrow and Jobe, 2010; Kotov et al., 2017; Marneros et al., 1989; Opjordsmoen, 1991).
The heterogeneity in outcomes of schizophrenia between studies and within samples is echoed throughout the history of follow-up studies (Carpenter and Kirkpatrick, 1988; Häfner, 2014; Hegarty et al., 1994; Heilbronner et al., 2016; Jobe and Harrow, 2005). Our results attest to the overarching heterogeneity in 20-year outcomes of schizophrenia, with a substantial range in all outcome categories. We also found considerable methodological heterogeneity in the studies' use of scales, definitions, and cut-offs for outcome. Some efforts have been made to make outcome definitions, measurements, and diagnostic methods more homogeneous. Even so, heterogeneity in outcomes between studies persists, which raises the question of the extent to which it is a product of such methodological differences. To address this issue, we imposed our unified template of outcome definitions onto the studies' results. This made the meta-analysis with meta-regression conceptually and methodologically stronger.
Our results on 'good or better' outcome (35.5 %) are comparable to those of the meta-analysis by Hegarty et al. (1994), who found a mean favorable outcome of 40.2 %. We found a higher proportion of recovered patients (24.2 %) than Jääskeläinen et al. (2013) did in their meta-analysis on recovery in schizophrenia (16.4 %). Both of these meta-analyses included long-, medium-, and short-term follow-up studies. This may indicate that after 20 years or more, patients with schizophrenia are similarly or slightly better off compared with patients at earlier follow-up stages. Several studies have suggested a possible stabilization of the disorder into a mild, moderate, or severe outcome after 5-10 years (Davidson and McGlashan, 1997; Jobe and Harrow, 2005; Ogawa et al., 1987; Wiersma et al., 2000). Ogawa et al. (1987) describe this as a scissor phenomenon: after a tumultuous initial phase, schizophrenia stabilizes in either a favorable or a less favorable illness trajectory. However, several studies highlight that significant fluctuations, e.g., late-stage recovery, may still occur in this stabilization phase. This suggests that there may still be room for optimism despite years of less favorable outcome (Harrison et al., 2001; Thara, 2004).
Another study that deserves mention in this context is the WHO-coordinated International Study of Schizophrenia (ISoS). It was excluded from our analysis because it did not meet the 20-year follow-up criterion. The ISoS was designed to reduce heterogeneity and enable cross-cultural comparison of the course and long-term outcome of schizophrenia over 15-25 years (Harrison et al., 2001; Sartorius et al., 1996). It found a favorable outcome for over half of the people with schizophrenia. Using M. Bleuler's rating of recovery and a GAF-disability score > 60, it found 'recovery' for 37.8 %. Nonetheless, the authors also found striking heterogeneity between the participating centers. This indicates that some of the differences in outcome lie beyond methodological differences and most likely pertain to the disorder itself. A substantial contributor to heterogeneity in the ISoS was what the authors called the 'developing country effect'. This term refers to the frequent finding of better outcome in low- and middle-income countries (Harrison et al., 2001; Menezes et al., 2006; Peritogiannis et al., 2020; Warner, 2009). Since our review included only one study from a low- or middle-income country (Thara, 2004), we cannot add anything to this observation.
To further examine the heterogeneity of outcomes among the included studies, we explored several possible explanations but found little that could account for it. For example, the heterogeneity could be due to change in outcome over time. However, our results suggest little change in outcome between the earliest and latest long-term studies. The earliest enrolled patients in the early 1900s (Ciompi, 1980), and the latest was initiated in 1990. This finding is somewhat consistent with prior meta-analyses, which found an increase in favorable outcome around and after 1950, followed by a decrease in recent decades (Hegarty et al., 1994; Jääskeläinen et al., 2013). The increase in favorable outcome is most likely related to the introduction of the antipsychotic drug chlorpromazine in the early 1950s (Ban, 2007). However, Hegarty et al. (1994) offer an alternative interpretation. They suggest that the initial increase in good outcome may reflect improved treatment and a broadened concept of schizophrenia. By contrast, the later decline in good outcome, to levels similar to those of 1895-1955, might reflect the re-emergence of a narrower diagnostic concept.
Interestingly, we found that studies using operational diagnostic methodology (i.e., diagnostic systems from DSM-III and subsequent versions, and ICD-10) were significantly less likely to report a good outcome. This held for both 'good or better' (23.7 % vs. 44.0 %) and 'moderate or better' (43.0 % vs. 69.3 %) outcome, compared with studies using pre-operational diagnostic methodology. As suggested by Hegarty et al. (1994), this finding could reflect the changing nosological boundaries of the schizophrenia diagnosis over time. The operational definitions of schizophrenia place stronger emphasis on the severity of psychotic symptoms (e.g., by excluding milder forms of formal thought disorder). In doing so, they favor a more chronic, delusional-hallucinatory syndrome. As a result, a substantial number of hebephrenic and other non-paranoid patients would not be diagnosed with schizophrenia, even though they would have received the diagnosis under pre-operational definitions (Jansson and Parnas, 2007; Parnas and Kendler, 2012).
Finally, we also found that studies conducted in North America were less likely to report a 'moderate or better' outcome than studies conducted in Europe (36.4 % vs. 63.4 %). Moreover, the effects of diagnostic methodology and geographical location were independent of each other. However, only two of the four studies conducted in North America could be included in this analysis. Caution is therefore warranted when interpreting this geographical effect.
Treatment and support for people with schizophrenia have changed substantially over the last century, e.g., through deinstitutionalization, antipsychotic medication, and psychosocial interventions. Despite these changes, the proportion of patients with a favorable outcome appears fairly constant. However, because we focused on studies with at least 20 years of follow-up, we did not include studies on early detection and intervention programs, which were mainly implemented in the last two decades.

Strengths and limitations
Our study is strengthened by its inclusion of all identified prospective follow-up studies of schizophrenia with at least 20 years of follow-up that met our eligibility criteria, and by addressing the studies' methodological differences (outcome definitions and diagnostic criteria) and geographical location. Another strength is the exclusion of studies that grouped schizophrenia and other psychotic disorders in the same category, since doing so risks overestimating good outcome.
A limitation of this study is the heterogeneity of the reviewed studies, e.g., in terms of sample type (ranging from first-episode schizophrenia to chronic treatment-refractory schizophrenia), diagnostic methods, and outcome definitions. Due to a general lack of detail in the included studies, our review could not assess associations between treatment and outcome. This is an important limitation, since treatment likely affects outcome. Future studies on the long-term outcome of schizophrenia should therefore explore the effects of treatment and of the availability of different treatment options, including treatments targeting social remission and occupational opportunities. Another limitation is the lack of information on the course of the disorder, which would also have been relevant to consider. Regarding our meta-analysis and meta-regression analyses, several limitations must be noted: selection bias at baseline, and differences in the type of patients enrolled depending on enrollment year and region. The meta-regression analyses generally lack statistical power because of the small number of studies. The meta-analysis of recovery could include only four studies, which calls for future follow-up studies on recovery that cover both clinical and social domains and apply a defined time criterion. Finally, results from newer long-term studies of schizophrenia based on the early intervention programs initiated in the 2000s (once these studies are published) will be highly relevant for calibrating the outcome of schizophrenia in the 21st century.
With 20 or more years of follow-up, some loss to follow-up is almost inevitable. Attrition was particularly high in Ciompi (1980), owing to the nature of the study, which examined late-phase schizophrenia. Our meta-regression analysis found a statistically significant association between attrition and 'good or better' outcome, suggesting that high attrition biases the results toward favorable outcome and that those lost to follow-up may have had a worse outcome than those assessed at follow-up. Our sensitivity analyses on attrition present a 'worst-case scenario', in which those lost to follow-up were counted as having the less favorable outcome (which is not necessarily the case). In this analysis, 13.1 % had recovered, 22.4 % had a 'good or better' outcome, and 37.9 % had a 'moderate or better' outcome. The true values probably lie somewhere between these estimates and those from the main analysis.
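The logic of the worst-case adjustment can be illustrated for a single study. The sketch below assumes that a study reports the number of patients assessed at follow-up, the number lost to follow-up, and the number with a given favorable outcome; the function names and the cohort figures are hypothetical. It shows only the per-study arithmetic (the numerator stays the same while the denominator grows to the full baseline cohort), not the random-effects pooling that produced the estimates reported above.

```python
"""Worst-case attrition adjustment for one follow-up study (illustrative sketch)."""


def observed_proportion(n_favorable: int, n_assessed: int) -> float:
    """Share of patients assessed at follow-up who had the favorable outcome."""
    if n_assessed <= 0 or not 0 <= n_favorable <= n_assessed:
        raise ValueError("need n_assessed > 0 and 0 <= n_favorable <= n_assessed")
    return n_favorable / n_assessed


def worst_case_proportion(n_favorable: int, n_assessed: int, n_lost: int) -> float:
    """Share with the favorable outcome if every patient lost to follow-up
    is counted as having the less favorable outcome.

    The numerator is unchanged; only the denominator is enlarged to the
    full baseline cohort (assessed + lost)."""
    if n_lost < 0:
        raise ValueError("n_lost must be non-negative")
    observed_proportion(n_favorable, n_assessed)  # reuse the input checks
    return n_favorable / (n_assessed + n_lost)


if __name__ == "__main__":
    # Hypothetical cohort: 200 enrolled, 150 assessed, 50 lost, 60 favorable.
    print(f"observed:   {observed_proportion(60, 150):.1%}")        # 40.0%
    print(f"worst case: {worst_case_proportion(60, 150, 50):.1%}")  # 30.0%
```

With no attrition the two proportions coincide, and the gap between them widens as the share lost to follow-up grows, which is why studies with high attrition contribute most to the difference between the main and worst-case estimates.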
In conclusion, schizophrenia is a disorder with heterogeneous outcome. Recovery occurred in 24.2 % of patients with schizophrenia. 'Good or better' outcome (including recovery) occurred in 35.5 % of patients, whereas the remaining 40.3 % of patients (i.e., those without a 'moderate or better' outcome) had a poor outcome. Although our attrition analyses indicate that these results may somewhat overestimate favorable outcome, this does not invalidate the conclusion that a non-negligible proportion of patients with schizophrenia have a more favorable outcome than is often assumed. Still, patients with schizophrenia generally have a worse prognosis than patients with other psychotic disorders, and the results of our study emphasize the need for continuous supportive care and treatment of this complex mental disorder.

Data sharing
All data included were derived from publicly available documents cited in the references. Extracted data are available upon request to the corresponding author.

Role of funding source
The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.

Funding
The Mental Health Services of the Capital Region of Denmark.

CRediT authorship contribution statement
I.M., J.N., R.H., A.U.P. and M.G.H. designed the search strategy and selection criteria. I.M. conducted the search and removed duplicates and publications that were not peer-reviewed journal articles or were not written in English. I.M. and R.H. independently made a full-text assessment of the remaining articles for eligibility. I.M. constructed the tables and figures. I.M., M.G.H. and J.N. wrote the first draft, which was revised by A.U.P., R.H., and J.B. J.B. conducted the meta-analysis and meta-regression and wrote the drafts of the statistical methods and the meta-analysis results. All authors contributed to the revision of subsequent drafts and approved the final draft.

Declarations of competing interest
All authors report no conflicts of interest.