The impact of study design on schizophrenia incidence estimates: A systematic review of Northern European studies 2008–2019

The best estimates of the incidence of schizophrenia range more than 25-fold, from 3 to 80 per 100,000 person-years. To what extent do differences in study design explain this wide variation? We selected all studies published between 2008 and 2019 reporting the incidence of schizophrenia in general populations of Northern Europe. We identified 17 estimates covering 85 million person-years and more than 15,000 individual cases. The estimates ranged from 4 to 72 per 100,000 person-years (median 30; interquartile range 13 to 41). We classified the estimates in terms of three study design factors (coverage of services, time frame, and diagnostic quality) and two population factors (urbanicity and age). A meta-regression model of the three design factors, using the two population factors as covariates, explained 91% of between-study variation. Studies performed in general psychiatric services reported similar estimates [incidence rate ratio 1.12 (95% confidence interval 0.88 to 1.43)] to those performed in specialized services. But studies applying a cumulative time frame to diagnosis reported fourfold higher estimates [4.04 (3.14 to 5.2)] than those applying a first-contact time frame. And studies based on clinical diagnoses reported lower estimates [0.55 (0.43 to 0.72)] than those based on standardized research diagnoses. The three study design factors by themselves explained 67% of between-study variation. When comparing incidence rates from different populations, distortions arising from differences in study design can eclipse differences caused by schizophrenia risk factors, such as gender, age or migrant status.


Rationale
Systematic reviews report a wide variation between estimates of the incidence of schizophrenia. Two international reviews together cover the period 1950-2017: one review of schizophrenia incidence studies published between 1950 and 2000 reported estimates ranging from 4 to 52 per 100,000 person-years (van der Werf et al., 2014), while the other review, of psychosis incidence studies published between 2002 and 2017, reported schizophrenia incidence estimates ranging from 3 to 76 per 100,000 person-years (Jongsma et al., 2019). Variation between countries with different cultures and health care systems can be expected, but reviews of incidence from similar countries also show wide variations: a review of UK studies published in 1950-2009 reported estimates ranging from 4 to 32 per 100,000 person-years (Kirkbride et al., 2012), while another review of studies published between 1992 and 2012 reported estimates from the Netherlands, Sweden and Denmark ranging from 9 to 80 per 100,000 person-years (Vassos et al., 2012).
One explanation for the wide variation is that different rates result from different population characteristics, i.e. from different distributions of risk factors for schizophrenia. Studies of populations with higher proportions of young adults or males, for example, are likely to report higher incidences than studies of the population at large (Thorup et al., 2007). Similarly, studies in larger cities commonly report higher incidences than studies from rural areas (Vassos et al., 2016), and rates tend to be higher among immigrants than among native inhabitants of an area (Selten et al., 2020; Bourque et al., 2011).
Another explanation for the variation could be that different rates result from different study designs. In a previous study, we used two different study designs to estimate the incidence of schizophrenia in the city of The Hague in the Netherlands (Hogerzeil et al., 2014). The first approach was a first-contact design, which is generally considered the standard for incidence studies of schizophrenia. The second approach was based on a longitudinal case register extracted from digital hospital records. In this database, we could follow patients beyond their first contact to detect diagnostic changes over the course of treatment. The longitudinal case-register approach resulted in an estimate that was more than three times higher than the estimate based on the first-contact approach: 69 (95% confidence interval (CI) 64 to 74) vs. 21 (18 to 23) per 100,000 person-years. The impact of single aspects of study design was explored in several world-wide meta-analyses that included studies from heterogeneous populations (McGrath et al., 2004; van der Werf et al., 2014; Bourque et al., 2011). These analyses uncovered no clear patterns. Two recent meta-analyses (Jongsma et al., 2019; Castillejos et al., 2019) examined this issue using meta-regression. Castillejos et al. (2019) reviewed only the literature based on first-contact sampling and reported that methodological differences helped to explain between-study heterogeneity. Jongsma et al. (2019) compared case registers with first-contact studies and reported that register-based estimates are systematically higher [with a multivariable model relative risk of 2.51 (95% CI 1.24 to 5.21)]. However, neither review was set up to quantify the relative importance of different factors in study design.

Objectives
We have previously proposed to categorize the design of incidence studies on three factors: coverage of services, time frame of the diagnosis, and reliability of the diagnosis (Hogerzeil and van Hemert, 2019).
Our aim in this review was to examine to what extent reported incidence estimates are related to these three design factors, and so to distinguish artefacts from 'true' variation due to population characteristics. We hypothesized that estimates would be higher in studies with a wider service cover, longer time frames, and clinically oriented diagnoses.
To test this, we systematically identified all studies on the incidence of schizophrenia published between 2008 and 2019. We used meta-regression analysis to examine the impact of design features on the incidence estimates, adjusting for the impact of population characteristics.

Material and methods
This meta-analysis and meta-regression followed PRISMA guidelines (Liberati et al., 2009).
We based our study on the recent meta-analysis by Jongsma et al. (2019), which covered all the original research on the incidence of non-organic, adult-onset psychotic disorder published in 2002-2017. Her method in turn was based on a previous systematic review by Kirkbride et al. (2012), which covered the research conducted in England on the incidence of non-organic adult-onset psychosis published in 1950-2009. Jongsma et al.'s (2019) search was very thorough and had no restrictions on language of publication, study design, or publication status. It also searched for grey literature via published conference proceedings, author correspondence, and bibliographical searches.

Information sources
We included all studies included in Jongsma et al.'s (2019) meta-analysis and all citations listed in the supplemental data provided with that study. To cover studies published after Jongsma et al.'s (2019) review, we performed a systematic search for additional studies published up to December 31st 2019.

Search
We used the same search string as Jongsma et al. (2019), which she adapted from Kirkbride et al. (2012), to query PubMed for studies published between January 1st 2018 and December 31st 2019 (see Appendix section). We performed bibliographic searches whenever possible. We had no language restrictions. We did not query other databases. We did not search the grey literature.

Eligibility criteria
We did not examine studies published before 2008 because one category of interest (applying a cumulative time frame) relies on types of clinical diagnostic practice and electronic data warehouses that only started to emerge at that time. We limited our selection to Northern European studies to reduce potential heterogeneity in health care systems. We considered only incidence estimates for schizophrenia to reduce potential heterogeneity in diagnostic practices.
Therefore, citations were eligible if they contained incidence data or data from which incidence could be derived (numerator and denominator); included patients (aged 18-64 years) diagnosed with a first episode of schizophrenia; covered populations in Northern Europe; were published between 2008 and 2019; and were listed either in Jongsma et al.'s (2019) meta-analysis (if published 2008-2017) or in PubMed (if published 2018-2019).

Study selection and data collection process
We (AH and SH) first selected on title. We included studies if their title mentioned: (a) 'incidence', 'rate' or 'risk', and (b) one of the words 'schizophrenia', 'psychosis' or 'mental disorders'. We excluded studies with titles referring to specific subgroups as indicated by one of the diagnostic specifiers 'affective', 'postpartum', 'drugs or substance induced psychosis', or subpopulation specifiers 'in or among' 'migrants', 'youth', 'veterans', 'military', 'type 1 diabetes', 'adoptees', 'epilepsy' or 'immune-mediated inflammatory disease'.
We (SH) then selected on the full text. We included studies if they reported estimates of the incidence of 'narrow schizophrenia', defined as 'DSM-IV 295.x' or 'ICD-10 code F20 (including F21 and F25 if possible)' in the general population. We excluded non-European and South European studies to reduce heterogeneity from different healthcare systems and cultural effects on seeking healthcare.
If two or more studies reported on the incidence of schizophrenia in the same population, we included only one. To decide which one, we (SH and AH) assigned priority according to study period (more recent, larger) and quality (more detailed information, state-of-the-art procedures) to arrive at consensus. If two or more methods had been used in the same population, we included one estimate for each method.

Data items
For each study and (if necessary) for each type of study design applied in that study, we collected data related to publication, study period, study population (i.e. country, area, urbanicity, sex and age), study design (i.e. coverage of services, time frame of diagnosis, reliability of diagnosis), and the incidence estimate (i.e. cases and person-years at risk).
Coverage of services could range from: (1) 'specialized services' such as Early Psychosis Intervention (EPI) services, and emergency or in-patient services, to the broader set of (2) 'general' psychiatric or addiction services, and further to (3) primary or somatic medical care, and ultimately to (4) the general population. The time frame of diagnosis is the interval between the first contact with a service and the moment a diagnosis is made. It could range from: (1) case ascertainment at first contact, to (2) later stages of treatment, e.g. subjects presenting initially with another diagnosis, ultimately extending to (3) life-time follow-up. Finally, the reliability of diagnosis could range from diagnosis based on: (1) research diagnostic procedures, to (2) clinical criteria diagnoses (e.g. DSM-5 or ICD-10) and (3) non-standardized diagnostic procedures.
Age was categorized according to Howard (2000) into 'early onset' (age < 40 years), 'late onset' (age 40-59 years) and 'very late onset' (age > 60 years). Urbanicity was classified in three categories: urban, rural, and mixed (i.e. for entire population estimates, such as studies from Denmark). We used the level of urbanicity that each study had assigned to itself.

Assessment of study quality
Kirkbride et al. (2012) and Jongsma et al. (2019) used a 7-point quality score. That score was not applicable to our review on 3 of the 7 points because they relate to the first-contact design in particular ('standardized research diagnosis' and 'leakage study') or to studies of risk factors such as ethnicity ('blinding to demographic variables'). For inclusion in our meta-analyses, we required that all studies meet all four remaining criteria ('defined catchment area', 'accurate denominator', 'population based case-finding', and 'inclusion criteria'). We nevertheless scored studies on all 7 points for consistency with Kirkbride et al. (2012) and Jongsma et al. (2019). For our purposes we considered any study meeting the four core criteria listed above as 'high quality'.

Summary measures
The principal summary measure was the treated incidence rate of schizophrenia per 100,000 person-years in the general population.

Synthesis of results
All incidence rates are expressed as number of cases per 100,000 person-years. We calculated exact confidence intervals for Poisson rates using the pois.exact() function from the 'epitools' package (Aragon et al., 2017) in R version 3.6.1 (R Core Team, 2020).
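As an illustration of the exact Poisson interval described above, the following is a minimal Python sketch rather than the R code used in the study. It assumes the standard exact (Garwood) definition of the interval and finds the limits by bisection on the Poisson distribution function. The function names and the example counts (120 cases in 400,000 person-years) are hypothetical and are not taken from any included study.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed in log space for stability."""
    if k < 0:
        return 0.0
    if lam <= 0.0:
        return 1.0
    log_terms = [i * math.log(lam) - lam - math.lgamma(i + 1) for i in range(k + 1)]
    peak = max(log_terms)
    return min(1.0, math.exp(peak) * sum(math.exp(t - peak) for t in log_terms))


def _bisect(f, lo: float, hi: float, iterations: int = 100) -> float:
    """Root of f, assuming f is positive at lo and negative at hi."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_poisson_ci(cases: int, alpha: float = 0.05):
    """Exact two-sided confidence limits for the mean of a Poisson count."""
    half = alpha / 2.0
    if cases == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= cases) equals alpha / 2.
        lower = _bisect(
            lambda lam: half - (1.0 - poisson_cdf(cases - 1, lam)),
            0.0,
            float(cases),
        )
    # Upper limit: the mean at which P(X <= cases) equals alpha / 2.
    upper = _bisect(
        lambda lam: poisson_cdf(cases, lam) - half,
        float(cases),
        cases + 10.0 * math.sqrt(cases + 1.0) + 10.0,
    )
    return lower, upper


def incidence_rate_ci(cases: int, person_years: float, per: float = 100_000.0):
    """Incidence rate and exact CI, expressed per `per` person-years."""
    lower, upper = exact_poisson_ci(cases)
    scale = per / person_years
    return cases * scale, lower * scale, upper * scale


# Hypothetical example: 120 incident cases in 400,000 person-years.
rate, low, high = incidence_rate_ci(120, 400_000)
print(f"{rate:.1f} per 100,000 person-years (95% CI {low:.1f} to {high:.1f})")
```

Bisection is used here only to keep the sketch free of external dependencies; a chi-square quantile function would give the same limits more directly.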
We calculated pooled incidence rates for each category of study population (i.e. age, urbanicity) and study design (i.e. coverage, time frame, reliability).
We calculated the proportion of between-study variance explained by the covariates by comparing the estimated between-study variance, $\tau^2$, with its value when no covariates are fitted, $\tau_0^2$. Adjusted $R^2$ is the relative reduction in the between-study variance, $R^2 = (\tau_0^2 - \tau^2)/\tau_0^2$ (Harbord and Higgins, 2008).
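The adjusted R-squared described above is the relative reduction in between-study variance, (tau0^2 - tau^2) / tau0^2, where tau0^2 is the variance without covariates and tau^2 the variance with covariates. A small Python sketch of that calculation follows. The variance values in the example are purely illustrative assumptions, since the between-study variances themselves are not reported in the text, and the function name is hypothetical.

```python
def adjusted_r2(tau2_null: float, tau2_model: float) -> float:
    """Proportion of between-study variance explained by the covariates.

    tau2_null  -- between-study variance with no covariates fitted
    tau2_model -- between-study variance with the covariates fitted
    """
    if tau2_null <= 0.0:
        raise ValueError("tau2_null must be positive")
    # No truncation is applied: the value is negative if the covariates
    # increase the estimated between-study variance.
    return (tau2_null - tau2_model) / tau2_null


# Illustrative values only (not reported in the study):
# a drop in between-study variance from 1.00 to 0.09.
print(f"{adjusted_r2(1.00, 0.09):.0%} of between-study variance explained")
```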

Additional analyses: meta-regression and sensitivity analysis
To examine how our three design factors related to the incidence, adjusting for differences in population characteristics, we first calculated unadjusted pooled incidence ratios for each of the three variables of interest (coverage, time frame and reliability) and the two covariates (urbanicity and age). Next, to adjust for interdependencies between variables, we conducted a multivariable meta-regression analysis to estimate incidence ratios for each factor in a single model. To allow for variation both within and between studies, we used a mixed-effects model with restricted maximum likelihood (REML) estimators. We used the Knapp-Hartung adjustment to obtain more reliable confidence intervals (Knapp and Hartung, 2003) and permutation tests to assess the robustness of our model (Higgins and Thompson, 2004). The regression was performed using the 'meta' (Balduzzi et al., 2019) and 'metafor' (Viechtbauer, 2010) packages in R.
To rule out bias from including estimates from our own research group (i.e. tilting the scale towards results that confirm our prior findings) we repeated the meta-regression analyses without our own data.

Study selection
The results of the study selection are summarized in a flowchart. Based on title, we included 70/527 publications from our PubMed search (left-hand column in the flowchart) and 68/125 publications from Jongsma et al.'s (2019) study (right-hand column) that explicitly mentioned: (a) 'incidence', 'rate' or 'risk', and (b) one of the words 'schizophrenia', 'psychosis' or 'mental disorders'. We then excluded 50/70 and 5/68 studies because the titles included the words 'review' or 'meta-analysis', resulting in 20 and 63 studies, respectively. We then excluded 14/20 and 14/63 studies with titles referring to specific subgroups, as indicated by one of the diagnostic or subpopulation specifiers, resulting in 6 and 49 remaining studies, respectively.
Based on the full text, we excluded 4/6 and 9/49 studies from non-European or South-European populations resulting in respectively 2 and 40 remaining studies. We then excluded 1/2 and 18/40 studies because they did not report estimates of the incidence of 'narrow schizophrenia'. Finally, we excluded 1/1 and 10/22 studies for miscellaneous reasons: two studies that did not describe a general population, one study where coverage and time frame could not be assessed, one study that was a conference abstract, one study with a small population sample (n < 4000), and 6 studies that reported duplicate or overlapping findings.
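The selection steps above can be checked with simple bookkeeping. The sketch below, in Python, replays the exclusion counts reported in the text for the two sources and confirms that they leave 0 and 12 studies, respectively. The function name and the short labels for each exclusion step are our own shorthand, not terms from the flowchart.

```python
def replay_exclusions(start: int, exclusions):
    """Apply exclusion steps in order and return the remaining count after each."""
    remaining = start
    trail = []
    for reason, excluded in exclusions:
        if excluded > remaining:
            raise ValueError(f"cannot exclude {excluded} of {remaining}: {reason}")
        remaining -= excluded
        trail.append((reason, remaining))
    return trail


# Counts as reported in the text; labels are shorthand.
pubmed_trail = replay_exclusions(70, [
    ("review or meta-analysis in title", 50),
    ("subgroup specifier in title", 14),
    ("non-European or South European population", 4),
    ("no estimate for narrow schizophrenia", 1),
    ("miscellaneous reasons", 1),
])
jongsma_trail = replay_exclusions(68, [
    ("review or meta-analysis in title", 5),
    ("subgroup specifier in title", 14),
    ("non-European or South European population", 9),
    ("no estimate for narrow schizophrenia", 18),
    ("miscellaneous reasons", 10),
])

included = pubmed_trail[-1][1] + jongsma_trail[-1][1]
print("Studies included:", included)
```

The eleven miscellaneous exclusions (1 + 10) match the itemized reasons: two studies without a general population, one with unassessable design, one conference abstract, one small sample, and six duplicates.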

Study characteristics
All 12 studies were population-based, had specific inclusion criteria, and had an accurate denominator for a defined catchment area, i.e. they had a quality score of 4 or higher on Kirkbride et al.'s (2012) 7-point score and were considered 'high quality' for our purposes. Our scores diverged from those by Jongsma et al. (2019) for three studies (Boonstra et al., 2008; Reay et al., 2010; and Bhavsar et al., 2014) because we classified them as population-based and as having an accurate denominator. In our sample, seven studies scored 4/7 points, four scored 5/7 points (Reay et al., 2010; Bhavsar et al., 2014; Szoke et al., 2016; and Kirkbride et al., 2017), and one (Kirkbride et al., 2014) scored 6/7 points. The quality factor 'research diagnosis' was by definition always present in our category 'using research diagnosis' and vice versa. Otherwise, there was no association between study quality and study design, or between study quality and estimate size.

Estimate characteristics
In total, study selection and data extraction resulted in 17 estimates of the treated incidence of 'narrow schizophrenia' in the general population, for a variety of study designs (i.e. coverage in two levels, time frame in two levels, and reliability in two levels) applied to a variety of study populations (i.e. age in two levels, urbanicity in three levels), adding up to 85 million person-years at risk.
This sample contained no estimates in primary care, somatic care, or the general population. We dropped gender as a category for analysis because this information was typically not provided. Information on population characteristics was available for two categories only: age range and urbanicity. There was insufficient information to separate 'early onset' from 'late onset' (40-59 years) and no data were available for 'very late onset' (> 60 years). We therefore merged the age categories into 'early onset' (age < 40 years) and 'early to late onset' (< 60 years). Urbanicity could be assessed for all studies. Three studies (Hogerzeil et al., 2014; Salokangas et al., 2011; Jörgensen et al., 2010) reported estimates based on more than one design or subpopulation and therefore contributed more than one estimate to our dataset. The pooled estimate was 40.2 per 100,000 person-years (95% confidence interval 39.5 to 40.8) for early onsets (< 40 years) and 23.1 per 100,000 person-years (95% confidence interval 22.8 to 23.3) for early-to-late onsets (< 60 years).
Between-study heterogeneity ($I^2$) in the study sample was 98.7% and 99.9% for early and early-to-late onsets, respectively.

Meta-regression
Unadjusted pooled incidences and incidence ratios for individual factors in study design or study population are shown in Table 2. In this single-variable comparison, no significant differences were found for coverage of services, quality of diagnosis, or age of onset. For 'time frame for diagnosis', the incidence estimates were more than threefold higher for cumulative time frames than for first-contact studies (incidence ratio 3.21; 95% CI 3.13 to 3.30). In addition, estimates from rural populations were roughly eight-fold lower than those from urban populations (incidence ratio 0.12; 95% CI 0.10 to 0.15).
Results of our multivariable meta-regression analysis are presented in Table 3. The meta-regression indicated that among adults aged 15-59 years in a general urban population, a study using research diagnoses made in specialized services and applying a first-contact time frame would estimate the incidence of schizophrenia at 25 per 100,000 person-years (Knapp-Hartung adjusted 95% CI 15 to 40). In the same population, a study using clinical diagnoses would report an estimate 0.55 (0.38 to 0.81) times as high, and one applying a cumulative time frame would report an estimate 4.04 (2.78 to 5.87) times as high. If the same design were used in mixed and rural settings, estimates would be 0.54 (0.39 to 0.75) and 0.33 (0.18 to 0.6) times as high, respectively. If age of onset were restricted to early onset, the estimate would be 1.34 (1.02 to 1.75) times as high. Extending coverage to general psychiatric services would not increase estimates significantly (incidence ratio 1.12; 95% CI 0.88 to 1.43).
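To make the multiplicative reading of these coefficients concrete, the following Python sketch combines the reported baseline estimate (25 per 100,000 person-years) with the reported incidence ratios. It is an illustration only: it uses the rounded point estimates from the text, ignores their confidence intervals, and assumes that the ratios combine multiplicatively without interactions between factors. The dictionary keys and the function name are hypothetical labels.

```python
# Baseline: urban population, ages 15-59, research diagnoses,
# specialized services, first-contact time frame (per 100,000 person-years).
BASELINE_RATE = 25.0

# Point estimates of the incidence ratios reported in the text.
INCIDENCE_RATIOS = {
    "clinical_diagnosis": 0.55,
    "cumulative_time_frame": 4.04,
    "mixed_setting": 0.54,
    "rural_setting": 0.33,
    "early_onset_only": 1.34,
    "general_services": 1.12,
}


def predicted_incidence(*factors: str) -> float:
    """Model-based incidence for a design that departs from the baseline.

    Assumes the ratios multiply, i.e. no interaction between factors.
    """
    rate = BASELINE_RATE
    for factor in factors:
        rate *= INCIDENCE_RATIOS[factor]
    return rate


for design in [(), ("cumulative_time_frame",), ("clinical_diagnosis",),
               ("cumulative_time_frame", "clinical_diagnosis")]:
    label = " + ".join(design) or "baseline"
    print(f"{label}: {predicted_incidence(*design):.1f} per 100,000 person-years")
```

Under these assumptions, a cumulative time frame alone raises the predicted estimate from 25 to about 101 per 100,000 person-years, while clinical diagnoses alone lower it to about 14.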
The three study design factors together explained 67% of between-study variance (adjusted $R^2$). A complete model, including the two population covariates, explained 91% of between-study variance.
Permutation tests confirmed that the estimators were robust. Running the meta-regression on subsets (i.e. the set of estimates reporting 'early onset' and the set of estimates for 'early-to-late onset' separately) did not change the outcome. Likewise, removing our own data (Hogerzeil et al., 2014) did not change the outcome.

Discussion
We conducted a review of 12 selected studies on the incidence of narrow schizophrenia in the general adult population published between January 1st 2008 and December 31st 2019. We examined the impact of differences in study design on the variation of reported incidences. We found 17 estimates in six countries, covering more than 15,000 individual cases and 85 million person-years. We examined the impact of three study design characteristics (coverage, time frame, reliability of diagnosis), adjusting for population characteristics with two covariates (age, urbanicity). Differences in study design together explained 67% of between-study variation, while a more complete model, including age and urbanicity as covariates, explained 91%. In our model, a longer 'time frame' resulted in four-fold higher estimates, and clinical diagnoses, compared to standardized research diagnoses, reduced estimates by half.
[Table notes: n, number of estimates; obs, number of observed cases; ir, incidence rate per 100,000 person-years; irr, incidence rate ratio; CI, exact Poisson confidence interval.]
The four-fold difference between estimates based on cumulative vs first-contact time frames is in line with our previous study, where we compared a cumulative case-register design to a first-contact design in a single population (in the Netherlands), which demonstrated a 3.3-fold higher estimate for the cumulative time frame (Hogerzeil et al., 2014). Similarly, other case-register studies have tended to report higher incidence estimates than first-contact studies (McGrath et al., 2004; Thorup et al., 2007; Pedersen et al., 2014; Anderson et al., 2019; Kirkbride et al., 2014). The findings in this study agree with our previous findings (Hogerzeil et al., 2014) and confirm them independently, since our conclusions did not change when we removed our own data from the analysis. They confirm the threefold difference between register studies and first-contact studies reported in Jongsma et al.'s (2019) meta-analysis. They expand on her finding by untangling the relative contributions of separate design aspects.
One limitation of the first-contact approach as commonly practiced is that it cannot account for long delays in reaching an ultimate diagnosis of schizophrenia. Most patients with schizophrenia first report to services with other symptoms, such as depression, anxiety or substance abuse (Rietdijk et al., 2011; Hogerzeil et al., 2014; Simon et al., 2017). They may also present with psychotic symptoms, but not necessarily schizophrenia. In our prior study (Hogerzeil et al., 2014), the median interval between first contact and the index diagnosis of schizophrenia was 4.9 years (interquartile range 1.1 to 8.8), but the interval sometimes extended beyond 25 years. In theory, the original first-contact inclusion criteria of Jablensky et al. (1992) do not exclude patients who first contacted services for other reasons. But in practice, most first-contact studies have not actively screened for onsets of schizophrenia among patients contacting services for other reasons, or among patients currently under treatment for reasons other than psychosis.
A criticism of our approach could be that we focus our review on narrowly defined schizophrenia. Many first-contact studies nowadays are performed in Early Intervention services, as close as possible to the emergence of psychotic symptoms. Such services tend to work with provisional clinical diagnoses such as 'psychosis NOS' (not reviewed here), many of which are perhaps ultimately revised to schizophrenia at later stages of treatment. These services therefore treat more (future) cases of schizophrenia than is reflected in their provisional numbers. Our focus on narrow schizophrenia thus favours case registers over first-contact studies, because registers work with ultimate rather than provisional diagnoses. Although this criticism may validly explain lower incidence estimates in first-contact studies, it also underscores the potential underdetection of true cases of narrow schizophrenia in such designs.
Prior work suggests that both the primary care system and general psychiatric services play an important role in the first diagnosis of psychotic disorder, and physicians in these settings may be involved in ongoing psychiatric care, especially where specialized services are unavailable (Simon et al., 2017; Anderson et al., 2019; Rietdijk et al., 2011). We had no data on the incidence of schizophrenia in primary care, somatic medical care or the general population. But contrary to our expectation, we found no differences between specialized and general psychiatric services as channels for case detection. This has implications for healthcare: increasing service coverage (beyond the services typically used by psychotic patients) to detect more cases of incident schizophrenia will not result in better estimates if the time frame remains limited to diagnosis at first contact. One explanation could be that every subject with clinically relevant schizophrenia is eventually referred to specialized services (Weiser et al., 2012), and can be counted at that later point in time if the study design allows for such a pathway to care.
The two-fold difference between estimates based on research diagnoses vs. clinical diagnoses was also unexpected. It runs counter to the common intuition that clinicians diagnose schizophrenia too easily and that relying on (presumably) conservative, standardized research procedures would result in lower (but more valid) incidence estimates (Castillejos et al., 2019). The idea that research diagnoses are to be preferred over clinical diagnoses is contradicted by reports that clinical diagnoses can be valid (Uggerby et al., 2013; Ludvigsson et al., 2011; Dalman et al., 2002; Ekholm et al., 2005) and stable over time (Fusar-Poli et al., 2016). Because our sample contained no ePCR studies with research diagnoses, comparisons between studies based on clinical vs. research diagnoses were restricted to first-contact studies. The counterintuitive finding therefore bears primarily on first-contact studies. It offers a new perspective, by suggesting that clinicians may in fact be more conservative than researchers in diagnosing schizophrenia. We speculate that clinicians are reluctant to diagnose schizophrenia formally to avoid the stigma associated with the label.

Limitations
The large attrition of eligible studies was a consequence of the quality criteria adopted to answer our research question. We restricted our search to studies published from 2008 onwards because clinical practices have become more standardized and electronic patient records more readily available in recent years. The further restriction to studies from Northern Europe resulted in a high-quality study sample that was comparable in terms of culture and health systems. Despite the small number of studies, the sample still covered 85 million person-years and more than 15,000 cases of schizophrenia. The risk of bias is lower for incidence studies than for RCTs. Although incidence studies are not blinded or randomized, there are no financial or ideological incentives to distort the incidence estimate, or the association between study method and incidence. The quality scores of the studies included in our review were high and not related to estimate size. Our update for the years 2018-2019 did not include the grey literature, however, and we did not query databases other than PubMed. Arguably, studies not listed in PubMed are no different with respect to our main finding.
[Table 3: Multivariate meta-regression analysis, modeling incidence rate ratio estimates in terms of coverage of services, time frame for diagnosis, and quality of diagnosis, with urbanicity and age as covariates.]
Another limitation is that our information on population characteristics was limited to age and urbanicity. We had limited information on relevant age bands and no information on gender, ethnicity or other socio-economic or biological risk factors. Despite this limitation, including age and urbanicity as covariates in our final regression model explained 91% of between-study variation. This may be due to the homogeneous selection of studies (all from Northern Europe), which was helpful to demonstrate the specific contribution of design factors. But our findings may underestimate the contribution of other population characteristics as a source of variation.
Finally, in our statistical model, we did not account for interactions between study design characteristics and population characteristics. Such interactions are plausible, e.g. older, non-migrant females with mild symptoms are less likely to be included in first-contact studies than young migrant males with acute onset of psychosis (Hogerzeil et al., 2017). In our analyses, the net effect would be conservative, e.g. selection bias in first-contact studies in favor of including subjects with higher incidence of schizophrenia would shrink the contrast with ePCRs observed in this study.

Conclusions
In conclusion, our selective review demonstrates that differences in study design explain most of the wide variation in reported estimates of the incidence of schizophrenia. This artefact can eclipse true but smaller variations in population risk factors such as gender, age and migrant status. To distinguish cause from noise, future systematic reviews should apply standardized categorizations by type of design (Hogerzeil and van Hemert, 2019;Edwards et al., 2019).

Declaration of competing interest
None.