Calibration adjustments to address bias in mortality analyses due to informative sampling—a census-linked survey analysis in Switzerland

Background Sampling bias, like survey participants’ nonresponse, needs to be adequately addressed in the analysis of sampling designs. Often survey weights will be calibrated on specific covariates related to the probability of selection and nonresponse to get representative population estimates. However, such calibrated survey (CS) weights are usually constructed for cross-sectional results, but not for longitudinal analyses. For example, when the outcome of interest is time to death, and sampling selection is related to time to death and censoring, sampling is informative. Then, unweighted or CS weighted inferential statistical analyses may be biased. In 2010, Switzerland changed from a decennial full enumeration census to a yearly registry-based (i.e., data from harmonised community registries) and a survey-based census system. In the present study, we investigated the potential bias due to informative sampling when time to death is the outcome of interest, using data from the new Swiss census system. Methods We analysed more than 6.5 million individuals aged 15 years or older from registry-based census data from years 2010 to 2013, linked with mortality records up to end of 2014. Out of this population, a target sample of 3.5% was sampled from the Swiss Federal Statistical Office (SFSO) in a stratified yearly micro census. The SFSO calculated CS weights to enable representative population estimates from the micro census. We additionally constructed inverse probability (IP) weights, where we used survival information in addition to known sampling covariates. We compared CS and IP weighted mortality rates (MR) and life expectancy (LE) with estimates from the underlying population. Additionally, we performed a simulation study under different sampling and nonresponse scenarios. Results We found that individuals who died in 2011, had a 0.67 (95% CI [0.64–0.70]) times lower odds of participating in the 2010 micro census, using a multivariable logistic regression model with covariates age, gender, nationality, civil status, region and survival information. IP weighted MR were comparable to estimates from the total population, whereas CS weighted MR underestimated the population MR in general. The IP weighted LE estimates at age 30 years for men were 50.9 years (95% CI [50.2–51.6] years), whereas the CS weighted overestimated LE by 2.5 years. Our results from the simulation study confirmed that IP weighted models are comparable to population estimates. Conclusion Mortality analyses based on the new Swiss survey-based census system may be biased, because of informative sampling. We conclude that mortality analyses based on census-linked survey data have to be carefully conducted, and if possible, validated by registry information to allow for unbiased interpretation and generalisation.


INTRODUCTION
Population-based longitudinal studies are a key tool for epidemiologists to investigate individual and ecological risk factors on disease development and mortality over time. In many countries, including the United Kingdom, Germany and the Nordic countries, nationwide cohort studies have been established by linking information from population registries and secondary data sources (Olsen et al., 2001;Ahrens et al., 2014;Connelly & Platt, 2014). In Switzerland, a nationwide cohort study was established in 2008, based on a probabilistic record linkage of census and mortality information covering the whole Swiss population (Bopp et al., 2009;Spoerri et al., 2010). This cohort study relied on an almost complete census (coverage of 98.6% (Renaud, 2004)) and allowed for almost unbiased mortality analyses (Schmidlin et al., 2013;Moser et al., 2014).
In 2010, Switzerland changed from a full enumeration census-which was conducted every 10 years-to a registry-based and survey-based census system. This new Swiss census system is based on a yearly updated registry-based census-which collects information from harmonised community registries covering the entire resident population of Switzerland (STATPOP)-and a micro census of roughly 3.5% of the resident Swiss population aged 15 years or older, living in a private household. This micro census collects individual and household information, which is not available in the STATPOP population (e.g., attained educational level). The new census system aims to be more cost-efficient and to require less administrative resources compared to the traditional full enumeration census. However, because the micro census is based on only a stratified sampled subpopulation, the sampling design needs to be addressed in descriptive and inferential statistical analyses to get valid population estimates (Korn & Graubard, 1999;Fuller, 2009).
The Swiss Federal Statistical Office (SFSO) planned and performed the sampling of the micro census, and provided calibrated survey (CS) weights for the analysis of the micro census sample. CS weights address the sampling process (i.e., probabilities of selection) and account for sampling biases (e.g., nonresponse adjustments) to get consistent statistical estimates for the whole Swiss population in weighted descriptive analyses (Assoulin, 2012). These analyses are valid for cross-sectional results, but not necessary for longitudinal analyses. For example, if the outcome of interest is time to death and the sampling selection is not independent of the survival time and censoring, given other covariates, the sampling design is called ''informative'' (Lawless, 2003;Boudreau & Lawless, 2006). When sampling is informative, the outcome of interest is not missing at random and unweighted or CS weighted inferential statistical analyses may be biased. IP weighting models, which account for the probability of survey inclusion and censoring, could correct for such bias and lead to valid survival estimates (Scharfstein & Robins, 2002;Little, 2004;Boudreau & Lawless, 2006).
In the present study, we performed time to death analyses using data from the Swiss micro census. Specifically, we calculated mortality rates (i.e., survival time is an offset variable in a Poisson regression model) and life expectancy (i.e., survival time is directly modelled in a censored skew-normal regression model (Moser, Clough-Gorr & Zwahlen, 2015)). In Switzerland, non-responders of surveys have a higher risk of dying compared to responders (Bopp, Braun & Faeh, 2014). Thus, we hypothesised that bias due to informative sampling may exist in CS weighted analyses, caused by unmeasured predictors (e.g., health status), which are possibly related to the sampling process and mortality. We therefore compared CS weighted results with those from an IP weighting approach, which uses known sampling covariates and survival information, and validated our findings using data from the total Swiss population. Additionally, we performed a simulation study to investigate the bias due to informative sampling under different sampling and nonresponse scenarios.

Data availability
We provide an anonymised dataset of the STATPOP and SE populations of the year 2010 to allow reproducibility of our main results. By law, we are not allowed to provide exact dates (date of birth and date of death). Thus, we rounded necessary main analysis variables (age and follow-up time) to one decimal place. Rounding had no influence on obtained results, compared to results using exact date information. Results in the manuscript are based on exact information. Analysis code for all results is provided in a Supplemental File.

Human ethics
Federal laws gave the federal authorities the right to collect without consent the census and mortality data used in the SNC. Approval for the anonymous linkage in the SNC was obtained from the Ethics Committees of the Cantons of Zürich (approval no. 13/06) and Bern (approval no. 205/06).

Micro census
The micro census of the year 2010, subsequently called 'structural enquiry' (SE), is a stratified sample of the permanent residents in Switzerland aged 15 years or older, living in a private household. The sampling aim and planned accuracy of the SE has been described elsewhere (Assoulin, 2012;Qualité, Statistik & De Europe, 2014). In brief, the SE is a regionally-stratified proportional sample with a target sampling fraction of approximately 3.5%. The regions had the opportunity to increase the sample size to increase precision of statistical estimates, which led to a final sampling fraction of 5.5% (Rochat, Kauthen & Eichenberger, 2009). For the SE 2010, the response rate was 87.1% (Qualité, Statistik & De Europe, 2014). The SFSO calculated individual CS weights based on sampling inclusion and response probabilities (Assoulin, 2012;Qualité, Statistik & De Europe, 2014).

Mortality information
Survival status for the years 2011-2014 could be one-to-one linked to the STATPOP and the SE 2010 population by unique anonymous personal identifier. The percentage of non-linked death records is 0.2%. Vital status is ascertained by death certificates. We used information about vital status as binary variable (dead: yes or no) and survival time.

Simulation study
We investigated the influence of informative sampling bias in a simulation study. We simulated a population of 1,000,000 individuals with different sampling and nonresponse scenarios. Our main outcomes were number of deaths and time to death to calculate MR. We performed a simulation study as depicted in Fig. 1: we assumed an unobserved predictor (e.g., health status) and observed predictors (e.g., age and gender). On a population level, there is a direct effect of age and gender on health status, e.g., older men are more likely to have a poor health status. Further, health status has a causal effect on survival. Age and gender are related to survival through the unmeasured predictor. A random sample was drawn from the underlying population (planned sampling design). We assumed that a poor health status had a direct effect on the nonresponse, leading to a smaller effective sample. Conditioning on the effective sample opens the path between the random sample and survival (through nonresponse and the unmeasured confounder) and will lead to a spurious association in survival estimates. We assumed a constant risk of dying over time. For the effect sizes we chose an odds ratio of 4 for the association between the unmeasured confounder and nonresponse, and a 4-times higher hazard of dying for individuals with bad health, as reported in Pizzi et al. (2011). We calculated CS weights from varying probabilities of sampling and nonresponse. IP weights were constructed from the probability of 'being sampled' from the total population using vital status. We performed the following simulation scenarios: the overall percentage of deaths was set to 10% and 25% for the total population, the sampling fraction was set to 1% and 10%, and the percentage of nonresponse was set to 10% and 25%. We ran 1,000 simulation replications. We calculated MR from unweighted Poisson regression models, and models using CS and IP weights, using covariates age and gender. We reported the 1,000 replicated mortality rate estimates in scatter plots with bivariate mean location centre estimates and 95% confidence ellipses from a Gaussian distribution function.

Statistical analysis
We described the study populations by frequencies (n), percentages (%), mean and standard deviation (SD). We reported crude odds ratios of participation in the SE 2010 using logistic regression models, for variables age, gender, nationality, civil status and vital status. We calculated MR using Poisson regression models. Remaining LE at age 30 years was calculated from weighted censored skew-normal regression models, accounting for delayed entry (Moser, Clough-Gorr & Zwahlen, 2015). Because MR and LE were a priori known to differ by gender, we performed analyses for men and women, separately. We calculated IP weights using logistic regression models with an indicator of being a SE 2010 participant in the STATPOP 2010 population. Predictors were chosen according to a priori known covariates for CS weights, i.e., age, gender, nationality, civil status, region (Assoulin, 2012), and additional survival information. We included interactions between age, gender and survival information. Because the main outcome of interest is time to event we included the Nelson-Aalen estimator in the final IP weighting model (White & Royston, 2009

Sensitivity analysis
We investigated whether the hypothesised bias is present in the years 2011-2013. First, we reported crude odds ratios of participation in one of the SE 2011-2013, analysing covariates age, gender, nationality, civil status and vital status (for brevity we did not report estimates of the regional covariate). Second, we investigated whether the hypothesised bias of CS weighted analysis is similarly present in the years 2011-2013, by calculating 1-year MR. Table 1 summarises the baseline characteristics of the STATPOP population and the SE sample of 2010, by survival status. At 31 December 2010, the STATPOP population consisted of 6.7 million individuals. 49% of the population was male, 51% female. More than half of the individuals were married (51.6%) and most of the individuals were Swiss (77.3%). Until end of 2011, a total of 61,539 persons died. The 2010 SE sample consisted of 317,079 sampled individuals. The marginal distributions of the covariates for the SE sample and the STATPOP population were roughly the same, except for the regional covariate, which is a sampling stratification covariate. There were differences in individuals who died until end of 2011 in the SE sample and the STATPOP population. Individuals who died in the STATPOP population were slightly older: only 50% of the individuals were 80 years or older, compared to 57% in the STATPOP. Further, only 30% of widowed women died in the SE sample, compared to 37.5% in the STATPOP population. Table S1 summarises the marginal distributions of the baseline characteristics for CS and IP weighted estimates. The marginal distributions were comparable between the STATPOP population and the weighted estimates from the SE sample.  SE population, across months of death. Individuals who died in January and February were underrepresented in the SE, for example, only 4.8% of SE 2010 participants who died in 2011 did so in January, compared to 9.3% in the underlying STATPOP population. This pattern was consistent across all surveys 2010-2013. Figure S1 shows estimated MR for the STATPOP and SE population of 2010, with an increasing follow-up time until end of 2011, 2012, 2013 and 2014. We found that MR differences between the CS and IP weighting approaches diminished with increasing follow-up time. Figure S2 shows the estimated log MR of the simulated population versus the estimated weighted log MR of the simulated sample. The variability of the estimated weighted log MR of the sample was associated with the sampling fraction and the assumed percentage of deceased individuals. Unweighted analyses showed biased MR estimates for all performed scenarios. We found that IP weighted estimates were comparable to known CS weighted analyses, with a slightly higher bias in a scenario with a low sampling fraction and a low percentage of deceased individuals. Table S3 summarises crude odds ratios of being SE participant of the years 2011-2013, using covariates age, gender, nationality, civil status, and vital status. We found similar participation patterns as in the year 2010. Figure S3 shows the 1-year mortality analyses for the STATPOP and SE populations of the years 2010-2013, i.e., mortality in 2011-2014. CS weights showed a reduced bias for later sampling years, but still slightly underestimated the underlying population mortality rates.

DISCUSSION
We hypothesised that time to event analyses using data from the Swiss survey-based census system may biased because of informative sampling. We investigated whether an IP weighting approach leads to representative statistical estimates in the analysis of the 2010 SE sample, representing approximately 3.5% of the Swiss population aged 15 years or older at the end of 2010. The SE is a stratified random sample of the underlying population, but unweighted and CS weighted survival analyses may be biased, because the sampling is informative. We constructed IP weights, which accounted for known sampling covariates and survival information, and compared those to CS weighted MR and LE estimates. We found that CS weighted estimates were biased, whereas IP weighted estimates were comparable to those from the total Swiss population. Unmeasured predictors, which are related to future survival and the sampling processbut which are not part of the sampling design-could cause bias in the CS weighted analysis: for example, individuals with a poor health status were less likely to respond to the SE questionnaire, but had also a higher risk of dying. Such a situation may be caused by possible survey participants nonresponse, which is related to certain population characteristics (e.g., older people), which-on the other hand-are also related to the regression outcome (see Fig. 1). Bias in the regression analysis of survey data and possible strategies to handle this kind of bias have been extensively discussed in the demographical and statistical literature (see e.g., Pfeffermann, 2010 and the references therein). Pizzi et al. (2012) compared results from an internet-based cohort study sample and results from the general population and concluded that differences in estimates between the two analysis populations reflect changes in confounding pattern due to the sampling selection process. Such a change of confounding patterns is likely to influence mortality estimates in our present study using the Swiss population. Confounding patterns in our study are likely to be caused by the underrepresentation of certain subpopulations in the micro census sample. For example, we found that individuals who died in 2011 were less likely to having participated in the micro census of 2010. Specifically, we found that individuals who died in the early months of the year were less likely to participate in the micro census. Delayed information adjustments between the process of sending out questionnaires to individuals, and the alignment between mortality and registry information, could introduce specific nonresponse patterns. Further, our observed lower chance of micro census participation of non-Swiss individuals could be explained by a migrant effect (Nielsen & Knardahl, 2016). In additional analyses, we found that older foreigners tend to have a higher odd of participation in the survey, than younger individuals, which could be partly explained by the health status of these individuals (Nielsen & Knardahl, 2016). Such a migrant effect is likely to confound our current analysis.
When sampling design variables are related to survival time and censoring, unweighted survival estimates may be biased (Lawless, 2003;Boudreau & Lawless, 2006). However, IP weighting approaches, addressing the sampling design and the censoring information, have been shown to provide consistent population survival estimates (Scharfstein & Robins, 2002;Lawless, 2003;Boudreau & Lawless, 2006;Pyy-Martikainen, 2013). Several countries, also Switzerland, reported a higher risk of death among non-responders of surveys, compared to responders (Barchielli & Balzi, 2002;Bopp, Braun & Faeh, 2014). An Australian study investigated attrition patterns over five survey periods, starting in 1996-2008, in a population of older women, and reported that health status was associated with frailty, withdrawal from the study, and lost to follow-up (Brilleman, Pachana & Dobson, 2010). Such unmeasured factors, which are related to mortality and non-response, are likely leading to an informative sampling design in the new Swiss census system, and may cause bias in survival analysis of survey data. In our analyses, we found that IP weighting may adequately address the bias introduced by informative sampling. Our simulation study confirmed that IP weights, using survival information from the full population, leads to comparable MR estimates as those from known CS weights. The design of the present study-with a combination of registrybased, survey and mortality information-allowed the identification of predictors, which are associated with informative sampling, in a large population of 6.7 million individuals. This allowed us to validate our findings for consistency. Because mortality information could be deterministically linked to a registry and survey record-by using anonymous person identifiers-we could avoid uncertainty inherent to probabilistic record linkages (Schmidlin et al., 2013). The Swiss Harmonized Registry Act Revision of the year 2008 legally regulated the electronic exchange between communities and state personal registries, and led to virtually no lost-to-follow-up of individuals.
Our present study has limitations. First, the IP weighting approaches is a design-based approach, i.e., the probability distribution of participation in a random sample is treated as a fixed quantity (Hansen, Madow & Tepping, 1983;Little, 1983;Little, 2004). Designbased approaches have been shown to be only asymptotically unbiased, and potentially inefficient (Little, 2004;Kim & Skinner, 2013). In the present study we did not investigate model-based approaches, where non-participation in a random sample is predicted using regression models (Gelman, 2007;Little, 2007). Further, IP modelling relies on strong statistical assumptions which have to be fulfilled to provide unbiased statistical estimates (Scharfstein & Robins, 2002). In our setting, for example, exchangeability requires that the mortality among censored and uncensored individuals is the same in participating and not-participating individuals, given all predictors which predict mortality and those in the sampling design. However, researchers never know whether all relevant predictors were identified and collected in a real data setting. Second, our approach includes the outcome variable (i.e., survival status) in the IP weight construction. Thus, the validity of our estimates depends on the strength of the participation-outcome association and on the risk factor-outcome association (i.e., age and gender) (Pizzi et al., 2011). Third, our findings are limited to observed and available data within the registries and surveys. We could not investigate in more detail the real nonresponse patterns, i.e., which survey participants could not be contacted from the SFSO and whether sampled persons subsequently proved to have died were handled as non-responders or if they were excluded. Detailed information on those, who were contacted, but refused to participate, would strengthen our findings and could improve our estimates. Further, in the STATPOP population we could not differentiate whether individuals lived in a collective household (i.e., nursing homes) or not. This leads to a potential numerator/denominator bias, because the SE population is restricted to individuals living in a private household.
For many decades, most European countries relied on full enumeration censuses, where information for the whole population was collected at a specific reference date. A full enumeration census approach provides very detailed information on individual level (e.g., civil status, religion, household composition), but requires considerable financial and administrative resources. Registry-based and micro-census approaches are cost-efficient and require fewer resources. Among the countries of the United Nations Economic Commission for Europe (UNECE), there was an increase of registry-based and micro census approaches from 8% of all countries in 2000, to 19% in 2010(UNECE (2017, accessed (30 November 2017)). However, from a statistical perspective they are prone to the sampling design errors, including informative sampling, and require adequate analysis techniques. In the following we give examples from other studies which analysed (or plan to analyse) mortality in samples of registry or census populations. One of the largest cohort studies worldwide-the Global Burden of Disease Study-depends on a combination of complete and partially-complete (i.e., sampling information) registry and census information from different countries worldwide (Abajobir et al., 2017). To account for sampling and non-response errors, the study authors differentiate between different data sources and adjust for country-specific non-sampling error in their analysis. A study from the United Kingdom analysed mortality patterns from 1991 to 2011 from three different regions and data sources (Katikireddi et al., 2017). The analysis was based on information from the whole population of Northern Ireland, a 5.3% sample of the Scottish population, and a 1% sample of the English and Wales population. The authors calculated mortality rates stratified by sex, by estimating the number of expected deaths for the Scottish and English/Wales samples using a simulation approach. A recent study from Finland investigated migrant mortality for the years 1939-2010(Haukka et al., 2017. The authors used a 10% sample of the 1950 and 1970 Finish population censuses, and calculated all-cause and cause-specific mortality rates. The authors discuss possible selection and attrition biases in their study limitations, but did not address this kind of bias in sensitivity analyses. The German National Cohort (GNC) is a nationwide cohort study, based on a sample of 200,000 German residents, which are recruited through a network of 18 study centres (Ahrens et al., 2014). The authors estimated the anticipated response rate between 40 and 50%, but did not further specify how sampling bias will be included in their planned analyses. The above-mentioned examples show that a combination of registry-based information and population-sampling information play a crucial role in demographic and epidemiological research, and adequate analysis strategies are necessary to avoid bias due to non-response and non-participation.
In the present study, we concluded that in Switzerland-which uses a combination of registry and survey-based censuses-unweighted or CS weighted analyses lead to biased estimates, if the main outcome is survival time. The observed bias from unweighted and CS weighted survival analysis is of importance for absolute quantities (for example, mortality rates or life expectancy), but the bias may be less pronounced in relative outcomes measures (for example, hazard ratios or odds ratios). We highlight that the IP approach is not only restricted to the described setting of a change from a full enumeration census to a micro census system, but may be generalized to other settings. For example, non-response in a full enumeration census or lost-to-follow-up scenarios can be addressed by this approach to improve longitudinal validity, given the non-response or non-participation is known in the study population. The underrepresentation of population subgroups in survey data and possible selection bias have been extensively discussed in the literature, but survey-based mortality analyses often lack validation using registry data, or external data in general. Keiding & Louis (2016) exemplary discuss the limited generalisability of survey data when external validation is lacking for internet-based surveys with possible self-selection. Obviously, generalisability is more limited if the survey participation is informative.

CONCLUSION
We conclude from our study that weighted regression analysis of census-linked survey data has to be conducted carefully, and if possible, validated by registry-population information. If a validation or a registry sampling weight construction is missing, estimation results have to be interpreted with caution and might be biased.