Introduction

The case fatality rate (CFR), that is, the ratio of reported deaths to diagnosed infections of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been one of the main measures used for monitoring disease progression and evaluating the effectiveness of mitigation strategies (including vaccination) over the course of the COVID-19 pandemic. From early on, however, estimates of the COVID-19 CFR have varied substantially across countries (Rajgor et al., 2020).

International comparisons of the CFR are biased by several factors. First, there is a delay between symptom onset, diagnosis and death or recovery, as has been demonstrated for other emerging pathogens like Ebola and SARS (Ghani et al., 2005; Lipsitch et al., 2015; Wu et al., 2020). Another source of bias is that many SARS-CoV-2 infections are mild or remain asymptomatic, and thus avoid detection (Li et al., 2020). At the same time, infections in older age groups are over-represented (Fisman et al., 2020), because COVID-19 severity increases with age (Goldstein & Lee, 2020; Green et al., 2020; Guilmoto, 2020; O’Driscoll et al., 2020). Finally, the practice of recording COVID-19 as the cause of death also varies across countries (Goldstein & Lee, 2020; Garcia et al., 2021).

Assessing and correcting for these biases when comparing COVID-19 CFR across countries has generated a vast literature. Beginning with Dowd et al. (2020), demographers have identified the role of population age structure for understanding levels and differentials of the CFR associated with COVID-19 (Dudel et al., 2020; Medford & Trias-Llimòs, 2020; Morwinski, Nitsche and Acosta, 2021). Demographic studies have also revealed that the age structure of confirmed SARS-CoV-2 infections, rather than population age structure, is the most important source of bias in international comparisons of the CFR, as it can account for up to two-thirds of the observed variation in the CFR across countries (Dudel et al., 2020; Sudharsanan et al., 2020; Morwinski, Nitsche and Acosta, 2021). To adjust for the age structure of COVID-19 cases in international comparisons, demographers have followed a classic approach in mortality analysis (Heuveline & Tzen, 2021) and relied on direct standardisation by age of the CFR (Dowd et al., 2020; Goldstein & Lee, 2020; Green et al., 2020; Guilmoto, 2020; Sudharsanan et al., 2020). The main disadvantage of age-standardised CFRs is that they vary depending on the choice of the standard population (Heuveline & Tzen, 2021).

To address the limitations of direct standardisation of the CFR, in this paper we adopt an alternative demographic approach to improve the comparability of the CFR across countries. Specifically, we propose and validate a synthetic indicator of COVID-19 fatality, the synthetic case fatality rate (SCFR). The SCFR does not require the arbitrary choice of a standard population and it has the additional advantage that it can be calculated separately for men and women, thus simultaneously adjusting for the age and sex distribution of COVID-19 cases. By doing so, we demonstrate how sex differences in COVID-19 fatality are the main driver of observed differences in the CFR across countries.

Introducing the synthetic case fatality rate

Demographically, the CFR is a crude rate, which measures the frequency of COVID-19-related deaths in the population of individuals infected with SARS-CoV-2 without regard to its age and sex structure. Given the well-established association between age, sex and COVID-19 infection and mortality (Goldstein & Lee, 2020; Green et al., 2020; Guilmoto, 2020; O’Driscoll et al., 2020), comparisons of the crude CFR across countries are biased by the age and sex composition of the population of individuals infected with SARS-CoV-2, as is the case for other crude demographic rates (Ní Bhrolcháin, 2001; Preston et al., 2001).

Existing studies have adjusted for the age distribution of COVID-19 cases in the calculation of the CFR with the tool of direct standardisation by age, whereby age-specific fatality rates from different populations are applied to a standard population (Dowd et al., 2020; Goldstein & Lee, 2020; Green et al., 2020; Guilmoto, 2020; Sudharsanan et al., 2020). These studies, however, have obtained quite different results. Indeed, the arbitrary choice of the standard population, which is the inherent limitation of direct standardisation, is particularly problematic when choosing a standard age distribution of COVID-19 cases (Heuveline & Tzen, 2021).
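The dependence of direct standardisation on the chosen standard can be illustrated with a minimal sketch. All numbers below are invented for illustration (two hypothetical countries, three broad age groups, two arbitrary standard case distributions), and the function name `standardised_cfr` is an assumption rather than anything used in the cited studies.

```python
def standardised_cfr(age_cfr, standard_cases):
    """Directly standardised CFR per 100 cases.

    Age-specific CFRs (as proportions) are weighted by the age
    distribution of cases in a chosen standard population.
    """
    total = sum(standard_cases)
    weighted = sum(rate * weight for rate, weight in zip(age_cfr, standard_cases))
    return 100 * weighted / total


# Hypothetical age-specific CFRs (proportions) for young, middle and old cases.
cfr_a = [0.010, 0.05, 0.20]
cfr_b = [0.002, 0.04, 0.24]

# Two equally arbitrary standard distributions of cases (shares in %).
young_standard = [70, 20, 10]
old_standard = [20, 30, 50]

for name, standard in [("young", young_standard), ("old", old_standard)]:
    a = standardised_cfr(cfr_a, standard)
    b = standardised_cfr(cfr_b, standard)
    print(f"{name} standard: A = {a:.2f}, B = {b:.2f}")
```

With these invented inputs, country A appears more fatal than country B under the young standard and less fatal under the old one, so even the ranking of countries can depend on the standard.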

An additional limitation of direct standardisation of the CFR is that it disregards the role of gender for differences in COVID-19 fatality across countries. This is an important omission for several reasons. First, there are more confirmed cases of COVID-19 among women than men under age 60 and above age 90, while the reverse is true between age 60 and 90 (Salje et al., 2020; Rochon et al., 2020; Stall et al., 2020; Bignami-Van Assche, 2021; see Footnote 1). The second reason why the sex structure of COVID-19 cases introduces an additional bias in international comparisons of COVID-19 CFR is that sex differentials are more pronounced in COVID-19 mortality than in all-cause mortality (Geldseltzer et al., 2021). COVID-19 mortality rates of men aged 25–80 have been found to be up to two times greater than those of women (Geldseltzer et al., 2021; Guilmoto, 2020), because of a combination of biological and behavioral factors (Cai, 2020; Falahi & Kenarkoohi, 2020; Gebhard et al., 2020; Klein et al., 2020; Scully et al., 2020; Takahashi et al., 2020).

Calculating a synthetic indicator is the well-established alternative to standardisation when adjusting for compositional effects in crude demographic rates (Ní Bhrolcháin, 2001). To address the limitations of direct standardisation for international comparisons of the CFR, we thus propose a synthetic indicator of COVID-19 fatality, the synthetic CFR (SCFR). The SCFR is obtained by summing over all age groups the age-specific CFRs that apply in a given period:

$$\text{SCFR} = n \sum_{a} \text{CFR}_{a,\,a+n}, \qquad 0 \leq a \leq 90+$$

with n being the width of the age intervals. Each age-specific CFR is computed as the ratio between the number of COVID-19 deaths and the number of diagnosed SARS-CoV-2 infections in a given age group during the observation period. The SCFR is thus a cross-sectional indicator that can be interpreted as the average number of fatalities per 100 diagnosed SARS-CoV-2 infections over a given period. It can be easily computed from aggregate data on COVID-19 cases and deaths by 10-year age groups, the standard age interval used in official statistics. Compared to age-standardised CFRs in existing studies, not only does the SCFR not require the choice of an arbitrary standard population but also, when calculated for men and women separately, it simultaneously adjusts for the age and sex structure of COVID-19 cases.
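As a minimal sketch of the calculation described above, the following code computes age-specific CFRs, the SCFR and the crude CFR from counts by 10-year age groups (0–9 to 90+). The counts are hypothetical and do not come from COVerAge-DB; the function names are assumptions made for illustration. Age-specific CFRs are kept as proportions, so that with n = 10 and ten age groups the SCFR coincides with the unweighted mean of the age-specific CFRs expressed per 100 cases.

```python
def age_specific_cfr(deaths, cases):
    """Age-specific CFRs as proportions (deaths / diagnosed infections)."""
    if any(c <= 0 for c in cases):
        raise ValueError("every age group needs at least one diagnosed case")
    return [d / c for d, c in zip(deaths, cases)]


def scfr(deaths, cases, n=10):
    """Synthetic CFR: n times the sum of the age-specific CFRs."""
    return n * sum(age_specific_cfr(deaths, cases))


def crude_cfr(deaths, cases):
    """Crude CFR: total deaths per 100 diagnosed infections."""
    return 100 * sum(deaths) / sum(cases)


# Hypothetical counts for the age groups 0-9, 10-19, ..., 80-89, 90+.
cases = [1000, 2000, 5000, 6000, 7000, 8000, 6000, 5000, 4000, 1000]
deaths = [0, 0, 5, 12, 35, 120, 300, 750, 1000, 300]

print(f"crude CFR: {crude_cfr(deaths, cases):.2f} per 100 cases")
print(f"SCFR:      {scfr(deaths, cases):.2f} per 100 cases")
```

In this invented example the SCFR is higher than the crude CFR, because cases are concentrated at ages where fatality is low; a case distribution skewed towards older ages would produce the opposite pattern.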

In the next section, we give an example of the calculation of the SCFR with aggregate data on COVID-19 cases and deaths from COVerAge-DB (Riffe et al., 2021; see Footnote 2) for a selected number of developed countries. Then, we show how the SCFR can be validated with individual-level data on SARS-CoV-2 infections. The countries selected for the example (Germany, Italy, the Netherlands, Portugal, Spain, Sweden, and Canada) were chosen because of similarities in the pandemic progression and mortality patterns during the first wave (Ghio et al., 2021) as well as the availability of the data necessary for the validation.

Empirical application with aggregate data on COVID-19 cases and deaths

We begin by calculating the SCFR with aggregate data for both sexes combined. Then, we show the additional insights gained when adjusting simultaneously for the age and sex structure of COVID-19 cases.

Adjustment for the age structure of COVID-19 cases. In Fig. 1, we compare the SCFR and the crude CFR at the end of the first pandemic wave in Germany, Italy, the Netherlands, Portugal, Spain, Sweden, and Canada.

Fig. 1 COVID-19 case fatality rate (CFR) and synthetic case fatality rate (SCFR) in six European countries and Canada. Legend: CFR: total number of deaths per 100 diagnosed SARS-CoV-2 infections. SCFR: average number of deaths per 100 diagnosed SARS-CoV-2 infections. Sources: Data on COVID-19 cases and deaths by 10-year age groups from COVerAge-DB (Riffe, Acosta et al., 2020) for the period February 1, 2020 – June 30, 2020.

The crude CFR identifies large differences in COVID-19 fatality across countries, with the first pandemic wave seemingly least fatal in Portugal and most fatal in Italy. Once we adjust for the age structure of COVID-19 cases with our proposed indicator, a quite different picture emerges.

First, the SCFR indicates a different ranking of countries in terms of COVID-19 fatality. Sweden has a crude CFR approximately half as large as Italy (7.8 vs. 14.5 deaths per 100 diagnosed SARS-CoV-2 infections, respectively) and similar to Canada (7.8 vs. 8.1 deaths per 100 diagnosed SARS-CoV-2 infections, respectively). Yet it has an SCFR that ranks it second only to Italy among the countries where COVID-19 has been most fatal. Furthermore, Spain records a lower SCFR than Sweden, while it has one of the highest crude CFRs together with Italy and the Netherlands.

Second, overall differences in COVID-19 fatality across countries according to the SCFR are less stark than suggested by the crude CFR, especially within the group of high-fatality countries (Italy, the Netherlands, Spain, Sweden, and Canada). At the other end of the spectrum, Portugal and Germany have the lowest age-specific CFRs and crude CFR but, once the age structure of COVID-19 cases is taken into account, COVID-19 fatality turns out to be much lower in Portugal than in Germany (the average number of deaths per 100 diagnosed SARS-CoV-2 infections is 5.6 vs. 8.0, respectively).

Third, the age pattern associated with high fatality is not unique (Fig. 2). The high SCFRs in Italy and the Netherlands result from the highest age-specific CFRs between age 50 and 79. In Sweden and Canada, a similarly high SCFR results from the highest age-specific CFRs above age 80.

Fig. 2 Age-specific COVID-19 case fatality rate in six European countries and Canada. Sources: Data on COVID-19 cases and deaths by 10-year age groups from COVerAge-DB (Riffe, Acosta et al., 2020) for the period February 1, 2020 – June 30, 2020.

Adjustment for the age and sex structure of COVID-19 cases. Figure 3 compares the SCFR and crude CFR calculated separately for men and women. As with the overall adjustment (see Fig. 1), for men the SCFR is lower than the crude CFR in Italy, the Netherlands, and Germany. Figure 4 shows that this is because these countries have the highest age-specific CFRs above age 60. The effect is smaller for women than for men because, in Italy, the Netherlands, and Germany, the age-specific CFRs are highest only between age 70 and 90 but converge at age 90+.

Fig. 3 COVID-19 case fatality rate (CFR) and synthetic case fatality rate (SCFR) in five European countries and Canada, by sex. Legend: CFR: total number of deaths per 100 diagnosed SARS-CoV-2 infections. SCFR: average number of deaths per 100 diagnosed SARS-CoV-2 infections. Sources: Data on COVID-19 cases and deaths by 10-year age groups and sex from COVerAge-DB (Riffe, Acosta et al., 2020) for the period February 1, 2020 – June 30, 2020.

Fig. 4 Age-specific COVID-19 case fatality rate in four European countries and Canada, by sex. Sources: Data on COVID-19 cases and deaths by 10-year age groups and sex from COVerAge-DB (Riffe, Acosta et al., 2020) for the period February 1, 2020 – June 30, 2020.

The most interesting finding from Fig. 3 is that, once we adjust for the compositional effect introduced by the age structure of COVID-19 cases, sex differences are larger than country differences in COVID-19 fatality. In all countries considered, the SCFR is always much higher for men than for women (1.5 to 2 times higher). The higher case fatality for men thus emerges as the main driver of observed differences in COVID-19 fatality across countries. Indeed, the contribution of the sex-specific SCFR to the overall SCFR depends on the combination of age-specific CFRs (see Fig. 4) and the ratio of male-to-female diagnosed cases at each age (not shown). The former is always much higher for men than for women, minimising the effect of the latter, which favors men at working ages, reverses between age 60 and age 90, and finds women again at a disadvantage above age 90. This age pattern of the ratio of male-to-female diagnosed cases has been suggested to be due to women’s higher share in health- and care-related occupations at working ages compared to men, the higher prevalence of chronic conditions among older men, and women’s higher life expectancy and resulting higher share in homes for the elderly (Sobotka, 2020).
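The sex-specific comparison can be sketched in the same way, by computing the SCFR separately for men and women and taking their ratio. The counts below are hypothetical (ten 10-year age groups, 0–9 to 90+) and were chosen only so that the arithmetic is easy to follow; the dictionary layout and the names `counts`, `scfr_by_sex` and `ratio` are assumptions.

```python
def scfr(deaths, cases, n=10):
    """Synthetic CFR: n times the sum of age-specific CFRs (as proportions)."""
    return n * sum(d / c for d, c in zip(deaths, cases))


# Hypothetical counts by sex for the age groups 0-9, 10-19, ..., 80-89, 90+.
counts = {
    "men": {
        "cases": [500, 900, 2000, 2500, 3000, 4000, 3500, 3000, 2000, 300],
        "deaths": [0, 0, 2, 5, 24, 100, 280, 600, 700, 120],
    },
    "women": {
        "cases": [500, 1100, 3000, 3500, 4000, 4000, 2500, 2000, 2000, 700],
        "deaths": [0, 0, 3, 7, 12, 40, 75, 200, 400, 175],
    },
}

# One SCFR per sex, each adjusted for the age structure of that sex's cases.
scfr_by_sex = {sex: scfr(v["deaths"], v["cases"]) for sex, v in counts.items()}
ratio = scfr_by_sex["men"] / scfr_by_sex["women"]

for sex, value in scfr_by_sex.items():
    print(f"SCFR {sex}: {value:.2f} per 100 cases")
print(f"male-to-female SCFR ratio: {ratio:.2f}")
```

Because each SCFR depends only on the age-specific CFRs of one sex, the ratio is unaffected by how many cases were diagnosed among men relative to women at each age.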

Empirical validation with individual-level data on SARS-CoV-2 infections

As in all existing estimates of COVID-19 CFR and age pattern of fatality and mortality (Goldstein & Lee, 2020; Guilmoto, 2020; Shapiro, 2020; Sasson, 2020), in our above example we have used cross-sectional, aggregate data on COVID-19 cases and deaths. In these data, events counted in the numerator (deaths) do not correspond to the individuals exposed to the risk of dying counted in the denominator (confirmed SARS-CoV-2 infections). To validate our proposed adjustment to the CFR, we exploited individual-level data on confirmed SARS-CoV-2 infections collected and harmonised by the European Centre for Disease Prevention and Control (ECDC, 2020). These data are longitudinal and allow age-specific CFRs to be calculated correctly as occurrence/exposure rates. It follows that, when the appropriate survival model is fitted to the data controlling for age and sex, the estimated cumulative risk of dying should correspond to our synthetic CFR.

The first step of the validation was to determine the appropriate parametrisation of the survival model to be fitted to the data. For each SARS-CoV-2 infection confirmed between February 1 and June 30, 2020 in Germany, Italy, the Netherlands and Sweden, we had access to anonymised individual information on age, sex and clinical outcome (alive, died, still in treatment or unknown; see Footnote 3). By estimating the risk of dying given SARS-CoV-2 infection in each country controlling for age and sex, we found that, between age 30 and 80, the Gompertz distribution provides the best fit, as suggested by earlier studies (Goldstein & Lee, 2020; Guilmoto, 2020; Shapiro, 2020; Sasson, 2020). In Germany, Italy and Sweden, the risk of dying given confirmed SARS-CoV-2 infection increases by 11% every year of age; in the Netherlands, by 13%. Estimated values show, however, a greater variability by sex, ranging from 10 to 14% for men, and from 12 to 14% for women.
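To give a sense of scale for these yearly increases, the sketch below converts them into a Gompertz slope and an implied doubling time. It assumes that the reported percentages can be read as hazard ratios per year of age under a Gompertz hazard of the form h(x) = a·exp(b·x), so that exp(b) equals one plus the yearly increase; the function names are hypothetical.

```python
import math


def gompertz_slope(yearly_increase):
    """Slope b of a Gompertz hazard a*exp(b*x) implied by a yearly relative increase."""
    return math.log(1.0 + yearly_increase)


def doubling_time(yearly_increase):
    """Years of age over which the risk doubles."""
    return math.log(2.0) / gompertz_slope(yearly_increase)


def relative_risk(yearly_increase, age_gap):
    """Risk ratio between two ages that are age_gap years apart."""
    return (1.0 + yearly_increase) ** age_gap


for label, increase in [("11% per year", 0.11), ("13% per year", 0.13)]:
    print(
        f"{label}: b = {gompertz_slope(increase):.3f}, "
        f"doubling time = {doubling_time(increase):.1f} years, "
        f"10-year risk ratio = {relative_risk(increase, 10):.2f}"
    )
```

Under this reading, an 11% yearly increase corresponds to the risk doubling roughly every six to seven years of age, and a 13% increase to a doubling in under six years.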

The second step of our validation of the SCFR consisted of fitting a Gompertz parametric survival model (with age at death as duration variable) to pooled data for all four countries, while controlling for intragroup correlation at the national level with fixed effects (estimated hazard ratios and corresponding standard errors are available upon request). The results in Table 1 show that, as we had expected, the estimated cumulative risk of dying given SARS-CoV-2 infection is very similar to the SCFR calculated from aggregate data.

Table 1 Cumulative risk of dying given SARS-CoV-2 infection estimated with fixed-effects Gompertz model on individual-level data and synthetic case fatality rate (SCFR) calculated from aggregate data on COVID-19 cases and deaths, by sex

The small observed differences between the estimated cumulative risk of dying and the SCFR arise because of the nature of the two data sources used in their calculation. The aggregate data used to calculate the SCFR from COVerAge-DB generally correspond to the official counts of COVID-19 cases and deaths compiled by national health agencies (Riffe et al., 2021). By contrast, the individual-level data collected and compiled by ECDC refer to diagnosed SARS-CoV-2 infections and their clinical outcome. The aggregate data will thus include fatalities that were attributed to COVID-19 without prior testing for SARS-CoV-2 (see Footnote 4). Indeed, when we compared the age- and sex-specific CFRs calculated from the individual-level data with those calculated from the aggregate data (not shown), we found discrepancies between the two sources mainly above age 80, where a large proportion of deaths in nursing homes are found.

Discussion and conclusion

The severity profile of a novel pathogen is one of the most critical clinical and public health issues, since assessing disease progression and outcomes is crucial for planning health interventions and assessing their efficacy (Ghani et al., 2005; Lipsitch et al., 2015). For this reason, the CFR remains one of the key indicators used for monitoring COVID-19 outbreaks and evaluating public health measures, including vaccination.

Existing studies indicate that the main source of bias in comparisons of COVID-19 CFR across countries is the different age distribution of COVID-19 cases (Dudel et al., 2020; Green et al., 2020; Morwinski, Nitsche and Acosta, 2021). In this paper, we propose a synthetic indicator of the CFR, the SCFR, that improves the international comparability of the CFR in two main ways. First, it adjusts for the age structure of COVID-19 cases without relying on the arbitrary choice of a standard population, as existing studies do (Dowd et al., 2020; Goldstein & Lee, 2020; Green et al., 2020; Guilmoto, 2020; Sudharsanan et al., 2020). In addition, the SCFR can be calculated separately for men and women, thus adjusting simultaneously for the age and sex structure of COVID-19 cases in cross-country comparisons.

Contrary to what the crude CFR would suggest, differences in COVID-19 fatality across countries according to the SCFR are not very stark. Indeed, sex differences are larger than country differences in COVID-19 fatality, and the higher case fatality for men emerges as the main driver of observed differences across countries in the overall CFR. The empirical validation of the SCFR for a selected number of countries, where individual-level datasets on SARS-CoV-2 infections and related fatalities are available, confirms that the age and sex structure of COVID-19 cases is the main factor responsible for observed differences in COVID-19 fatality across countries.

As the pandemic is still evolving and accurate cause-of-death data remains incomplete, our synthetic indicator is easily applicable and useful for monitoring the fatality of SARS-CoV-2 mutations and the efficacy of public health measures, including vaccination.