Assessing the age- and gender-dependence of the severity and case fatality rates of COVID-19 disease in Spain

Background: The assessment of the severity and case fatality rates of coronavirus disease 2019 (COVID-19) and the determinants of its variation is essential for planning health resources and responding to the pandemic. The interpretation of case fatality rates (CFRs) remains a challenge due to different biases associated with surveillance and reporting. For example, rates may be affected by preferential ascertainment of severe cases and time delay from disease onset to death. Using data from Spain, we demonstrate how some of these biases may be corrected when estimating severity and case fatality rates by age group and gender, and identify issues that may affect the correct interpretation of the results.
Methods: Crude CFRs are estimated by dividing the total number of deaths by the total number of confirmed cases. CFRs adjusted for preferential ascertainment of severe cases are obtained by assuming a uniform attack rate in all population groups, and using demography-adjusted under-ascertainment rates. CFRs adjusted for the delay between disease onset and death are estimated by using as denominator the number of cases that could have a clinical outcome by the time rates are calculated. A sensitivity analysis is carried out to compare CFRs obtained using different levels of ascertainment and different distributions for the time from disease onset to death.
Results: COVID-19 outcomes are highly influenced by age and gender. Different assumptions yield different CFR values, but in all scenarios CFRs are higher in older age groups and in males.
Conclusions: The procedures used to obtain the CFR estimates require strong assumptions and although the interpretation of their magnitude should be treated with caution, the differences observed by age and gender are fundamental underpinnings to inform decision-making.


Introduction
The coronavirus disease 2019 (COVID-19) has spread to nearly every country in the world since it first emerged in the Hubei province of China in 2019. As of 14 May 2020, more than 4.22 million cases and more than 290,000 deaths have been reported worldwide 1 . While people of any age may get infected, COVID-19 symptoms are particularly severe for the elderly and those with underlying health conditions, which creates a disproportionate risk and need for intensive care in these groups. Understanding the severity of the disease in the different population groups is essential to help predict the demand of healthcare resources and to design effective mitigation policies.
Case fatality rates (CFRs) are often used to characterize the severity of the disease. The crude CFR is obtained by dividing the cumulative number of deaths by the cumulative number of reported cases. This indicator is simple to calculate but difficult to interpret because of several biases 2 . First, the clinical outcome (recovery or death) of the most recent cases may be unknown because of the delay between disease onset and death, so the crude CFR may underestimate the true CFR. Second, limited testing capacity means that most people tested are those with the most severe symptoms, who are also the most likely to experience fatal outcomes. As a result, crude CFRs may overestimate rates that are defined based on the actual number of infected people (including those with weak or no symptoms).
Crude fatality rates can be adjusted in a number of ways to obtain estimates that more accurately represent the severity of the disease in each of the population groups. For example, censoring can be taken into account by using the distribution of the time between disease onset and death to determine the number of cases that could experience an outcome by the point in time when the rates are calculated. The under-ascertainment of cases in different groups can also be corrected by using the population demographics.
Here, we calculate crude and adjusted fatality rates by age group and gender in Spain. Spain is one of the hardest-hit countries in the pandemic with 272,646 cases and 27,321 deaths as of 14 May 2020. The country is characterized by one of the longest life expectancies and lowest birth rates in the world 3 and, thus, has a large percentage of older adults. Moreover, it is characterized by a sociable lifestyle and extensive inter-generational interactions which may accelerate the spread of the virus. Accurate assessment of CFRs by age group and gender is essential to plan responses that help save lives.
First, we present the data on population, confirmed cases and deaths of Spain. We then demonstrate how to calculate crude and adjusted CFRs by population group and present the estimates for Spain. We discuss the limitations of the methods and conduct a sensitivity analysis where we compare CFRs adjusted under different assumptions.

Data
Population data for Spain stratified by age group and gender for 2019 are obtained from the National Institute of Statistics of Spain 4 (Figure 1). We note the large percentage of older adults with over-60 males and females representing 11.41% and 14.16% of the whole population, respectively.
Data on the daily total confirmed cases and deaths, as well as daily confirmed cases and deaths by age group and gender from a subset of the population, are reported by the Spanish Ministry of Health and provided by 5. Assuming this subset is representative of all cases in terms of the relative distribution among age groups and genders, we can estimate the daily number of confirmed cases in each group by multiplying the total number of cases by the proportion of cases in each group. The daily number of deaths in each age group and gender is calculated following a similar procedure. Figure 2 and Figure 3 show the proportion of cases and deaths, respectively, in each age group and gender. We observe a low proportion of confirmed cases in young people (under 20 years old) and a high proportion of deaths in older age groups and males. Figure 4 shows the total number of confirmed cases and deaths over time.
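The proportional allocation described above can be sketched as follows. This is a minimal illustration in Python (the study itself used R), and the function name `allocate_by_group` and all counts are hypothetical, chosen only to make the arithmetic easy to follow; they are not the Spanish data.

```python
def allocate_by_group(total, subset_counts):
    """Split a national total across groups in proportion to the
    counts observed in the subset that reports age and gender."""
    subset_total = sum(subset_counts.values())
    return {group: total * n / subset_total
            for group, n in subset_counts.items()}

# Hypothetical subset with demographic information (not real data).
subset_cases = {
    "males 70-79": 300,
    "females 70-79": 200,
    "males 80+": 250,
    "females 80+": 250,
}

# Hypothetical national total of confirmed cases on a given day.
estimated_cases = allocate_by_group(10_000, subset_cases)
```

The same function could be applied to deaths. The estimate is only as good as the assumption that the subset is representative of all cases.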

Relative risk
We can examine the relative risks in each age group and gender to compare the severity of the disease between population groups. The relative risk in each population group is obtained by dividing the number of deaths in a group by the total population in that group, and normalizing the values so the risk of males older than 80 is equal to 1. We observe a roughly tenfold increase in risk for every 20-year increase in age, consistent with an earlier smaller study of cases in China 6 .
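As a rough sketch of this normalization, the Python snippet below computes deaths per capita in each group and rescales them so that the reference group has risk 1. The function name and the numbers are hypothetical and serve only as an illustration.

```python
def relative_risks(deaths, population, reference):
    """Deaths per capita in each group, scaled so that the
    reference group has a relative risk of exactly 1."""
    per_capita = {g: deaths[g] / population[g] for g in deaths}
    ref_risk = per_capita[reference]
    return {g: risk / ref_risk for g, risk in per_capita.items()}

# Hypothetical counts (not the Spanish data).
deaths = {"males 60-69": 100, "males 80+": 1000}
population = {"males 60-69": 1_000_000, "males 80+": 500_000}

rr = relative_risks(deaths, population, reference="males 80+")
```

With these made-up inputs the 60-69 group has one twentieth of the per-capita death risk of the reference group.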
Case fatality rate
At any point in time, the crude CFR is calculated by dividing the cumulative number of deaths by the cumulative number of reported cases. As noted, CFRs may be affected by preferential ascertainment of severe cases. This is likely to occur in COVID-19, where people who are asymptomatic or have mild symptoms are less likely to seek medical care or be included in the surveillance data. Under-reporting of such cases biases the crude CFRs upward (an overestimate). We can partially correct this bias by calculating the adjusted daily number of confirmed cases following the procedure detailed in 6. Specifically, we calculate $NC_a = \mathrm{pop}_a / \mathrm{cases}_a$, where $\mathrm{pop}_a$ and $\mathrm{cases}_a$ are the population and the number of cases, respectively, in group $a$, $a \in \{$ males 0-9, males 10-19, males 20-29, males 30-39, males 40-49, males 50-59, males 60-69, males 70-79, males 80+, females 0-9, females 10-19, females 20-29, females 30-39, females 40-49, females 50-59, females 60-69, females 70-79, females 80+ $\}$. We assume perfect ascertainment in the group with the maximum $1/NC_a$ value, which is the group of males older than 80. Then, we assume the attack rate is the same in all groups and estimate the adjusted number of cases in each population group by multiplying the confirmed cases by $NC_a / NC_{\text{males 80+}}$. Figure 6 and Figure 7 show the cumulative confirmed cases and the cases adjusted for preferential ascertainment over time for each age group and gender. Finally, we calculate the CFRs adjusted for preferential ascertainment by dividing the cumulative number of deaths by the cumulative number of adjusted cases in each population group. We also calculate 95% confidence intervals using exact binomial tests 7 .
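The ascertainment adjustment can be illustrated with a short Python sketch. It computes the population-to-cases ratio in each group, takes the group with the smallest ratio (the highest apparent attack rate) as fully ascertained, and rescales the confirmed cases in every other group accordingly. The function name and all counts are hypothetical; with real data the reference group is the one reported in the text.

```python
def adjust_for_ascertainment(population, cases):
    """Return the reference group and the adjusted case counts.

    nc[g] is population / confirmed cases in group g. The group
    with the smallest nc (largest 1/nc) is assumed to be perfectly
    ascertained, and a uniform attack rate is assumed elsewhere.
    """
    nc = {g: population[g] / cases[g] for g in cases}
    reference = min(nc, key=nc.get)
    adjusted = {g: cases[g] * nc[g] / nc[reference] for g in cases}
    return reference, adjusted

# Hypothetical counts (not the Spanish data).
population = {"males 80+": 1000, "females 20-29": 4000}
cases = {"males 80+": 100, "females 20-29": 40}
deaths = {"males 80+": 20, "females 20-29": 2}

reference, adjusted_cases = adjust_for_ascertainment(population, cases)

crude_cfr = {g: deaths[g] / cases[g] for g in cases}
adjusted_cfr = {g: deaths[g] / adjusted_cases[g] for g in cases}
```

In this toy example the younger group goes from 40 confirmed to 400 adjusted cases, so its CFR drops from 5% to 0.5%, while the reference group is unchanged.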
CFRs can also be biased due to the delay between disease onset and death. At any moment in time, the cumulative number of confirmed cases includes people who have not yet died but may do so in the future. Therefore, crude fatality rates may underestimate the true severity of the disease. We can correct this bias by replacing the denominator with an estimate of the cumulative number of cases with known outcomes by the time rates are calculated. Specifically, let $T$ be the point in time when the CFRs are calculated. The probability that a case confirmed at time $t$, $t = 1, \ldots, T$, has a known outcome by time $T$ is expressed as $F(T - t)$, where $F$ is the cumulative distribution function of the time from disease onset to death. Here we calculate the number of adjusted cases assuming a log-normal distribution of the time from disease onset to death with mean equal to 13 days and standard deviation equal to 12.7 days 8 (Figure 5). Figure 6 and Figure 7 show cumulative cases adjusted for preferential ascertainment of severe cases and time delay between confirmation and death for each population group. Then we calculate corrected CFRs using the adjusted cases as denominator and 95% confidence intervals using an exact binomial test.
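The delay adjustment can be sketched in Python as below. Two things are assumptions of this illustration rather than statements from the text: the mean of 13 days and standard deviation of 12.7 days are read as moments on the natural (day) scale, and a case confirmed on day t is weighted by the log-normal CDF evaluated at T - t. The daily counts are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

def lognormal_cdf(x, mean, sd):
    """CDF of a log-normal distribution parameterized by its mean
    and standard deviation on the natural scale (an assumption)."""
    if x <= 0:
        return 0.0
    sigma2 = log(1.0 + (sd / mean) ** 2)   # variance on the log scale
    mu = log(mean) - sigma2 / 2.0          # mean on the log scale
    return NormalDist(mu, sqrt(sigma2)).cdf(log(x))

def cases_with_known_outcome(daily_cases, mean=13.0, sd=12.7):
    """Expected number of cases whose outcome is known by day T,
    where T is the last day in daily_cases (days numbered from 1)."""
    T = len(daily_cases)
    return sum(c * lognormal_cdf(T - t, mean, sd)
               for t, c in enumerate(daily_cases, start=1))

# Hypothetical epidemic: 10 confirmed cases per day for 30 days.
daily_cases = [10] * 30
total_deaths = 20

known = cases_with_known_outcome(daily_cases)
crude_cfr = total_deaths / sum(daily_cases)
delay_adjusted_cfr = total_deaths / known
```

Because the denominator shrinks to the cases that could already have had an outcome, the delay-adjusted CFR is never lower than the crude CFR.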

Sensitivity analysis
The procedure we used to obtain adjusted CFRs requires strong assumptions that could greatly affect the results. First, we have adjusted crude CFRs for preferential ascertainment of severe cases by assuming complete ascertainment in the group with the highest attack rate (males older than 80). We have then assumed a uniform attack rate in all population groups, and used demography-adjusted under-ascertainment rates to obtain estimates of the number of infected individuals in each population group. However, there could also be under-ascertainment in the group of males older than 80 due to extensive strain on the health system, in which case the CFR estimates are only an upper bound on the real values. We could correct this bias by further scaling the number of cases after the initial demographic adjustment. For example, we could multiply the adjusted cases by a value α > 1 to obtain a higher number of infected cases and lower CFRs. Moreover, the uniform attack rate assumption could be incorrect if certain population groups have more interactions with other people and are more exposed to the disease.
CFRs may also be biased due to the delay between disease onset and death. To correct this bias, we have considered a log-normal distribution with mean 13 days and standard deviation 12.7 days for the time from disease onset to death 8 , and estimated the CFRs using as denominator the cumulative number of cases that could have a clinical outcome by the time rates are calculated. However, other distributions may be considered that could change the results.
To illustrate these limitations, we conduct a sensitivity analysis where we calculate the CFRs using different levels of ascertainment and different distributions for the time from disease onset to death. Specifically, we estimate the adjusted number of cases in each population group by multiplying the confirmed cases by $NC_a / NC_{\text{males 80+}} \times \alpha$ using α values equal to 1, 1.5 and 2. For the delay distribution we use a log-normal distribution with mean 13 days and standard deviation 12.7 days 8 (Figure 5), and a Gamma distribution with mean 18.8 days and coefficient of variation 0.45 9 (Figure 9).
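Two ingredients of this sensitivity analysis are simple enough to sketch in Python: the effect of the scaling factor α on the CFR, and the conversion of the Gamma mean and coefficient of variation into shape and scale parameters. The function names and the death and case counts are hypothetical; evaluating the Gamma CDF itself would need a statistics library and is left out of this sketch.

```python
def gamma_shape_scale(mean, cv):
    """Shape and scale of a Gamma distribution with the given mean
    and coefficient of variation (sd / mean)."""
    shape = 1.0 / cv ** 2
    scale = mean * cv ** 2
    return shape, scale

def scaled_cfr(deaths, adjusted_cases, alpha):
    """CFR after multiplying the adjusted cases by alpha >= 1."""
    return deaths / (alpha * adjusted_cases)

# Gamma delay distribution used in the sensitivity analysis.
shape, scale = gamma_shape_scale(mean=18.8, cv=0.45)

# Hypothetical group with 50 deaths and 1000 adjusted cases.
cfr_by_alpha = {alpha: scaled_cfr(50, 1000, alpha)
                for alpha in (1.0, 1.5, 2.0)}
```

Doubling α halves the CFR, which is why the choice of ascertainment level changes the magnitude of the estimates but not their ordering across groups.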

Analysis
Analyses are performed with the statistical software R version 3.6.1 10 . Plots are created with the R package ggplot2 version 3.3.0 11 . Figure 8 shows the relative risks in each age group and gender. We note that the risk of death from COVID-19 increases with age and is higher for males than for females in all age groups except 0-9 and 10-19. Table 1 shows the crude and adjusted CFRs by age group and gender calculated on 14 May 2020. This table also shows the CFRs by age group obtained from aggregated time series of cases in mainland China by Verity et al. 6 . We observe that CFRs are much higher in age groups older than 60 and, for most age groups, in males. The adjusted CFRs obtained with the data from Spain are smaller than the CFRs obtained by Verity et al. 6 for all except the oldest two groups, and the confidence intervals for the CFRs of Spain are much narrower due to the use of a larger dataset. Table 2 shows the CFRs estimated under different scenarios assuming different levels of ascertainment and distributions for the time from disease onset to death. We observe that in all scenarios CFRs are higher in older age groups and males, although the different assumptions yield different CFR values.
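To illustrate why a larger dataset gives narrower intervals, the Python sketch below computes an exact binomial interval. It assumes that "exact binomial" refers to the Clopper-Pearson interval, which is the interval returned by R's `binom.test`; the bisection solver, the function names and the counts are choices of this illustration only. With non-integer adjusted case counts the denominator would have to be rounded first.

```python
from math import lgamma, log, exp

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if x >= n else 0.0
    total = 0.0
    for k in range(x + 1):
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(p) + (n - k) * log(1.0 - p))
        total += exp(log_pmf)
    return min(total, 1.0)

def _solve_decreasing(f, target):
    """Bisection for p in (0, 1) with f(p) = target, f decreasing."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def exact_binomial_ci(x, n, level=0.95):
    """Clopper-Pearson interval for x events out of n trials."""
    alpha = 1.0 - level
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1.0 - alpha / 2.0)
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2.0)
    return lower, upper

# Same 10% CFR, hypothetical small and large samples.
small = exact_binomial_ci(10, 100)
large = exact_binomial_ci(100, 1000)
```

Both intervals contain 0.10, but the one based on 1000 cases is markedly narrower than the one based on 100 cases.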

Discussion
In a newly emerging infectious disease like COVID-19, data are assembled in challenging circumstances that may contribute to the underestimation of cases and deaths. Data available on the total confirmed cases and deaths in Spain do not provide age and gender information. Here, we have obtained estimates by population group by multiplying the total confirmed cases and deaths by the proportions occurring in each group of a sample with that information. This is a limitation of our study since the sample with demographic information may not be representative of the whole population.
We have seen that the approach of estimating crude CFRs by dividing the total number of deaths by the total number of confirmed cases produces results that are difficult to interpret due to several biases. For example, the estimated rates may overstate the true rates due to preferential inclusion of severe cases, since data assembled during emergency settings typically contain people who seek medical care, have the most severe symptoms, and experience fatal outcomes. Following Verity et al. 6 we have adjusted for preferential ascertainment of severe cases by assuming complete ascertainment in the group with the highest attack rate, and using demography-adjusted under-ascertainment rates to estimate the number of infected individuals in each population group. In addition, CFRs may also be biased due to the delay between disease onset and death. We have adjusted for this bias by considering a specific distribution for the time from disease onset to death. These are strong assumptions that could greatly affect the results. We conducted a sensitivity analysis where we calculated the CFRs using different levels of ascertainment and different distributions for the time from disease onset to death. The sensitivity analysis yielded different values for the CFRs, and in all scenarios CFRs were higher in older age groups and males.
In addition, CFRs calculated in the initial phase of an epidemic are highly dependent on the point in time at which they are calculated. Here we provide estimates calculated with data from 14 May 2020, but rates calculated at a later point in time could be different.

Conclusions
The assessment of the severity of COVID-19 and the determinants of its variation is essential for planning health resources and designing mitigation policies, including intelligent strategies to release the population from confinement while protecting the most vulnerable. In this article we have estimated CFRs by age group and gender in Spain accounting for censoring and ascertainment biases. We have found that COVID-19 outcomes are highly influenced by age and gender, with higher rates in older age groups and males. The procedures used to obtain the CFR estimates require strong assumptions and although the interpretation of their magnitude should be treated with caution, the differences observed by age and gender are fundamental underpinnings to inform decision-making.

Data availability
Source data
Data on total confirmed cases and deaths, as well as confirmed cases and deaths by age group and gender from a subset of the population, are reported by the Spanish Ministry of Health and provided by 5. Population data for Spain are obtained from the National Institute of Statistics of Spain 4 . This project contains the following underlying data:

Software availability
Code for the results, figures and tables of this study can be found at https://github.com/Paula-Moraga/coronavirus-cfr