A Bayesian estimate of the early COVID-19 infection fatality ratio in Brazil based on a random seroprevalence survey

Background: A number of estimates of the infection fatality ratio (IFR) of SARS-CoV-2 in different countries have been published. In Brazil, the fragile political situation, together with socioeconomic and ethnic diversity, could result in substantially different IFR estimates. Methods: We infer the IFR in Brazil in 2020 by combining three datasets. We compute the prevalence via the population-based seroprevalence survey, EPICOVID19-BR. For the fatalities we obtain the absolute number using the public Painel Coronavírus dataset and the age-relative number using the public SIVEP-Gripe dataset. The time delay between the development of antibodies and subsequent fatality is estimated via the SIVEP-Gripe dataset. We obtain the IFR for each survey stage and 27 federal states. We include the effect of fading IgG antibody levels by marginalizing over the test detectability time window. Results: We infer a country-wide average IFR (maximum posterior and 95% CI) of 1.03% (0.88–1.22%) and age-specific IFRs of 0.032% (0.023–0.041%) [< 30 years], 0.22% (0.18–0.27%) [30–49 years], 1.2% (1.0–1.5%) [50–69 years], and 3.0% (2.4–3.9%) [≥ 70 years]. We find that the fatality ratio in the country increased significantly at the end of June 2020, likely due to the increased strain on the health system. Conclusions: Our IFR estimate is based on data and does not rely on extrapolating models. This estimate sets a baseline value with which future medications and treatment protocols may be confronted.


Introduction
The infection fatality ratio (IFR), the ratio between the number of deaths from a disease and the number of infected individuals (irrespective of whether they develop symptoms), is one of the most important quantities for any new disease. Obtaining an accurate estimate of the IFR before the end of a pandemic is usually a challenge, as it is subject to many possible sources of bias [24,26]. Nevertheless, the IFR has direct implications for the amount of resources and effort that should be allocated to prevent the spread of the disease and for steering policymaking in general. For instance, using the United States as a reference, [35] concluded that an IFR below 1% makes school closures and social distancing cost-ineffective.
In order to estimate the IFR, one needs an estimate not only of the number of deaths but also of the total infected population, and both must refer to the same time period. This is a difficult task, as many cases are asymptomatic or develop only mild symptoms and often go unaccounted for. It is further hampered by the lack of testing in many countries [37].
The total number of deaths during an epidemic can be biased by the mislabeling of undiagnosed fatalities. To circumvent this possibility, one can rely on statistical estimates from the study of excess deaths in a given period of time. In the case of COVID-19 this method is being pursued by many groups [2,43,45], including the mainstream media, as a complement to the officially reported numbers. However, this approach invariably suffers from important modeling uncertainties [45]. This may be especially true during the current pandemic, which has seen an unprecedented disruption to economic activity and social behavior, with a large fraction of the population undertaking social distancing measures.
One of the first detailed analyses of the IFR of COVID-19 was based on around 70,000 clinically diagnosed cases in China. After adjusting for demography and under-ascertainment, [44] arrived at the estimate of 1.38% (95% CI: 1.23-1.53%). In France, a study recently modeled both death and hospital data and estimated the IFR to be 0.5% (95% CI: 0.3-0.9%) [39]. Another model-based investigation arrived at an IFR of 0.8% (95% CI: 0.45-1.25%) [38]. A meta-analysis of 25 IFR studies found an IFR of 0.68% (95% CI: 0.53-0.82%) [31]. A compartmental model for the IFR in China and Europe using surveillance data resulted in the following estimates: Hubei, China 2.9% (95% CI: 2.4%-3.5%); Switzerland 0.5% (95% CI: 0.4%-0.6%); and Lombardy, Italy 1.4% (95% CI: 1.1%-1.6%) [19]. Finally, relying on antibody screening of Danish blood donors, the IFR was estimated to be less than 0.21% at the 95% confidence level [10], although one must keep in mind that the donors were healthy and under 70 years of age, and that age is well established as the dominant determinant of mortality [8,13,32]. These results hint at the possibility of large variation in IFR values around the globe, although data from different countries were reported to be highly heterogeneous.
In Brazil, the focus of this work, the IFR was recently estimated with models. Results varied substantially between two different groups. A Brazilian team found that it should be much lower than the first estimates, at around 0.3% [7]. On the other hand, a report by a group at Imperial College London estimated much higher values for the 16 Brazilian states they considered [30], which, combined, suggest an overall IFR of 0.9%.
The incompatible estimates above highlight the need for careful consideration of the biases in the IFR estimates for COVID-19. A possible solution to mitigate biases is to base the estimation of the IFR on a large representative random serology study in the population. One such study, conducted in Geneva, Switzerland, with 2766 participants, found that for every reported COVID-19 case there were another 10.6 unreported ones [41], a large discrepancy which again stresses the difficulties that models have to deal with. The same group reported an IFR of 0.64% (95% CI: 0.38-0.98%) [34]. A much larger survey with 61,075 participants was conducted in Spain, but IFR estimates were not reported [36].
A large random seroprevalence study was performed in Brazil by the EPICOVID19-BR team [16,17], which aimed to test 250 individuals in each of the 133 selected large sentinel cities. It was an extension to the whole country of a smaller regional set of surveys in the Rio Grande do Sul state [40] and has so far been carried out in four stages using the Wondfo lateral flow test for immunoglobulin M and G antibodies against SARS-CoV-2. Here, we consider data from the first three stages. The first stage was conducted between May 14 and 21, 2020, but did not reach its target number of samples: at least 200 tests were performed in only 90 of the 133 cities. The total number of tests in all cities was 25,025. Round 2 was conducted from June 4 to 7 and reached over 200 tests in 120 cities. Considering all cities, a total of 31,165 individuals were tested. Round 3 was performed between June 21 and 24 and reached over 200 tests in all 133 cities, for a total of 33,207 tests. The total number of tests in all rounds was 89,397.
The COVID-19 pandemic has strongly affected Brazil [6]. The federal government response has been heavily criticized [22], and, at the time of writing, the number of confirmed deaths has passed half a million, second only to the USA in absolute terms and among the highest per capita in the world [46]. Furthermore, strong ethnic and regional variations in hospital mortality were found, casting doubt on the availability of public healthcare for the sections of society that cannot afford private care [3,4]. This challenging situation further motivates the need for the IFR to be estimated as accurately as possible in order to trigger an adequate political response to the crisis.
Figure 1 summarizes how the three datasets used in this work are combined in order to estimate the IFR. We compute the percentage $p_a(t)$ of Brazilians that have been infected by SARS-CoV-2 at the city, state, and country levels via the EPICOVID19-BR data. We robustly correct for false positive and negative rates [14] and combine prevalences from different cities without neglecting the non-Gaussian nature of the distributions (for details, see Supplementary Materials, Section S1).
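To illustrate why the false positive and false negative rates of the test matter, the following is a minimal sketch of the textbook Rogan-Gladen point correction. It is not the procedure used in this work, which follows [14] and is described in Supplementary Materials, Section S1; the function name and the sensitivity, specificity, and raw positive fraction below are hypothetical values chosen only for illustration.

```python
def rogan_gladen(raw_positive_fraction, sensitivity, specificity):
    """Point estimate of the true prevalence from the raw positive fraction.

    Textbook correction, shown for illustration only; it ignores the
    sampling uncertainty that a full statistical treatment would propagate.
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0.0:
        raise ValueError("the test must perform better than chance")
    corrected = (raw_positive_fraction + specificity - 1.0) / denominator
    # Clip to the physically meaningful range [0, 1].
    return min(1.0, max(0.0, corrected))


# Hypothetical inputs, NOT the values of the EPICOVID19-BR survey.
raw = 0.030
estimate = rogan_gladen(raw, sensitivity=0.85, specificity=0.99)
print(f"raw = {raw:.3f}, corrected = {estimate:.4f}")
```

With these assumed numbers the corrected prevalence is about 2.4%, lower than the raw 3.0%, because at low prevalence even a 1% false positive rate accounts for a sizeable share of the positive tests.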

Methods
We obtain the absolute number of fatalities via the public Painel Coronavírus dataset. Painel Coronavírus is a Brazilian reference for tracking the pandemic at the federal level and provides the deaths by COVID-19 with their geographic location. We only consider deaths in the 133 sentinel cities in which EPICOVID19-BR took place. We smooth the data from Painel Coronavírus according to a forward 20-day moving average that assigns to the time $t_0$ the average number of deaths in the interval $[t_0, t_0 + 19\ \text{days}]$ (notifications can only be delayed and not anticipated), which corresponds to shifting the time of deaths earlier by 10 days on average. This forward 20-day window makes Painel Coronavírus consistent with SIVEP-Gripe during the first months of the pandemic, when one expects that deaths at hospitals, tracked by SIVEP-Gripe, dominate the overall count (a direct comparison and details are given in Supplementary Materials, Section S2). It is also justified by estimates of the delay between time of death and notification. Indeed, for many reasons, deaths that happen at time $t_0$ are reported at a later time that we estimate according to a flat distribution in the above-mentioned interval [9,42].
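The forward moving average described above can be sketched as follows. This is a minimal illustration in plain Python, not the Mathematica code used for the analysis; the function name and the toy series of notifications are assumptions made for the example.

```python
def forward_moving_average(daily_deaths, window=20):
    """Assign to day t0 the mean of daily_deaths over [t0, t0 + window - 1]."""
    smoothed = []
    for t0 in range(len(daily_deaths)):
        chunk = daily_deaths[t0:t0 + window]
        # Leave the tail undefined where a full forward window is unavailable.
        smoothed.append(sum(chunk) / window if len(chunk) == window else None)
    return smoothed


# Toy series (hypothetical): 20 deaths, all notified on day 25.
notified = [0] * 50
notified[25] = 20
smoothed = forward_moving_average(notified)

# Days on which the smoothed series is non-zero, and their weighted mean.
days_with_mass = [t for t, value in enumerate(smoothed) if value]
total_mass = sum(smoothed[t] for t in days_with_mass)
mean_day = sum(t * smoothed[t] for t in days_with_mass) / total_mass
print(days_with_mass[0], days_with_mass[-1], mean_day)
```

In this toy case the notifications of day 25 are spread uniformly over days 6 to 25, so their mean date moves from day 25 to day 15.5. This is the shift of roughly 10 days towards earlier times mentioned in the text, and it mirrors the flat notification-delay distribution assumed over the same interval.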
As Painel Coronavírus does not provide age information, we adopt the public SIVEP-Gripe dataset for the relative number of fatalities in the various age bins. We split the total population into four age groups (in years): < 30, 30-49, 50-69, and ≥ 70. The SIVEP-Gripe dataset ("Sistema de Informação da Vigilância Epidemiológica da Gripe") is a prospectively collected respiratory infection registry that has been maintained by the Ministry of Health since 2009 for the purpose of recording cases of Severe Acute Respiratory Syndrome (SARS) in general (and of COVID-19 in particular) across both public and private hospitals. During the current pandemic both Painel Coronavírus and SIVEP-Gripe became major sources of information on the impact of COVID-19 in Brazil.
Reports from [28] and more recently by [15,21] show that IgG levels fade in recovered patients on a timescale of a few months, which was also suggested by the results of the first two rounds of EPICOVID19-BR [17]. Moreover, preliminary results from the recent fourth round of EPICOVID19-BR exhibit a large decrease in seroprevalence in the country [11], which is consistent with a limited window of detectability by the rapid test employed. For this reason we here consider a detectability window $T$ and thus the number of fatalities relative only to such a window, which is equivalent to assuming a sharp drop of IgG levels after $T$ days. As the IFR correlates with $T$, the fact that $T$ is not precisely known could introduce an important bias in the analysis.
In order to robustly overcome this issue we treat $T$ as a nuisance parameter to be integrated over. Specifically, based on the results of Hallal et al. [18], we adopt a prior on $T$ which is based on how the test sensitivity decays with time so that the marginalized distribution on the IFR is

$$P(\mathrm{IFR}) = \int_0^{T_{\max}} P_{\mathrm{IFR}}(\mathrm{IFR} \mid T)\, \pi(T)\, \mathrm{d}T , \qquad (1)$$

where $P_{\mathrm{IFR}}$ is the distribution of IFR conditional on $T$, $\pi(T)$ is the prior on the detectability window $T$, and $T_{\max}$ is the time between the beginning of the pandemic in Brazil and the corresponding EPICOVID19-BR round (for details, see Supplementary Materials, Section S3). Eq. (1) can be interpreted as if the conditional distribution is averaged over $T$ with the weight given by $\pi(T)$.
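The averaging over $T$ described above can be sketched numerically on a grid. The example below is a toy illustration only: the conditional distribution, the flat prior supported on 30 to 100 days, the value of `T_MAX`, and all function names are hypothetical stand-ins, not the quantities derived in Supplementary Materials, Section S3.

```python
import math


def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def marginalize(ifr_grid, t_grid, conditional_pdf, prior):
    """Average conditional_pdf(ifr, T) over T with weights proportional to prior(T)."""
    weights = [prior(t) for t in t_grid]
    norm = sum(weights)
    return [
        sum(w * conditional_pdf(ifr, t) for w, t in zip(weights, t_grid)) / norm
        for ifr in ifr_grid
    ]


# --- Hypothetical toy setup (none of these numbers come from the paper) ---
T_MAX = 100                                   # days, assumed
STEP = 0.005                                  # IFR grid spacing, in percent
t_grid = list(range(0, T_MAX + 1))            # detectability window T in days
ifr_grid = [STEP * i for i in range(401)]     # IFR from 0% to 2%


def toy_conditional(ifr, t):
    # Assumed shape: the conditional IFR grows with T and then saturates.
    mean = 1.03 * (1.0 - math.exp(-t / 20.0))
    return gaussian_pdf(ifr, mean, 0.08)


def toy_prior(t):
    # Assumed flat prior on T between 30 days and T_MAX, zero below.
    return 1.0 if t >= 30 else 0.0


marginal = marginalize(ifr_grid, t_grid, toy_conditional, toy_prior)
area = sum(marginal) * STEP                   # should be close to 1
peak_ifr = ifr_grid[marginal.index(max(marginal))]
print(f"area = {area:.4f}, peak at IFR = {peak_ifr:.3f}%")
```

Because the prior weights are normalized, the marginal remains a proper probability density. It is broader than any single conditional distribution, which is how the uncertainty on $T$ enters the final credible interval.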
The fraction $p_d$ of the population that died due to COVID-19 in a given geographical region is defined as the ratio of the number of COVID-19 deaths to the total population in that region. For the latter we use the 2020 projections based on the 2010 census [20]. We cannot, however, compute the IFR directly from the ratio of $p_d$ and $p_a$ because, at a given time $t$, there are patients that developed antibodies but would only die later from the disease [5]. In order to estimate the time delay $\tau_{ad}$ between the development of antibodies and subsequent fatality we use the SIVEP-Gripe dataset. The SIVEP-Gripe dataset contains the dates of the onset of symptoms and death for patients with a SARS-CoV-2-positive RT-PCR test, together with their geographic location and age. This information allows us to estimate the time delay $\tau_{sd}$ between the development of symptoms and subsequent fatality for a given subset of patients. We also make use of an empirical distribution between the first symptoms and the development of antibodies [27] to estimate the mean time delay $\tau_{sa}$ between both events. Together, these estimates allow us to obtain the time delay $\tau_{ad} = \tau_{sd} - \tau_{sa}$. For Brazil as a whole we find $\tau_{ad} \approx 9.7$ days. Table 1 summarizes all the estimated time delays which are used in our calculations (for details, see Supplementary Materials, Section S2).
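The role of the delay can be made concrete with a small sketch that compares the prevalence at the survey date with the death fraction evaluated $\tau_{ad}$ days later. This is one natural reading of the procedure, written under simplifying assumptions: it ignores the detectability window $T$, uses linear interpolation for the non-integer delay, and all input numbers and function names are hypothetical.

```python
import math


def interpolate(series, t):
    """Linearly interpolate a daily series at a possibly fractional day t."""
    lower = math.floor(t)
    if lower < 0 or lower + 1 > len(series) - 1:
        raise ValueError("requested day is outside the series")
    fraction = t - lower
    return (1.0 - fraction) * series[lower] + fraction * series[lower + 1]


def delayed_ifr(p_a, cumulative_p_d, survey_day, tau_ad):
    """Ratio of the death fraction at survey_day + tau_ad to the prevalence p_a."""
    return interpolate(cumulative_p_d, survey_day + tau_ad) / p_a


# Hypothetical toy inputs: the cumulative death fraction grows by 1e-6 per day
# and the survey on day 30 finds a prevalence of 0.3%.
cumulative_p_d = [1e-6 * day for day in range(60)]
p_a = 0.003
survey_day = 30

naive = delayed_ifr(p_a, cumulative_p_d, survey_day, tau_ad=0.0)
delayed = delayed_ifr(p_a, cumulative_p_d, survey_day, tau_ad=9.7)
print(f"naive IFR = {100 * naive:.2f}%, delayed IFR = {100 * delayed:.2f}%")
```

While cumulative deaths are still rising, ignoring the delay always biases the IFR downwards, since some of the individuals who already carry antibodies at the survey date have not yet died.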

Figure 2 and
Our estimates of the IFR for Brazil are given in Table 2 and Figure 3; those for the individual states are given in Figure 4. Figure 5 shows the combined IFR as a choropleth map. The maps for the individual rounds are shown in Supplementary Materials, Section S4. The numerical results for all the states and for the three rounds separately can be found in Supplementary Materials, Section S7. From Figure 4 one sees that, in most states, there is a small increase of the IFR in Round 2 and a large one in Round 3. The evolution of the IFR is more clearly seen in Figure 3. The most likely explanation for this rapid evolution of the IFR is the increased strain on the health system. We note significant statistical tension in the data of Roraima (RR). Therefore, we consider its IFR estimate unreliable, but due to its small population, it has an insignificant impact on the IFR estimates at the country level. The confidence intervals are computed by combining the statistical sources of error and including the non-Gaussian nature of the distributions. Table 2 also shows the prevalence and IFR for the four age bins we consider. We confirm that the IFR increases rapidly with age, in agreement with previous analyses [8,13,32]. Furthermore, we find that not only the IFR but also the prevalence increases with age, and this pattern is common to all three rounds. Note that, unlike for the IFR, it does not make sense to combine the prevalence measurements from all rounds in Table 2.
In Figure 6 we show the Brazilian IFR as a function of $T$. For $T > 80$ days, the results converge and the IFR remains unchanged. This is due to the fact that the present analysis addresses the early stages of the pandemic in Brazil (May and June 2020) and there were no deaths before March 2020. Studies using serosurvey data of subsequent months are expected to show a stronger dependence on $T$, as became clear in the fourth round of EPICOVID19-BR, conducted at the end of August 2020.
We also find that neglecting the time delay $\tau_{ad}$ between the development of antibodies and subsequent fatality has a large impact. It results in an underestimation of the IFR by around 0.19 percentage points, roughly a 2.2 standard-deviation shift. For instance, the country-wide IFR for all ages changes from 1.03% (95% CI: 0.88-1.22%) to 0.84% (95% CI: 0.75-0.98%).

Discussion
Our overall estimate of the IFR of 1.03% (95% CI: 0.88-1.22%) is in agreement with some, but not all, of the previous world estimates discussed above. In particular, at the country level, our combined estimate agrees with the one by the Imperial College group [30], even though at the state level we find several disagreements between their values and our 95% CIs (see Figure 4). It is worth stressing that our IFR estimate is exclusively based on data and does not rely on models, while the analysis by Mellan et al. [30] employs a sophisticated model that does depend on external data and assumptions. The fact that we find qualitative agreement is an important cross-check of the two methods.
Our estimate has a small relative standard deviation of 8%, including the uncertainty on the IgG fading time $T$, but it may suffer from the following systematic biases. First, not all COVID-19-related deaths may be registered in Painel Coronavírus. One expects this to happen for out-of-hospital fatalities and to be more common in the poorest areas with less healthcare infrastructure. Some reports indeed claimed a non-negligible number of undiagnosed respiratory deaths in 2020 [29]. However, because we analyzed the 133 large sentinel cities that entered the EPICOVID19-BR survey, this bias is not expected to be significant. The effect of under-reporting deaths is, nonetheless, an underestimation of the IFR.
Regarding the computation of the IFR as a function of age, a potential bias could arise from the fact that the age distribution of the SIVEP-Gripe dataset may not be representative of the overall population, as some age groups may have a higher tendency to present themselves at the hospital and enter the SIVEP-Gripe dataset. The trend of a rapid increase of the IFR with age agrees well with previous findings [8,13,25,32,33] . This, combined with the fact that prevalence also increases with age, suggests that elderly people are particularly susceptible to COVID-19 in Brazil and urges authorities to promote measures to protect them from SARS-CoV-2 exposure.
Another potential bias comes from the fact that the time in Painel Coronavírus is not the actual time of death but rather the time of notification. In order to alleviate this issue and to average out oscillations due to weekends, as noted above, we smoothed the $\mathrm{d}n_d/\mathrm{d}t$ data according to a forward 20-day moving average, which corresponds to shifting the time of deaths earlier by 10 days on average. While this correction should account for most of the delay in notification, one cannot exclude regional and temporal variability, which would affect the estimation of $p_d$ and, consequently, of the IFR.
Next, the SIVEP-Gripe dataset is biased towards cases with severe symptoms. Indeed, a significant number of cases are hospitalized when symptoms are notified (see Supplementary Materials, Section S2). We took this into account via a delay parameter, $\tau = \tau_{sd} - \tau_{sd}^{\mathrm{sivep}} = 2 \pm 1$ days (see Table 1), which models the time that a patient takes to go from the onset of symptoms to severe symptoms (for details, see Supplementary Materials, Section S2). Had we set $\tau = 0$, we would have obtained an estimate that is lower by 2% in relative terms.
Finally, the participants of the study may not be fully representative of the whole population of Brazil. The EPICOVID19-BR survey took several measures to mitigate enrollment bias, as discussed by [17]. For instance, for each city, 25 census tracts were selected and, in each one, 10 households were selected at random. In each household, a resident was also selected at random from a listing of members. If the selected individual refused to participate, another resident was selected at random and, if they too refused (or if the residents were absent), the team moved on to the neighboring household on the right. As discussed in Supplementary Materials, Section S6, the age and ethnic distributions of the participants follow well the distributions in Brazil, with a small undersampling of those under 20 years old and of people of white ethnicity. However, since the survey considered only 133 large sentinel cities, the overall IFR we computed is relative to these cities, which amount to 35.5% of the Brazilian population, and one may speculate that the IFR could be different in smaller cities and in rural or poorer areas.
The IFR of COVID-19 depends not only on the patient's age, as discussed earlier, but also on their health [1]. Our IFR estimate should, therefore, be contextualized to the Brazilian population. To this end, a reasonable proxy for the overall health of a country is life expectancy, and the lower socioeconomic development of Brazil is reflected by a lower life expectancy than, for example, Europe: 76.0 years in Brazil, as compared with 80.9 years in Europe, as of 2017.
As shown in Baqui et al. [4], in Brazil, socioeconomic and structural factors are as important as biological factors in determining the outcome of COVID-19. The following were found to be particularly important factors: the state of residence and its development index, distance to the hospital, level of education, and hospital funding model and strain. Our analysis considered the first phase of the pandemic: the third EPICOVID19-BR round happened between June 21 and 24, 2020, when Brazil had suffered about 50,000 deaths, a relatively small figure compared to the present one, which is ten times larger. Consequently, we expect that our IFR estimation is less affected by socioeconomic and structural factors, although we do find that the IFR increased substantially in late June 2020. Also, we find no significant correlation between the IFR and life expectancy in the various states, and the choropleth map does not seem to indicate a clear regional trend in the IFR values. It is hoped that the IFR will decrease as new medications and treatment protocols for the disease are discovered and become available and as the vaccination campaign progresses, although the rise of new variants may counteract this progress. Since our data come from the first months of the pandemic, our results also set a baseline for future comparisons of the fight against COVID-19 in Brazil.
We hope that our careful evaluation of the IFR in Brazil will help reinforce, at the federal, state, and municipal levels, the seriousness of the COVID-19 pandemic and the urgency of taking the proper actions to reduce its societal and economic impact.

Funding Source
VM's research is partially supported by the Brazilian research agencies, CNPq and FAPES. MQ's research is partially supported by the Brazilian research agencies, CNPq and FAPERJ. The research agencies above had no involvement in this research or in the preparation of the article.

Ethical Approval
Ethical approval was not required.

Data sharing
All data needed to evaluate the conclusions in the paper are present in the paper or Supplementary Materials. Data from the EPICOVID19-BR survey are available at www.epicovid19brasil.org. Data from Painel Coronavírus are available at covid.saude.gov.br. Data from SIVEP-Gripe are available at opendatasus.saude.gov.br/dataset/bd-srag-2020.

Code availability
The Wolfram Mathematica code and the data used for this work are available at www.github.com/mquartin/covid19-ifr-br.

Disclosure of conflicts of interest
The authors report no conflicts of interest relating to the content of this article. The authors alone are responsible for the content and writing of this article and received no financial support.