Impact of spatiotemporal heterogeneity in COVID-19 disease surveillance on epidemiological parameters and case growth rates

SARS-CoV-2 case data are primary sources for estimating epidemiological parameters and for modelling the dynamics of outbreaks. Understanding biases within case-based data sources used in epidemiological analyses is important as they can detract from the value of these rich datasets. This raises the question of how variations in surveillance can affect the estimation of epidemiological parameters such as case growth rates. We use standardised line list data of COVID-19 from Argentina, Brazil, Mexico and Colombia to estimate delay distributions of symptom-onset-to-confirmation, -hospitalisation and -death as well as hospitalisation-to-death at high spatial resolutions and throughout time. Using these estimates, we model the biases introduced by the delay from symptom-onset-to-confirmation on national and state level case growth rates (r_t) using an adaptation of the Richardson-Lucy deconvolution algorithm. We find significant heterogeneities in the estimation of delay distributions through time and space, with differences in delay of up to 19 days between epochs at the state level. Further, we find that by changing the spatial scale, estimates of case growth rate can vary by up to 0.13 d⁻¹. Lastly, we find that states with a high variance and/or mean delay in symptom-onset-to-diagnosis also have the largest difference between the r_t estimated from raw and deconvolved case counts at the state level. We highlight the importance of high-resolution case-based data in understanding biases in disease reporting and how these biases can be avoided by adjusting case numbers based on empirical delay distributions. Code and openly accessible data to reproduce analyses presented here are available.


Introduction
Surveillance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has expanded since it was first reported in November 2019 (Oude Munnink et al., 2021; Zhu et al., 2020). However, disease surveillance remains highly heterogeneous across countries, and case definitions have changed significantly as a result of changing testing capacity, improved understanding of transmission during the asymptomatic phase and general human behavioural change in response to the pandemic (Flaxman et al., 2020; Verity et al., 2020; Wu et al., 2020; Ke et al., 2021; Pullano et al., 2021; Parag et al., 2022). Improvements to surveillance efforts can affect key epidemiological distributions by reducing the time delay from exposure to onset of infectiousness to diagnosis (Kraemer et al., 2021). These in turn can directly influence estimation of the time-varying reproduction number (R_t) and growth rate (r_t) (Rong et al., 2020; Pitzer et al., 2021) (Supplementary Table 1). Estimation of these epidemiological distributions/parameters provides key information on changes in transmission, which contribute to decisions on the implementation of pharmaceutical and non-pharmaceutical interventions (NPIs) (Anderson et al., 2020; Dushoff and Park, 2021; Parag et al., 2021; Pellis et al., 2021).
Initial estimations of SARS-CoV-2 epidemiological distributions/parameters were based on biased data, primarily because limited testing capacity meant that testing for SARS-CoV-2 was concentrated in hospitalised patients (Vandenberg et al., 2021). This contributes to a degree of uncertainty and heterogeneity in the accuracy and precision of these estimates, especially when comparing them between countries and across age groups (Cowling et al., 2020; Mellan et al., 2020; Verity et al., 2020; Parag et al., 2022). Since the initial stages of the pandemic, global surveillance and notification systems have significantly improved (Vandenberg et al., 2021), providing a wealth of data which can be used to re-evaluate SARS-CoV-2 epidemiological distributions/parameters. This raises the question of how variations in surveillance affect the estimation of epidemiological distributions/parameters. We aim to understand how spatial and temporal heterogeneities in reporting (specifically delays in reporting) can impact the accuracy of estimates of epidemiological parameters (specifically the growth rate r_t) within and between countries. To do this, we use a rich, standardised, individual level line list database extracted from Global.health (https://global.health/). We focus on estimating the delays between symptom-onset-to-confirmation, -hospitalisation and -death as well as hospitalisation-to-death.

Data
The Global.health database contains individual case data from over 100 countries (https://global.health/). The database contains a rich array of fields describing demographics, location (up to Administrative Area 3 resolution), and key epidemiological and clinical events for confirmed COVID-19 cases. In relational database format, each row is a single confirmed COVID-19 case, and columns detail attributes for each case (Schema: https://github.com/globaldothealth/list/blob/c0da57d6b227ab861ad5e695d711699c02c2721f/data-serving/scripts/export-data/data_dictionary.txt). Data is primarily sourced from official country line lists compiled and shared by national health institutions where available, as was the case for all countries in this study (Xu et al., 2020). The detail of the case data varies by country: inter-country variability in COVID-19 data collection and reporting online leads to differences in Global.health data availability, as detailed in Fig. 1. The dataset used in this study was downloaded from Global.health on 31/01/2022. An updated line list can be downloaded from Global.health via the website or by following instructions on the API docs: https://github.com/globaldothealth/list/tree/main/api. We can provide the exact dataset downloaded for this analysis upon written request.
Fig. 1. The number and proportion of recorded cases with data entries for each epidemiological distribution have been extracted from Global.health line lists for Argentina, Brazil, Colombia, and Mexico. A, B and D represent the delay from symptom-onset-to-diagnosis, -hospitalisation, and -death respectively whilst C represents the delay from hospitalisation-to-death. The blue, red, teal, and yellow solid lines represent a 7-day rolling average for the total number of data entries for Argentina, Brazil, Colombia, and Mexico respectively. The blue, red, teal and yellow dashed lines represent a 7-day rolling average for the proportion of recorded cases with data entries for Argentina, Brazil, Colombia, and Mexico respectively. The dashed vertical lines represent epoch change times.
To investigate the spatial heterogeneity of epidemiological parameters inferred from public data, we focus on COVID-19 line lists from four countries in Latin America that have consistently provided comprehensive and detailed line list data since the start of the pandemic in early 2020: Mexico, Brazil, Argentina, and Colombia. For each country, we aggregated data to the state level, then for each state, calculated delay distributions defined in Supplementary Table 1. To investigate trends over time, the line lists for each country are split into three time-periods hereafter called epochs. These epochs represent different stages of the SARS-CoV-2 epidemic in each country. We use the same time periods for each country analysed in this study due to the difficulties in standardising phases of epidemic progression between countries. These difficulties arise from a lack of reliable epidemiological information and case incidence data, but also from existing gaps in our understanding of the mechanisms involved in the transmission dynamics between different social and geographical contexts, particularly for newly emerging infectious diseases (Chowell et al., 2016). However, in each country we cover the 1st and 2nd waves of infections as well as a period of low incidence in infections between these two waves:
• Epoch 1: 2020-03-03 to 2020-06-30 (initial COVID-19 wave)
• Epoch 2: 2020-07-01 to 2020-11-30 (receding epidemic and low case counts)
• Epoch 3: 2020-12-01 to 2021-03-31 (second wave/SARS-CoV-2 VOCs)
Additional filtering of the data was applied to these time delays to eliminate biases introduced by erroneous entries. We removed all cases which were reported before the first reported case in the countries of interest based on the Ministry of Health's websites (Roberts et al., 2021). Moreover, we removed outliers that fell outside of the 97.5 % range of the data on each of the delay distributions.
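The epoch assignment and filtering steps described above can be sketched as follows. This is a minimal illustration rather than the study's pipeline: the function names, the record layout (pairs of onset and confirmation dates) and the `first_case_date` argument are hypothetical, and the "97.5 % range" is interpreted here, as an assumption, as a one-sided cut at the 97.5th percentile of the delays.

```python
import math
from datetime import date

# Epoch boundaries as stated in the text (both ends inclusive).
EPOCHS = {
    1: (date(2020, 3, 3), date(2020, 6, 30)),
    2: (date(2020, 7, 1), date(2020, 11, 30)),
    3: (date(2020, 12, 1), date(2021, 3, 31)),
}


def assign_epoch(day):
    """Return the epoch number containing `day`, or None outside the study period."""
    for epoch, (start, end) in EPOCHS.items():
        if start <= day <= end:
            return epoch
    return None


def filter_records(records, first_case_date, upper_quantile=0.975):
    """Filter (onset_date, confirmation_date) pairs.

    Step 1 drops records dated before the country's first reported case.
    Step 2 drops delays above the chosen upper quantile (assumed one-sided cut,
    nearest-rank definition of the quantile).
    """
    kept = [(o, c) for o, c in records
            if o >= first_case_date and c >= first_case_date]
    if not kept:
        return []
    delays = sorted((c - o).days for o, c in kept)
    cutoff = delays[math.ceil(upper_quantile * len(delays)) - 1]
    return [(o, c) for o, c in kept if (c - o).days <= cutoff]
```

Applied per state and per delay type, `assign_epoch` would be called on whichever date the analysis uses to place a case in time; the text does not specify that choice, so it is left to the caller here.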

Epidemiological distributions
To estimate the epidemiological distributions, a gamma probability density function (PDF) was fitted to onset-to-death and hospitalisation-to-death whilst a generalised lognormal (GLN) probability density function (Singh et al., 2012) was fitted to onset-to-diagnosis and onset-to-hospitalisation (Table 1). These PDFs were chosen as they were evaluated to best fit COVID-19 line list data (Hawryluk et al., 2020). The parameters of each distribution are fitted by a joint hierarchical model with partial pooling similar to Hawryluk et al. (2020), using state level data (Administrative Area 1 resolution) from Argentina, Brazil, Colombia, and Mexico.
Posterior samples of the parameters are generated using Hamiltonian Monte Carlo (HMC) (Hoffman and Gelman, 2014) in Stan (Carpenter et al., 2017) using PyStan (v.2.19.0.0: https://mc-stan.org/users/interfaces/pystan). Four chains with 2000 iterations each, with 50 % of the iterations dedicated to burn-in, were used for each fit. For all fitted densities, the mean and variance parameters were constrained to be positive.
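Since the fitted densities are described through positive mean and variance parameters, it can help to see how a mean and variance map onto a gamma density. The sketch below uses the standard gamma identities (mean = shape/rate, variance = shape/rate²); whether the Stan model in this study uses exactly this parameterisation is an assumption, and the function names are illustrative only.

```python
import math


def gamma_shape_rate(mean, variance):
    """Convert a positive mean and variance to gamma shape and rate parameters."""
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    shape = mean ** 2 / variance
    rate = mean / variance
    return shape, rate


def gamma_pdf(y, mean, variance):
    """Gamma density at delay y (in days), parameterised by mean and variance."""
    if y <= 0:
        return 0.0
    shape, rate = gamma_shape_rate(mean, variance)
    # Work on the log scale to avoid overflow for large shape values.
    log_density = (shape * math.log(rate) + (shape - 1) * math.log(y)
                   - rate * y - math.lgamma(shape))
    return math.exp(log_density)
```

For example, a delay distribution with mean 10 days and variance 20 days² corresponds to shape 5 and rate 0.5 under this parameterisation.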

Correlation analysis
Spearman's rank-order correlation coefficient (r_s) was calculated for delays between symptom-onset-to-confirmation, -hospitalisation and -death as well as hospitalisation-to-death for each state, using the scipy.stats 'spearmanr' function (scipy version 1.7.3). P-values were provided by this function and indicate the probability of an uncorrelated system producing data with a correlation value at least as extreme as the one observed.
We also explored whether there was a correlation between the population density of each state and the mean onset-to-diagnosis delay averaged across all epochs, using a Spearman's rank test. We used parametric bootstrapping (n = 1000) to test statistical significance, defined using approximate unbiased p-values less than 0.05.
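To make the statistic concrete, the sketch below computes Spearman's coefficient from first principles: values are replaced by their ranks (ties receive the average rank) and a Pearson correlation is taken on the ranks. It is a dependency-free illustration of what a rank correlation measures, not the code used in the study, and it does not compute p-values.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In practice, `scipy.stats.spearmanr(x, y)` returns both the coefficient and its p-value, which is the route described in the text.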

Deconvolution
We used deconvolution to adjust for delays in the development of detectable viral loads, symptom onset, and reporting (Gostic et al., 2020). Deconvolution allows us to reconstruct the unlagged incidence time series given a known delay distribution (estimated above). Here, we adapted the method of Goldstein et al. (2009). This method uses the daily confirmed incidence curve (I_t) and the symptom-onset-to-confirmation probability distribution (d_1, ..., d_I) to calculate the expected number of cases (μ_t) to occur at time t, adjusting for delays. We assume that the daily incidence curve (I_t) is Poisson distributed. The model requires non-negativity constraints on the parameters λ_t, which represent estimates of mean infection incidence, reflecting the fact that they are Poisson means.
The model ran for 50 iterations or until the normalised χ² statistic (Eq. 2) comparing the observed and expected number of cases per day fell below 1. Here, N represents the length of our study period, E is the expected number of cases on day i and D is the probability of observation on day i. We calculated the deconvolved case counts at both the national and state level for each epoch.
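A minimal sketch of a Richardson-Lucy style iteration with this stopping rule is given below. Several points are assumptions rather than details confirmed by the text: the stopping statistic is written as (1/N) Σ (D_i − E_i)² / E_i with D_i taken to be the observed count on day i, the observed series itself is used as the starting guess, and a right-truncation correction `q` accounts for onsets near the end of the series whose confirmations fall outside the window. All function names are illustrative.

```python
def expected_counts(lam, delay):
    """Convolve onset incidence `lam` with the delay distribution `delay`.

    delay[k] is the probability that confirmation occurs k days after onset.
    """
    m = len(delay)
    return [
        sum(delay[i - j] * lam[j] for j in range(max(0, i - m + 1), i + 1))
        for i in range(len(lam))
    ]


def normalised_chi2(observed, expected):
    """Assumed stopping statistic: (1/N) * sum((D_i - E_i)^2 / E_i)."""
    terms = [(d - e) ** 2 / e for d, e in zip(observed, expected) if e > 0]
    return sum(terms) / len(observed)


def richardson_lucy(observed, delay, max_iter=50, tol=1.0):
    """Estimate non-negative onset incidence from delayed daily case counts."""
    n, m = len(observed), len(delay)
    # q[j]: probability that an onset on day j is confirmed before the series ends.
    q = [sum(delay[: n - j]) for j in range(n)]
    # Starting guess: the observed series (kept strictly positive).
    lam = [max(float(d), 1e-9) for d in observed]
    for _ in range(max_iter):
        expected = expected_counts(lam, delay)
        if normalised_chi2(observed, expected) < tol:
            break
        updated = []
        for j in range(n):
            # Multiplicative update keeps every estimate non-negative.
            ratio = sum(
                delay[i - j] * observed[i] / expected[i]
                for i in range(j, min(n, j + m))
                if expected[i] > 0
            )
            updated.append(lam[j] * ratio / q[j] if q[j] > 0 else lam[j])
        lam = updated
    return lam
```

Because each update multiplies the current estimate by a non-negative factor, the non-negativity constraint on λ_t is satisfied automatically. With a delay distribution concentrated at zero days the function returns the observed series unchanged, which is a useful sanity check.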

Growth rate
To estimate the daily growth rate (r_t) by country and state we adapted the approach from Pellis et al. (2021). In short, the growth of daily case numbers of lagged and unlagged SARS-CoV-2 cases (y) at time (t) was considered exponential. To estimate r_t, a quasi-Poisson family generalised linear model (GLM) with a log link was applied. We used a quasi-Poisson distribution as opposed to the standard negative-binomial distribution within our main results (Figs. 5 and 6) as a negative-binomial distribution tends to give a higher weighting to smaller case counts than a quasi-Poisson distribution (ver Hoef and Boveng, 2007). As our data predominantly cover periods of high case counts, we would prefer our adjustments to be dominated by those with higher case counts.
To allow growth rates to vary over time in a semi-parametric manner, a generalised additive model (GAM) was used where y(t) ∝ exp(s(t)) for some smoother s(t). As such, r_t is the time derivative of the smoother, r_t = ds(t)/dt. We started calculating the growth rate once the cumulative number of daily cases reached over 100 on the national level and over 20 on the state level to ensure that the exponential growth phase was captured.
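As a simplified illustration of the log-link model, the sketch below fits log E[y_t] = a + r·t by iteratively reweighted least squares and returns the slope r. This is the constant-growth special case (s(t) linear), not the GAM used in the study. Quasi-Poisson and Poisson fits share the same point estimates and differ only in the dispersion used for standard errors, which this sketch does not compute. Function names are illustrative.

```python
import math


def fit_growth_rate(counts, n_iter=25):
    """Slope r of log E[y_t] = a + r * t for daily counts y_0, y_1, ...

    Fitted by iteratively reweighted least squares for a Poisson-type GLM
    with a log link (working weights equal to the fitted means).
    """
    t = list(range(len(counts)))
    # Start from the log of the data, offset to tolerate zero counts.
    eta = [math.log(y + 0.5) for y in counts]
    r = 0.0
    for _ in range(n_iter):
        mu = [math.exp(e) for e in eta]
        # Working response for the log link.
        z = [e + (y - m) / m for e, y, m in zip(eta, counts, mu)]
        # Weighted least squares of z on (1, t) with weights mu.
        sw = sum(mu)
        swt = sum(m * ti for m, ti in zip(mu, t))
        swtt = sum(m * ti * ti for m, ti in zip(mu, t))
        swz = sum(m * zi for m, zi in zip(mu, z))
        swtz = sum(m * ti * zi for m, ti, zi in zip(mu, t, z))
        det = sw * swtt - swt * swt
        a = (swtt * swz - swt * swtz) / det
        r = (sw * swtz - swt * swz) / det
        eta = [a + r * ti for ti in t]
    return r


def doubling_time(r):
    """Doubling time in days implied by a positive growth rate r (per day)."""
    return math.log(2) / r
```

A growth rate of 0.1 d⁻¹ corresponds to a doubling time of about 6.9 days, which gives a sense of scale for the differences in r_t reported in the Results.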

Number of data entries / Global.health case counts
Disease reporting varied by country and field. Fig. 1 shows the number and proportion of recorded cases with data entries (Supplementary Table 2) from the Global.health line list from which we can infer the delays between onset-to-confirmation (A), onset-to-hospitalisation (B), hospitalisation-to-death (C) and onset-to-death (D). There are significant heterogeneities between countries and over time in the number of cases recorded and in whether a data entry is present for a specific delay. For example, almost all cases in Mexico are populated with the delay between onset-to-confirmation. In contrast, while almost all initial cases in Argentina were populated with the delay between onset-to-confirmation, over time, the proportion of cases with data entries fell consistently to around 55 %. Further, there is a large variability in completeness of the fields that allow estimation of symptom-onset-to-diagnosis, ranging between 36 % and 97 % in Brazil. Data was not available for symptom-onset-to-hospitalisation for Colombia and hospitalisation-to-death for both Argentina and Colombia. They were therefore not included in subsequent analyses.
Table 1. Probability density functions with analytical formulae for mean and variance (columns: PDF, Mean, Variance). y denotes the data, Γ() is a gamma function. GLN, generalised log-normal.

Estimation of delay distribution and growth rate
We estimate the delay distributions (Supplementary Table 1), reconstruct deconvolved case numbers and estimate r_t for local SARS-CoV-2 epidemics in Argentina, Brazil, Colombia, and Mexico.

Delay distributions
PDFs were applied to epidemiological data from Argentina, Brazil, Colombia, and Mexico to estimate the delay from symptom-onset-to-diagnosis, symptom-onset-to-hospitalisation, hospitalisation-to-death, and symptom-onset-to-death at the state level. Posterior plots of state-level results (Figs. 2, 3 and Supplementary Figures 2 and 3) show the shape (the range and pattern) and spread (the variance) of each delay distribution between states and over time.

Brazil
In Brazil, we observe substantial heterogeneities in the mean delay across all four distributions between states and across epochs. For example, for all states, the mean delay from symptom-onset-to-diagnosis increases from 7.24 days in epoch 1 to 10.46 days in epoch 2, declining to 5.55 days in epoch 3 (Supplementary Table 2). At the state level, Distrito Federal had the 3rd overall lowest mean delay of 4.08 days whilst Paraná had the highest mean delay of 22.74 days (Fig. 2, Supplementary Table 3). Interestingly, this trend was reversed for the distribution of hospitalisation-to-death, with Distrito Federal having the highest mean delay of 13.89 days and Paraná having the 3rd lowest mean delay of 10.01 days (Fig. 2, Supplementary Table 3). Additionally, states with a large delay from symptom-onset-to-diagnosis also had a large delay from symptom-onset-to-hospitalisation (r_s = 0.58, p < 0.01). Conversely, we found states with a large delay from symptom-onset-to-diagnosis had a shorter delay from hospitalisation-to-death (r_s = 0.60, p < 0.01) (Supplementary Figure 1). Moreover, we found that the longer the delay from symptom-onset-to-hospitalisation, the shorter the delay from hospitalisation-to-death (r_s = −0.37, p < 0.01) (Supplementary Figure 1), implying that the longer it takes to be hospitalised after becoming symptomatic, the shorter the time in hospital before death.

Mexico
Similar to Brazil, we found heterogeneities across states and time for all delay distributions within Mexico (Fig. 3). Moreover, the trends for each distribution over time are similar to Brazil, with the mean delay from symptom-onset-to-diagnosis decreasing over time from 3.08 days in epoch 1 to 2.62 days in epoch 3 (Supplementary Table 2). However, there is substantially less variability in the delay from symptom-onset-to-diagnosis and from hospitalisation-to-death (Fig. 3A and C). This can be seen in the mean delays from symptom-onset-to-diagnosis and from hospitalisation-to-death, which differ between the highest state (Nayarit) and lowest state (Chihuahua) by only 2.33 days and 3.76 days respectively over all epochs (Supplementary Table 3). Further, like Brazil, we also found that the mean delay from symptom-onset-to-diagnosis was negatively correlated with symptom-onset-to-death (r_s = −0.38, p = 0.03) and positively correlated with symptom-onset-to-hospitalisation (r_s = 0.65, p < 0.01) (Supplementary Figure 1).
Fig. 2. Delay distributions are estimated from daily case counts on the state level for three distinct epochs for Brazil. A, B and D represent the delay from symptom-onset-to-diagnosis, -hospitalisation, and -death respectively whilst C represents the delay from hospitalisation-to-death. Orange represents epoch 1, purple represents epoch 2 and blue represents epoch 3. All plots are ordered from the smallest to largest by the epoch with the smallest mean delay.

Argentina
In contrast to both Brazil and Mexico, epoch 1 in Argentina had the lowest delay from symptom-onset-to-diagnosis and the highest delay from symptom-onset-to-death (Supplementary Figure 2). We found that there was a high inter-state variance, as seen by the elongated shape on the violin plot. For the 11 states where data was available for the delay from symptom-onset-to-hospitalisation, the mean delay increased from 2.46 days in epoch 1 to 4.64 days in epoch 3 whilst the mean delay from symptom-onset-to-death decreased from 16.98 days in epoch 1 to 15.54 days in epoch 3 (Supplementary Table 2). We did not find a significant relationship between delay distributions but note that no data was available for hospitalisation-to-death (Supplementary Figure 1).

Colombia
Like Argentina, we find that for Colombia epoch 1 had the lowest delay from symptom-onset-to-diagnosis (Supplementary Figure 3A). We find that the overall mean delay from symptom-onset-to-diagnosis is substantially longer for epoch 3 (10.83 days) than for epoch 1 (1.96 days) (Supplementary Table 2). This large increase in the overall mean delay is driven by three states: Norte de Santander, Guainía, and Santa, which have mean delays from symptom-onset-to-diagnosis of over 30 days for epoch 3 (Supplementary Figure 3A, Supplementary Table 3). There is no overall trend across symptom-onset-to-death (Figure 5B).

Relationship between state population density and delay from symptom-onset-to-diagnosis
For each country, we calculated the correlation between the mean delay from symptom-onset-to-diagnosis across epochs and the median population density at the state level (logged). Overall, we found a very weak correlation between the mean symptom-onset-to-diagnosis across epochs and population density at the state level. Our Spearman correlation coefficients were: Mexico 0.09, Colombia 0.06, Brazil 0.09, Argentina 0.39. Overall, we see no statistically significant correlations (p > 0.05) between symptom-onset-to-diagnosis delay and median state population density across the four countries (Supplementary Figure 4).

Deconvolution of case time series
We apply methods from Goldstein et al. to raw SARS-CoV-2 case counts (date of confirmation) in the four countries studied to obtain the deconvolved daily case counts. Fig. 4 shows the deconvolved incidence curves. Notably, we find a marked delay in cases for Colombia in epoch 3, particularly after the 1st of February 2021. Further, we find that the initial peak in cases within Brazil had significant delays, perhaps due to high case incidence.

Growth rates
We applied the Pellis et al. model to estimate r_t from raw case data and deconvolved case data for each of our countries of interest (Fig. 5) using a quasi-Poisson family generalised linear model (GLM). The same calculation was done using a negative binomial family and the same trends were observed (Supplementary Figure 5). Based on the deconvolved case counts, initially, for all countries the mean r_t was above zero, indicating a growing epidemic. For all countries the mean r_t declined moving into the second epoch. Argentina experienced a mean r_t falling consistently below zero during epoch 2. Towards the end of epoch 2, the mean r_t increased above zero and remained above zero at the start of epoch 3 for all countries.
Fig. 3. Delay distributions are estimated from daily case counts on the state level for three distinct epochs for Mexico. A, B and D represent the delay from symptom-onset-to-diagnosis, -hospitalisation, and -death respectively whilst C represents the delay from hospitalisation-to-death. Orange represents epoch 1, purple represents epoch 2 and blue represents epoch 3. All plots are ordered from the smallest to largest by the epoch with the smallest mean delay.
Generally, it appears that the r_t estimated from the raw case counts lags behind the r_t estimated from the deconvolved case counts, which is expected. However, this difference is not significant, and all 95 % confidence intervals (CIs) are overlapping (Fig. 5). At the start of the study period there is an increase in uncertainty for the deconvolved case counts, represented by the wider CIs, and in general a higher r_t in all countries using raw case data.
Next, we evaluated r_t on a state level by selecting states with the lowest mean delay (Fig. 6A, C, E and G) and highest mean delay (Fig. 6B, D, F and H) of symptom-onset-to-confirmation. We compared r_t estimates from state and national deconvolved case counts in addition to raw case counts. When the delay from symptom-onset-to-confirmation is low, there is a mismatch between the r_t calculated using national level deconvolved case counts and the r_t calculated using raw case and state level deconvolved case counts. For example, in La Pampa, Argentina (Fig. 6E), mean r_t is initially below 0 (−0.03 d⁻¹) when using national level deconvolved case counts and above 0 when using raw (0.1 d⁻¹) and state level deconvolved case counts (0.07 d⁻¹). Conversely, when the delay from symptom-onset-to-confirmation is high, there is a mismatch between the r_t calculated using state level deconvolved case counts and the r_t calculated using raw case and national level deconvolved case counts. This can be seen in Roraima state, Brazil (Fig. 6B), where r_t fluctuates below and above 0 when calculated using state level deconvolved case counts, whereas r_t estimated from raw and national level deconvolved case counts is approximately 0 (r_t ≈ 0), indicating that epidemic stabilisation has occurred.

Discussion
In this study, we fitted multiple probability density functions to a number of epidemiological datasets to quantify the delay from symptom-onset-to-hospitalisation and hospitalisation-to-death, from the Global.health database (https://global.health/), using Bayesian hierarchical models. Subsequently, the national level and state level delay from symptom-onset-to-confirmation was used to deconvolve raw case counts and we measured the impact on case growth rates r_t.
We found that across all countries investigated (Argentina, Brazil, Colombia, and Mexico) there were strong geographical heterogeneities between states for our inferred delays (Supplementary Tables 2 and 3), with the delays from symptom-onset-to-diagnosis and symptom-onset-to-death being most accentuated. Whilst studies exploring testing heterogeneities in Latin America are limited, in the early stages of the epidemic, frequent and free testing was not available and testing was largely reserved for patients within hospitals and symptomatic individuals (Asahi et al., 2021; Gaudart et al., 2021; Vandenberg et al., 2021). Less urbanised states within the countries analysed, such as Roraima (Brazil), Michoacán (Mexico) and Boyacá (Colombia), had the largest delay in symptom-onset-to-diagnosis. It has been shown in other settings that access to symptomatic testing varied spatially due to geographic accessibility (Jaitman, 2015) and length of travel to healthcare facilities (Syed et al., 2013; Kelly et al., 2016; Rader et al., 2020).
In addition to spatial heterogeneities, strong temporal heterogeneities were observed. For Brazil and Mexico, the delay in symptom-onset-to-diagnosis decreased over time by 23 % and 15 % respectively whilst for Argentina and Colombia this delay increased over time by 18 % and 452 % respectively. Brazil and Mexico experienced a more rapid epidemic progression with the first wave of cases peaking at the end of the first epoch (Fig. 4B and D). In contrast, Colombia and Argentina had a slower epidemic progression with their first wave of cases peaking in the second epoch (Fig. 4A and C). This is also reflected in the number of data entries, with Brazil and Mexico having over double the number of entries in epoch 1 than Argentina and Colombia (Fig. 1). With limited testing resources available (Asahi et al., 2021; Gaudart et al., 2021; Vandenberg et al., 2021), it is plausible that public health departments in Brazil and Mexico struggled to test all symptomatic cases in a timely manner when compared to Argentina and Colombia, which had fewer cases during that period.
Fig. 6. r_t estimated from raw, national and state level deconvolved case counts for states with the highest mean delay in symptom-onset-to-diagnosis (A, C, E and G) and the lowest mean delay in symptom-onset-to-diagnosis (B, D, F and H) for Argentina, Brazil, Colombia, and Mexico. The light-shaded area represents the 95 % Confidence Interval with the darker-shaded area representing where the two estimations overlap. The solid line represents the mean r_t estimate with r_t estimated from raw case counts in red, state level deconvolved case counts in orange and national level case counts in blue. The vertical dashed lines represent epoch change times.
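The percentage changes quoted for Brazil, Mexico and Colombia can be checked against the epoch 1 and epoch 3 mean symptom-onset-to-diagnosis delays reported in the Results. The short sketch below does this arithmetic; the function name is illustrative, and Argentina is omitted because its epoch means for this delay are not quoted in the text.

```python
def percent_change(first, last):
    """Relative change from `first` to `last`, in percent."""
    return 100.0 * (last - first) / first


# Mean symptom-onset-to-diagnosis delay in days: (epoch 1, epoch 3),
# taken from the values quoted in the Results.
means = {
    "Brazil": (7.24, 5.55),
    "Mexico": (3.08, 2.62),
    "Colombia": (1.96, 10.83),
}

changes = {country: percent_change(e1, e3) for country, (e1, e3) in means.items()}
for country, change in changes.items():
    print(f"{country}: {change:+.1f} %")
```

The results agree with the reported 23 %, 15 % and 452 % up to rounding of the published means.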
By using deconvolution to infer the unlagged time series of infections, we can improve the accuracy of key epidemiological parameters (Gostic et al., 2020). In particular, by using the delay distribution of symptom-onset-to-confirmation we allow r_t to be estimated closer to real time (some have called this 'nowcasting' (McGough et al., 2020)). We found that in states with a small delay from symptom-onset-to-diagnosis there was a mismatch between r_t estimated using national level deconvolved case counts and raw or state level deconvolved case counts. Further, in states with a large delay from symptom-onset-to-diagnosis there was a mismatch between r_t estimated using state level deconvolved case counts and raw or national level deconvolved case counts. This is significant as using deconvolved case counts at a less granular spatial scale can significantly affect the interpretation of the epidemic picture. For example, for Roraima state, Brazil (Fig. 6B), using national level deconvolved case counts to estimate r_t we would predict that epidemic stabilisation has occurred even though cases have changed significantly throughout time (https://github.com/CSSEGISandData/COVID-19). As such, deconvolution is a valuable method, even within local epidemiological contexts with low case counts or areas with low population, in improving our understanding of state level epidemiological dynamics.
While our results provide a rigorous underpinning and insight into delay distributions and the impact of these on the estimation of epidemiological parameters, we acknowledge several limitations. The Global.health database, which contains the line lists from which our distributions have been estimated, though extensive, contains typing errors, and the degree to which these bias our estimates is unknown. Our data ingestion pipeline is mostly automated and only occasionally are we able to manually verify the accuracy of the data. Further, when comparing line list data between and within countries we note disparities in notification systems and differences in case definitions. Further work should evaluate the demographic biases in these data and how that may affect transmission dynamics (longer delays for less severe cases in younger age groups may impact transmission substantially). Lastly, there is a low testing rate for the countries analysed (Hasell et al., 2020) and heterogeneities in testing rates in both time and space (Vandenberg et al., 2021), which can influence the results for both cases and r_t. Future epidemiological work is needed to compare parameters estimated from case data, death data and excess death data across different settings (Gostic et al., 2020), and more intensive monitoring and/or the use of alternative data sources such as genomic data (Inward et al., 2022) is needed to improve the reliability of estimations.
Few countries report highly detailed epidemiological data limiting the ability to perform robust analyses on the impact of delays on transmission across the world. One primary concern for limited sharing of these data is privacy. Our work demonstrates the ability to perform scalable analyses of delay distributions and their impact on case growth rates and could be applied across all settings and through time. In the future, raw data may not need to be shared publicly: algorithms could locally process line list data stored in each country, with only aggregated statistics shared globally.
This work has highlighted the impact that both spatial and temporal heterogeneities can have on delay distributions and subsequent estimations of the case growth rate. Whilst more epidemiological datasets from a variety of countries and regions with different sampling intensities are needed to create a more generalisable understanding and to identify predictors of these differences, we have shown that accounting for delays on both a national and state level can introduce substantial differences in the estimation of epidemiological parameters. This finding identifies the need for more targeted attempts at performing epidemiological surveillance and epidemic analyses, particularly in resource-poor settings which have limited surveillance systems.

Role of the Funding Sources
M.U.G.K. is supported by The Branco Weiss Fellowship - Society in Science, administered by the ETH Zurich and acknowledges funding from a Google Faculty Award, the Oxford Martin School. This work was partially funded by the European Union Horizon 2020 project MOOD (#874850), Google.org, and the Rockefeller Foundation. The contents of this publication are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission.

CRediT authorship contribution statement
M.U.G.K., R.P.D.I. and F.J. conceived and designed the study. R.P.D.I. and F.J. performed the analyses. G.L., A.L.B., A.D. and F.J. assisted with data curation, ingestion, and processing. R.P.D.I. and F.J. wrote the manuscript, which was edited and supervised by M.U.G.K. All authors have contributed to and approved the manuscript for submission.

Conflicts of interest
The authors declare no conflicts of interest.

Data availability
Data will be made available on request.