Estimating the number of undetected COVID-19 cases among travellers from mainland China

Background: As of August 2021, every region of the world has been affected by the COVID-19 pandemic, with more than 196,000,000 cases worldwide. Methods: We analysed COVID-19 cases among travellers from mainland China to different regions and countries, comparing the region- and country-specific rates of detected and confirmed cases per flight volume to estimate the relative sensitivity of surveillance in different regions and countries. Results: Although travel restrictions from Wuhan City and other cities across China may have reduced the absolute number of travellers to and from China, we estimated that up to 70% (95% CI: 54% - 80%) of imported cases could remain undetected relative to the sensitivity of surveillance in Singapore. The percentage of undetected imported cases rises to 75% (95% CI 66% - 82%) when comparing to the surveillance sensitivity in multiple countries. Conclusions: Our analysis shows that a large number of COVID-19 cases remain undetected across the world. These undetected cases potentially resulted in multiple chains of human-to-human transmission outside mainland China.


Amendments from Version 2
We thank the reviewers for flagging that the link to the code pointed to an outdated version. We have updated the DOI of the code and data to the version used for V2 of this article.
Any further responses from the reviewers can be found at the end of the article.

Background
As of August 2021, over 196,000,000 cases of COVID-19 have been reported across the world with over 4,000,000 deaths 1. Several analyses have been undertaken to predict or estimate the risk of exported cases by country on the basis of flight connections between Wuhan City, China or mainland China as a whole and other regions and countries [2][3][4][5][6][7][8]. Salazar et al. 4, for instance, fit the number of reported cases in high-surveillance countries and report that countries in Southeast Asia such as Indonesia and Thailand had reported fewer imported cases than expected despite a high volume of air travel with China.
In this analysis we built on published work 4 to analyse COVID-19 cases reported and confirmed in different countries that were exported from mainland China, comparing the region- and country-specific rates of detected cases per flight volume to estimate the relative sensitivity of surveillance in different countries. We then estimate the number of COVID-19 cases exported from mainland China that have remained undetected worldwide.

Data sources
Air traffic volume. Air travel data for the months of January, February, and March 2016 were obtained from the International Air Transport Association (IATA), with the sum divided by three to get destination-region-specific (Hong Kong SAR and Macau SAR) and destination-country-specific monthly averages. The data from 2016 were the most recent data to which we had access. These numbers were not scaled up to reflect recent growth in air travel because any constant scaling of the monthly averages would simply be absorbed into the estimates of model parameters (see Analysis) and not affect other results. Flows of passengers within mainland China were excluded from this analysis.
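A short worked derivation may help show why a constant scaling is harmless. It is only a sketch: it assumes the maximum likelihood estimates take the ratio form implied by the Poisson model in the Analysis section, with $X_i$ the detected cases and $F_i$ the passenger volume for destination i. The scaling factor $c$ and the primed symbols are introduced here purely for illustration.

```latex
% Replace every passenger volume F by cF, with c > 0 constant across destinations.
\hat{\lambda}_j' = \frac{X_j}{c F_j} = \frac{\hat{\lambda}_j}{c}, \qquad
\hat{s}_{ij}' = \frac{X_i / (c F_i)}{X_j / (c F_j)} = \hat{s}_{ij}, \qquad
\hat{\lambda}_j' \, (c F_i) = \hat{\lambda}_j F_i .
```

Under these assumptions the rate parameter absorbs the factor $c$, while the relative sensitivities and the expected case counts are unchanged.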

Number of cases detected outside mainland China.
We collated data on 3276 cases in international travelers from media reports and provincial and national department of health press releases up until 27 February 2020 1 9. Media reports on new cases of COVID-19 were followed daily from 15th January 2020 to 27th February 2020. Where possible, the details reported in the news were validated against official sources. Relevant websites such as ministries of health or local news media were identified through web searches. Reports in languages other than English were translated into English using translation services available online (e.g. Google Translate).
We defined a local transmission as any transmission that occurred outside mainland China (Hong Kong SAR and Macau SAR are considered outside mainland China for this analysis). We considered only cases that were not transmitted locally; that is, cases detected outside mainland China that had a travel history to China and arrived outside mainland China by air, excluding repatriation flights (Table 1). Everyone we classified as a "case detected overseas" had the mode of travel either explicitly mentioned as air, or implied as the most probable mode of travel from mainland China to the destination (e.g. from China to Italy). Where multiple modes of travel are possible, e.g. from mainland China to Hong Kong, we have only classified individuals as cases detected overseas where the mode of travel was explicitly mentioned as air. In most instances, all or most of the passengers on repatriation flights had been tested for the presence of SARS-CoV-2. The cases detected through surveillance of repatriation flights are therefore not representative of the general sensitivity of surveillance in a country. We have therefore excluded these from the analysis.
Based upon these inclusion criteria, a total of 173 cases were included in our analysis. The earliest date of travel for the cases included in the analysis is 1 January 2020, and the latest date of travel is 25 February 2020.

Analysis
We assume that the observed number of exported cases in a country i is Poisson distributed with a mean that depends on the air traffic from Wuhan to i and on the sensitivity of surveillance in i relative to a country j, denoted by $s_{ij}$. For each country i, let $X_i$ be the number of exported cases (a count) and let $F_i$ be the volume of air traffic from Wuhan to country i. Writing $\lambda_j$ for the expected number of detected cases per unit of air traffic volume in country j, we can then write a joint log likelihood for the data from countries i and j:

$$ll = X_j \log(\lambda_j) - \lambda_j F_j + X_i \log(s_{ij} \lambda_j) - s_{ij} \lambda_j F_i,$$

ignoring additive constants. Thus, the maximum likelihood estimates for $\lambda_j$ and $s_{ij}$ are:

$$\hat{\lambda}_j = \frac{X_j}{F_j}, \qquad \hat{s}_{ij} = \frac{X_i / F_i}{X_j / F_j}.$$

The likelihood-based confidence intervals are obtained by calculating the maximum log likelihood (over values of $\lambda_j$) for each value of $s_{ij}$. Then the 95% confidence interval includes all those values of $s_{ij}$ such that $2\,(ll_{\hat{s}_{ij}} - ll_{s_{ij}}) \le 3.84$ (the 95th centile of the chi-squared distribution with 1 degree of freedom). These calculations were all performed using R version 3.6.0.
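The estimation steps described above can be sketched in a few lines. This is an illustrative Python sketch, not the authors' R code. It assumes the pairwise Poisson log likelihood has the form X_j log(λ_j) − λ_j F_j + X_i log(s_ij λ_j) − s_ij λ_j F_i and that both counts are positive. All function names and the example numbers are hypothetical.

```python
import math

CHI2_95_1DF = 3.84  # 95th centile of chi-squared with 1 d.f., as quoted in the text


def log_lik(s, lam, x_i, f_i, x_j, f_j):
    """Joint Poisson log likelihood for countries i and j (additive constants dropped)."""
    return (x_j * math.log(lam) - lam * f_j
            + x_i * math.log(s * lam) - s * lam * f_i)


def mle(x_i, f_i, x_j, f_j):
    """Closed-form maximum likelihood estimates (lambda_hat, s_hat)."""
    lam_hat = x_j / f_j
    s_hat = (x_i / f_i) / lam_hat
    return lam_hat, s_hat


def profile_log_lik(s, x_i, f_i, x_j, f_j):
    """Log likelihood maximised over lambda for a fixed value of s."""
    lam = (x_i + x_j) / (f_j + s * f_i)  # stationary point in lambda for this s
    return log_lik(s, lam, x_i, f_i, x_j, f_j)


def confidence_interval(x_i, f_i, x_j, f_j, iters=200):
    """Likelihood-based 95% interval for s; assumes x_i > 0 and x_j > 0."""
    _, s_hat = mle(x_i, f_i, x_j, f_j)
    ll_max = profile_log_lik(s_hat, x_i, f_i, x_j, f_j)

    def excess(s):
        # Positive when s lies outside the 95% interval.
        return 2.0 * (ll_max - profile_log_lik(s, x_i, f_i, x_j, f_j)) - CHI2_95_1DF

    def boundary(step):
        inside = outside = s_hat
        while excess(outside) <= 0:   # walk outwards until outside the interval
            outside *= step
        for _ in range(iters):        # bisect between an inside and an outside point
            mid = 0.5 * (inside + outside)
            if excess(mid) > 0:
                outside = mid
            else:
                inside = mid
        return 0.5 * (inside + outside)

    return boundary(0.5), boundary(2.0)


# Made-up counts and passenger volumes, for illustration only.
lam_hat, s_hat = mle(x_i=2, f_i=1000.0, x_j=10, f_j=1000.0)
lower, upper = confidence_interval(2, 1000.0, 10, 1000.0)
print(f"s_hat = {s_hat:.2f}, 95% CI: {lower:.3f} to {upper:.3f}")
```

Bisection on the profile likelihood is one simple way to locate the interval endpoints; a grid search over values of s would serve equally well for a sketch like this.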
The relative sensitivities can also be estimated relative to J countries simultaneously using a method similar to above but with the log likelihood:

$$ll = \sum_{j=1}^{J} \left[ X_j \log(\lambda_J) - \lambda_J F_j \right] + X_i \log(s_{iJ} \lambda_J) - s_{iJ} \lambda_J F_i,$$

where $\lambda_J$ is the rate shared by the J comparison countries and $s_{iJ}$ is the sensitivity of surveillance in i relative to them. Expected values can then be calculated for every country i as simply $\lambda_J F_i$, and the expected value for all countries is

$$\sum_{i=1}^{N} \lambda_J F_i,$$

where N is the total number of countries with air traffic from Wuhan Tianhe International Airport (N = 119).
This has been updated since the analysis presented here was released as a public report by the Imperial College London Coronavirus Response Team on 22nd February 2020. This report is available at https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-6international-surveillance/. See https://doi.org/10.5281/zenodo.3736643.
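A minimal Python sketch of the pooled calculation follows. It assumes the multi-country log likelihood simply sums the comparison-country terms of the pairwise version, so that the shared rate is total cases over total passenger volume in the comparison set. The function names, country labels and numbers are hypothetical and chosen only to illustrate the arithmetic.

```python
def pooled_rate(cases, flights, reference):
    """Shared rate lambda_J: total cases over total passenger volume in the reference set."""
    total_cases = sum(cases[c] for c in reference)
    total_flights = sum(flights[c] for c in reference)
    return total_cases / total_flights


def expected_cases(flights, lam):
    """Expected detected cases lambda_J * F_i for every destination i."""
    return {c: lam * f for c, f in flights.items()}


def relative_sensitivity(cases, flights, lam):
    """Detected cases per passenger in each destination, relative to the pooled rate."""
    return {c: (cases[c] / flights[c]) / lam for c in flights}


# Made-up data: destinations "A" and "B" form the comparison set.
cases = {"A": 10, "B": 5, "C": 1}
flights = {"A": 1000.0, "B": 1000.0, "C": 2000.0}

lam_J = pooled_rate(cases, flights, reference=["A", "B"])
expected = expected_cases(flights, lam_J)
total_expected = sum(expected.values())
sensitivity = relative_sensitivity(cases, flights, lam_J)

for country in flights:
    print(country, round(expected[country], 2), round(sensitivity[country], 3))
print("total expected:", round(total_expected, 2))
```

Summing the expected values over all destinations gives the total against which the detected count is compared.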

Results
The observed number of exported cases by country was plotted as a function of the average monthly passenger volume originating from Wuhan Tianhe International Airport on international flights (Figure 1 9). This showed Singapore to be an outlier in terms of having relatively many observed exported cases compared to the measure of air traffic volume.
The relative sensitivity of surveillance in individual countries was estimated compared to Singapore. Finland, Nepal, Philippines, Sweden, India, Sri Lanka, and Canada were all found to have relative sensitivity estimates greater than 1 (i.e. more cases were detected per passenger flight than in Singapore). Thus, a second set of relative sensitivity estimates was obtained for all other individual countries compared simultaneously to Singapore, Finland, Nepal, Philippines, Sweden, India, Sri Lanka, and Canada.
The region- and country-specific expected numbers of exported COVID-19 cases were in several cases substantially higher than the numbers detected (Figure 2 9). The sum of the expected numbers of exported COVID-19 cases for all regions and countries other than mainland China was 576.8 (95% CI: 372.2 - 845.4), based on the analysis relative to Singapore only, and 704.4 (95% CI: 510.3 - 942.3), based on the analysis relative to Singapore, Finland, Nepal, Philippines, Sweden, India, Sri Lanka, and Canada. Given that 173 such cases were detected, these central estimates suggest that between 70% (95% CI: 54% - 80%, relative to Singapore only) and 75% (95% CI: 66% - 82%, relative to Singapore, Finland, Nepal, Philippines, Sweden, India, Sri Lanka, and Canada) remained undetected.

Discussion
Consistent with similar analyses 4,10, we estimated that more than two thirds of COVID-19 cases exported from Wuhan have remained undetected worldwide, potentially leaving sources of human-to-human transmission unchecked (70%, 95% CI: 54% - 80% and 75%, 95% CI: 66% - 82%, undetected, based on comparisons to Singapore only and to Singapore, Finland, Nepal, Philippines, Sweden, India, Sri Lanka, and Canada, respectively).
A limitation of our study is that we do not take into account the changes in air travel due to the travel advisories and restrictions imposed by various governments (though only those in force before 27 February 2020 would be relevant), which may have changed the volume of passengers flying into particular countries. Further, in using the data from 2016, we assume that the passenger volumes in early 2020 into each country are scaled by a constant factor. Access to more recent data on the changes in the number of passengers would likely improve the estimates of the sensitivity of surveillance presented here. For countries/regions that are connected to Wuhan using multiple modes of transport such as train links and water routes, e.g. Hong Kong, surveillance is likely to have been enhanced at ports of entry other than airports. If so, the sensitivity of surveillance estimated here would likely be an underestimate for these regions.
During the period of this study, Wuhan was the epicenter of the outbreak. Hence, it was reasonable to assume that a case detected outside China with travel history to Hubei province in this period is likely to be an imported case. However, epidemiological investigations are critical to ascertain the origin of a case. Timely public release of the results of such investigations could help public health professionals better assess the spread of the disease.
Undoubtedly, the exported cases vary in the severity of their clinical symptoms, making some cases more difficult to detect than others. However, some countries have detected significantly fewer than would have been expected based on the volume of flight passengers arriving from Wuhan City, China. These undetected cases potentially resulted in multiple chains of human-to-human transmission outside mainland China.
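The undetected percentages quoted in the Results and Discussion can be recovered from the reported totals as one minus the ratio of detected to expected cases. The short Python check below uses only numbers stated in the text (173 detected cases and the two expected totals with their confidence limits). The function name and dictionary labels are illustrative.

```python
def undetected_percent(observed, expected):
    """Percentage of expected exported cases that were not detected."""
    return 100.0 * (1.0 - observed / expected)


OBSERVED = 173  # cases detected outside mainland China and included in the analysis

# Expected totals as (central estimate, lower 95% limit, upper 95% limit).
EXPECTED = {
    "relative to Singapore only": (576.8, 372.2, 845.4),
    "relative to Singapore and seven other countries": (704.4, 510.3, 942.3),
}

results = {}
for label, totals in EXPECTED.items():
    # A smaller expected total gives a smaller undetected percentage,
    # so the limits keep their order.
    results[label] = tuple(round(undetected_percent(OBSERVED, e)) for e in totals)
    central, low, high = results[label]
    print(f"{label}: {central}% (95% CI: {low}% - {high}%)")
```

Rounded to whole percentages, these reproduce the 70% (54% - 80%) and 75% (66% - 82%) figures reported above.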

Hannah E. Clapham
Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
This is a well-done analysis that estimates the number of unreported cases that were imported from China early on in the pandemic. It does this using data on air travel volume between China and other countries, and the reported numbers of cases in the places that reported the highest numbers of cases given their air traffic volume. This analysis was highly relevant early in the pandemic as infections spread from China.
I have a few comments on points for clarification below.

Abstract:
The statement of the results about 70/75%... in the abstract is confusing. Suggest rephrasing.
Conclusion: I wonder if this is a conclusion from the paper. I would suggest the addition here that the analysis leads to estimates that there were many unreported imported infections, and that these potentially led to transmission.

Main text:
Background: It would be helpful to have a statement about what the previous analysis in reference 4 did/showed.
Methods: Please add more detail on how airports in China were used in the flow calculation, and also on the definition of a destination region. In the results section initially Wuhan is focused on but the general conclusions seem to be from all of mainland China. Please clarify throughout. Please add more detail on where the collated data on imported cases were obtained from. Were all excluded cases excluded because they were defined as local or due to missing information on this? Was data available on which location was travelled from within China for the imported cases? If not, how was this dealt with in the analysis?
Results: Figure 2 legend: are the numbers shown relative to Singapore numbers, or is the analysis done relative to Singapore and then the estimates of imported cases shown? At the moment, the legend reads as the former, but my understanding of the analysis is that it is the latter. Please clarify.
Discussion: Please add on limitations of the analysis, in particular how this relates to the available data including classification as imported vs local, and that this data needed to be publicly available.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Infectious disease epidemiology and dynamics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Overall the methodology is sensible and the findings of interest at the time.

Comments:
○ The manuscript is missing detail on how the 3276 cases were collated. Was this done systematically and if so how? Were non-English reports collated and was there any attempt to correct for language biases? Have you got references for the cases and, if yes, could they be added, e.g., to the csv file?
○ The index could be more consistent, e.g. it seems \lambda=X_j/F_j should have an index j, and s should have indices i and j (and no e).
○ It would be interesting to see the results here compared to Golding et al., https://www.medrxiv.org/content/10.1101/2020.07.07.20148460v1, which had the same aim but used a different methodology/data - also, to compare to other estimates of underdetection in the relevant countries.

Furthermore, only the data processing script for cases in international travelers is provided, not the code for data analysis or visualization of results.
While we understand that the authors cannot share the IATA data, they should at least provide results sufficient to reproduce Figure 2, and preferably all code for analysis and visualization such that anyone with access to the data could reproduce the results.

Minor
There is a disconnect between the use of passenger flow data from only WTIA and analysis of cases thought to have acquired infection in mainland China, regardless of whether they travelled through Wuhan or Hubei; some explanation of the decision to use these inconsistent definitions is warranted.
Please clarify why repatriation flights have been excluded when selecting cases for analysis.
At the beginning of the analysis section, "We assume that the number of exported cases in a country i…" should read "We assume that the observed number of exported cases in a country i…". This distinction should be clarified throughout.
Some mention could be made of the reason for using 2016 flight data rather than more recent data.

Reviewer Expertise: infectious disease epidemiology and modelling
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.

Author Response 12 Aug 2021
Christl Donnelly, Imperial College London, London, UK

1. We apologise for the omission of relevant code. We have now added all relevant code so that someone with relevant data can reproduce the results. We have also included a file with dummy data on travel volume to help the readers. The README file has also been updated to include instructions on running the code.
2. Everyone we classified as cases detected overseas had travel by air either explicitly mentioned, or implied as the most probable mode of travel from mainland China to the destination (e.g. from China to Italy). Where multiple modes of travel are possible, e.g. from mainland China to Hong Kong, we have only classified individuals as cases detected overseas where the mode of travel was explicitly mentioned as air. The text has been updated to clarify this.
3. In most instances, all passengers on repatriation flights had been tested for the presence of SARS-CoV-2. The cases detected through surveillance of repatriation flights are therefore not representative of the typical sensitivity of surveillance in a country. We have therefore excluded these from the analysis. The text has been updated to clarify this.
4. This sentence has now been edited and the distinction has been emphasised in the rest of the text.
5. The data from 2016 were the most recent data to which we had access when undertaking our analysis. Further, any constant scaling of the volume of passengers would not affect the estimates of model parameters (lambda and s_e). This has been emphasised in the text.
6. We have included the limitations of the method and data sources in the discussion.
7. Thanks for highlighting these relevant references. We have now included reference to these in the section Background.
8. The abstract and the reference were updated as of August 2021.
9. The reference to lambda has been removed from the methods section and a reference added to the appropriate section.
10. We have added reference to other studies conducted at the time which provide estimates of surveillance sensitivity globally.

Competing Interests: No competing interests were disclosed.

Figure 2. The expected and observed numbers of exported COVID-19 cases by country, with surveillance sensitivity relative to Singapore only. Values above the diagonal line indicate more cases were expected than were observed. The colour of the points denotes the continent of the destination country (Asia - orange, Europe - light blue, Africa - green, North America - dark blue, South America - pink, and Oceania - dark orange).

Figure 1. Exported COVID-19 cases vs average air traffic from Wuhan Tianhe International Airport by destination. The number of exported COVID-19 cases detected by region and country plotted against the average monthly international air traffic volume from Wuhan Tianhe International Airport aggregated by destination country. The colour of the points denotes the continent of the destination country (Asia - orange, Europe - light blue, Africa - green, North America - dark blue, South America - pink, and Oceania - dark orange).

Reviewer Report 27 September 2021
https://doi.org/10.21956/wellcomeopenres.18948.r45822
© 2021 Clapham H. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Hannah E. Clapham
Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
No further comments.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Infectious disease epidemiology and dynamics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Version 1
Reviewer Report 13 August 2020
https://doi.org/10.21956/wellcomeopenres.17332.r39749
© 2020 Clapham H. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Is the work clearly and accurately presented and does it cite the current literature? Partly
Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Partly
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Partly
Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.

○ A thorough discussion of potential biases/limitations is missing.

References
1. Golding N, Russell T, Abbott S, Hellewell J, et al.: Reconstructing the global dynamics of under-ascertained COVID-19 cases and infections. medRxiv. 2020.
2. Gilbert M, Pullano G, Pinotti F, Valdano E, et al.: Preparedness and vulnerability of African countries against importations of COVID-19: a modelling study. The Lancet. 2020; 395 (10227): 871-877.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Partly
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Partly
Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.