Tracking the international spread of SARS-CoV-2 lineages B.1.1.7 and B.1.351/501Y-V2 with grinch

Late in 2020, two genetically distinct clusters of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with mutations of biological concern were reported, one in the United Kingdom and one in South Africa. Using a combination of data from routine surveillance, genomic sequencing and international travel, we track the international dispersal of lineages B.1.1.7 and B.1.351 (variant 501Y-V2). We account for potential biases in genomic surveillance efforts by including passenger volumes from the locations where each lineage was first reported, London and South Africa respectively. Using the software tool grinch (global report investigating novel coronavirus haplotypes), we track the international spread of lineages of concern with automated daily reports. Further, we have built a custom tracking website (cov-lineages.org/global_report.html) which hosts this daily report and will continue to include novel SARS-CoV-2 lineages of concern as they are detected.


Introduction
In December 2020, routine genomic surveillance in the United Kingdom (UK) 1 reported a new and genetically distinct phylogenetic cluster of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (variant VOC202012/01, lineage B.1.1.7). Preliminary analysis suggests that this lineage carries an unusually large number of genetic changes 2. The earliest known cases of B.1.1.7 were sampled in southern England in late September 2020, and by December the lineage had spread to most UK regions and was growing rapidly 3. In October 2020, a separate SARS-CoV-2 cluster (variant 501Y.V2, lineage B.1.351), which carried a different constellation of genetic changes, was detected by the Network for Genomic Surveillance in South Africa 4,5. Both lineages carry mutations, especially in the virus spike protein, that may affect virus function, and both appear to have grown rapidly in relative frequency since their discovery. Early analyses of the spatial spread of SARS-CoV-2 highlight the potential for rapid virus dissemination through national and international travel 6,7. Therefore, continued genomic monitoring of lineages of concern is required.
To facilitate tracking of these lineages on an international scale, we developed a software tool grinch (global report investigating novel coronavirus haplotypes) that collates SARS-CoV-2 genomic data and epidemiological metadata. Resources such as grinch on cov-lineages.org can inform public health bodies and institutions around the world. Other excellent resources to track lineages and variants are available, including covariants.org, which tracks the spread of SARS-CoV-2 variants of interest, and outbreak.info, which gathers multiple sources of genetic and epidemiological data to track lineages. We include a non-exhaustive list of resources for tracking SARS-CoV-2 at https://cov-lineages.org/resources.html.

Methods
To better characterise the international distribution of lineages B.1.1.7 and B.1.351, we sourced SARS-CoV-2 sequences from GISAID 8,9 and assigned lineages using pangolin (v2.1.6, https://github.com/cov-lineages/pangolin), which implements the nomenclature scheme described in Rambaut et al. 10. Genomes are assigned lineage B.1.1.7 if they exhibit at least 5 of the 17 mutations inferred to have arisen on the phylogenetic branch immediately ancestral to the cluster (Table 1) 2; or to B.1.351 if they exhibit at least 5 of 9 lineage-associated mutations (Table 1) 5. Lineage count and frequency data have been calculated daily using grinch. Using International Air Transport Association (IATA) travel data from October 2020, available through bluedot.global, we aggregated and collated the passenger volumes from international airports in London and South Africa to international destinations on the same booking. Destinations with more than 5,000 passengers from London, or more than 300 passengers from South Africa, during the month of October are displayed on the cov-lineages.org website and in the underlying data for this publication 11. grinch, with custom python modules that make use of geopandas v0.9, matplotlib v3.2 and seaborn v0.10, combines this information and produces reports with descriptive tables and figures that can be found at https://cov-lineages.org/global_report.html.
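The mutation-count threshold described above can be sketched in a few lines of Python. This is a minimal illustration only, not pangolin's actual implementation: the function name `meets_mutation_threshold` and the placeholder mutation labels (`m1` to `m17`) are assumptions made for this example, and the real lineage-defining mutations are those listed in Table 1.

```python
def meets_mutation_threshold(observed, defining, min_hits=5):
    """Return True if at least `min_hits` of the defining mutations are observed."""
    return len(set(observed) & set(defining)) >= min_hits


# Placeholder labels only: these are NOT the real mutations from Table 1.
B117_DEFINING = {f"m{i}" for i in range(1, 18)}  # 17 hypothetical mutations

genome_a = {"m1", "m2", "m3", "m4", "m5", "other"}  # carries 5 of the 17
genome_b = {"m1", "m2", "other"}                    # carries 2 of the 17

print(meets_mutation_threshold(genome_a, B117_DEFINING))  # True
print(meets_mutation_threshold(genome_b, B117_DEFINING))  # False
```

Using a set intersection means repeated or unrelated mutations in a genome do not inflate the count; only distinct defining mutations contribute to the threshold.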

Implementation
All of the code underlying this daily lineage tracking web-report can be found at GitHub and Zenodo 12. grinch is a python-based tool, the analysis pipeline of which is built on a snakemake backbone 13. Every 24 hours a scheduled cron 14 task runs on our local servers. We download the latest data from GISAID and deduplicate based on sequence names. The sequences are assigned their most likely lineage using pangolin's latest version and model files. All processed metadata is available and maintained on the cov-lineages.org GitHub repository. To run grinch, the user must have access to a GISAID direct download key and a password and provide these within a configuration file for use. The command used to run grinch is grinch -i grinch_config.yaml, using the config file provided at doi:10.5281/zenodo.4640379 15.
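As an illustration of the deduplication step, the sketch below keeps the first record seen for each sequence name. It is a hedged example rather than grinch's own code: the function name `deduplicate_by_name`, the `(name, sequence)` tuple layout and the example sequence names are all assumptions, and the text does not specify which copy of a duplicate is retained.

```python
def deduplicate_by_name(records):
    """Keep the first record for each sequence name, preserving input order."""
    seen = set()
    unique = []
    for name, sequence in records:
        if name in seen:
            continue  # a record with this name was already kept
        seen.add(name)
        unique.append((name, sequence))
    return unique


# Hypothetical records; names and sequences are invented for illustration.
records = [
    ("sample/A/2020", "ACGT"),
    ("sample/B/2020", "ACGA"),
    ("sample/A/2020", "ACGT"),  # duplicate name, dropped
]

print(deduplicate_by_name(records))
```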
Operation
Most users will not run grinch themselves; instead, all information and useful descriptive figures are provided daily on the web report. Users can navigate to cov-lineages.org in a web browser of choice to view the latest daily report.

Amendments from Version 1
We have updated the figures to amend some issues with proofing. We have added in some details of other excellent resources for SARS-CoV-2 international surveillance. Over on cov-lineages.org (which has had a facelift since time of publishing), we have also added in a resources page (https://cov-lineages.org/resources.html) that points the user to both internally developed and externally developed resources for SARS-CoV-2 lineage and variant tracking. Figure 1 and Figure 2, along with the title, were also updated.
Any further responses from the reviewers can be found at the end of the article

Results and discussion
As of 7th Jan 2021, 45 countries had reported the presence of B.1.1.7 and 13 countries had reported B.1.351/501Y.V2. B.1.1.7 and B.1.351 genome sequences were available for 28 and 8 countries, respectively (Figure 1a, b, c) 11. Although some countries report increases in the relative frequency of B.1.1.7, genome sequencing efforts vary considerably. Potential targeting of sequencing towards travellers from the UK could bias frequency estimates upwards (Figure 1b, c) and differing genome sharing policies and delays may also skew reporting estimates.
The time between the initial collection date of a new variant sample in a country and the first availability of a corresponding virus genome on GISAID was, on average, 12 days (range 1-71).
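A lag summary of this kind can be computed from pairs of collection and submission dates. The sketch below is illustrative only: the function name `submission_lags` and the three date pairs are invented for the example and are not the data behind the figures reported above.

```python
from datetime import date
from statistics import mean


def submission_lags(records):
    """Days between sample collection and genome availability, per record."""
    return [(submitted - collected).days for collected, submitted in records]


# Invented (collection date, first-availability date) pairs, one per country.
example = [
    (date(2020, 12, 1), date(2020, 12, 11)),
    (date(2020, 12, 20), date(2020, 12, 22)),
    (date(2020, 12, 5), date(2020, 12, 26)),
]

lags = submission_lags(example)
print(f"mean={mean(lags)} days, range {min(lags)}-{max(lags)}")
```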
The number of B.1.1.7 and B.1.351/501Y.V2 genome sequences reported in each country is a consequence of (i) the intensity of local genomic surveillance; (ii) the level of concern about new variant introductions; (iii) the volume of international travel among affected countries, and (iv) the amount of local transmission following the introduction of the lineage from elsewhere. To explore these factors, we analysed the most recent available IATA travel data (October 2020). We collated the total number of origin-to-destination air journeys between major London international airports and each country. The calculation was repeated for journeys originating in all international South African airports. We focussed on London and South Africa as they are the locations with the first reports and highest reported prevalence of lineages B.1.1.7 and B.1.351, respectively 2,5. However, due to low SARS-CoV-2 genomic surveillance in many locations, we cannot reject the hypothesis that these lineages initially originated elsewhere. Figure 1d shows destinations receiving >5,000 travellers in October 2020 from the UK (Figure 2 shows destinations receiving >300 travellers from South Africa).
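The collation of journeys by destination, followed by the traveller thresholds, might look like the following sketch. It assumes a simple list of `(origin_airport, destination_country, passengers)` tuples; that layout, the function name `destinations_above_threshold`, the country labels and all passenger numbers are hypothetical and do not reflect the IATA data format or values.

```python
from collections import defaultdict


def destinations_above_threshold(itineraries, threshold):
    """Sum passengers per destination country and keep totals above threshold."""
    totals = defaultdict(int)
    for _origin_airport, destination_country, passengers in itineraries:
        totals[destination_country] += passengers
    return {country: n for country, n in totals.items() if n > threshold}


# Hypothetical October journeys from two London airports.
london_itineraries = [
    ("LHR", "Country A", 4000),
    ("LGW", "Country A", 1500),
    ("LHR", "Country B", 2000),
]

print(destinations_above_threshold(london_itineraries, threshold=5000))
# {'Country A': 5500}
```

The same function could be reused for journeys originating in South Africa by passing `threshold=300`.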
Of the countries that receive >5,000 travellers from London, 16 have sequenced B.  surveillance (Denmark, UK, Iceland, The Netherlands, Australia, Sweden), 3 have prioritised sequencing based on S-gene target failure tests 16, 30 primarily targeted sequencing towards arriving travellers from the UK, and there was no information available for 10 (details at https://github.com/cov-lineages/lineages-website/blob/master/_data/). Of the 13 countries that have identified B.1.351 (four with local onward transmission including South Africa), 4 perform routine sequencing (South Africa, UK, Botswana, Australia), 6 target sequencing of travellers, and there was no information available for 3. Consequently, there is no clear relationship between the number of sequences reported and flight numbers; the number of sequences instead reflects current genomic surveillance effort. For example, in September, the UK sequenced ~13% of its reported cases and Denmark sequenced ~21%. In comparison, Israel sequenced ~0.002% of its cases during the same period 17,18.
Our study has several limitations. The passenger flight data do not include recent changes to holiday travel, and recent restrictions on travel from the UK and South Africa are not reflected in the mobility data. Further, flight data may not accurately reflect the final destination if multiple tickets are purchased.
The discovery and rapid spread of B.

Some minor comments below:
1. The transition from an article of public health importance to a software tool is abrupt. I think a paragraph or a link aimed at orientating the audience would be useful.
2. It would be useful to outline the special niche that the tool occupies or the gaps it fills relative to similar utilities and webpages such as covariants.org and outbreak.info.
3. The readme file at https://github.com/cov-lineages/grinch lacks full installation documentation. An introductory paragraph of the tool and its utility would also be useful. The scripts directory could be better organised by separating the snakemake files from the regular python files. I would imagine a workflows dir and scripts dir.

Is the rationale for developing the new software tool clearly explained? Yes
Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: bioinformatics, molecular epidemiology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
About the online reports, increasing the font size in the plots being displayed (bars, curves, etc.) would make labels and legends more intelligible and improve the readability of their content.
Reviewer Expertise: Virology, Bioinformatics, Evolution, Epidemiology.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
We have made the axes longer to account for more countries; however, we are working towards re-implementing these reports in JavaScript so they can be interactive and more responsive to the browser size.
About the flight data, why only flight counts from October are shown? Are these data only used for tracking the potential spread in early stages of viral emergence, or do you see other uses for such data?
October related to the date of early spread of both lineages described in the text, and the choice was due to limitations of access to data. We hope to continue to develop this resource and supply more recent dates that track over time.

Concerning the manuscript, a few minor points:
The colour gradient in the legend of Figure 1 is incomplete and does not go from 1 to 76. I think it must be just a formatting issue.
This was a proofs issue and we've hopefully rectified this now.

Rob Lanfear
Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT, Australia
This article describes a software tool, grinch, that can be used to produce automated reports on SARS-CoV-2 lineages. The authors apply it to two lineages of concern in the article, and also highlight that the main utility of grinch is not in static one-off reports, but in regularly updated reports available at https://cov-lineages.org/global_report.html.
The paper clearly describes the software and demonstrates its utility. I'd like to commend the authors for putting this tool and the associated website together so quickly, for maintaining both to a very high standard, for making sure that all of the work is open and reproducible, and for the huge amount of work and enormous collaborative effort that has gone into this clear and concise report.
I have no serious reservations about the software tool or the data, analyses, or conclusions presented in the manuscript. The software is clear, open-source, sufficiently documented, and almost all of the proposed utility is presented on a clear and regularly updated website. The manuscript is clearly written, well researched, concise, and the conclusions are well justified by the analyses.
Of course, I do have a few comments, some of which I hope might be useful in improving the paper and/or the website.
Minor comments on the manuscript:
1. I felt there was some tension in this article about whether it's a software note or a public health report. The title suggests the latter, but much of the article (and the article type of "Software Tool Article") suggests the former. Most of this tension for me as a reader came from looking at the title, which has no mention of software, so I think sets up expectations that differ from what is then provided (quite reasonably) in the paper. A very simple way to address this would be to start the title with "Using grinch to track…" or to end it with "… using grinch".
2. Similar to point 1, the abstract doesn't actually mention 'grinch' or https://cov-lineages.org/global_report.html. It would seem clearer to me to incorporate in the abstract the framing that this article presents a generally applicable software tool, demonstrated on two lineages of concern.
3. I would like to see some mention of related efforts somewhere in the report. A full detailed comparison is neither warranted nor useful here because all such websites can and should change regularly, but a couple of sentences comparing cov-lineages.org to sites like outbreak.info and covariants.org would be very useful. At a minimum, it seems useful to list the similar sites the authors are aware of, if only because the fact one can see similar patterns presented on those sites serves as a useful validation of the software presented in this paper.
4. Given the situation, this is a desirable, not a requirement, but I'd love to see some unit tests on the GitHub repo. It seems potentially important to have this when the intention is to produce daily updates for public health. (Though I note that getting the same end result from completely independent implementations on other sites is probably worth more than a lot of unit tests).
5. I struggled with Figure 1D. It wasn't clear to me what 'reported' and 'not reported' mean. And the legend makes it really hard to figure out how colours map to counts.
6. It's stated that there is no correlation between the numbers of sequences and flight numbers. It would be nice to see the scatter plot for this (maybe as an inset to figure 1D?), as well as the effect size and p-value of a suitable model.
7. Following from the previous point, the explanation for the lack of a correlation with absolute numbers seems reasonable. But it still seems to me that numbers could correlate with the frequency of B117 at a fixed time interval from the first detected case in a given locality (thus somewhat factoring out sequencing effort in the locality). Is it possible to add this analysis?
8. Please add installation instructions to the GitHub repo.
Minor comments on https://cov-lineages.org/global_report.html:
1. Figure 3 for each lineage is a map of sequence counts by region. I find the legend here completely baffling. All it states is grey = No variant (that makes sense), pink = 1 sequence (that makes sense too), and purple = 'Max sequences'. I have no idea what to make of this. How many is 'Max', and how am I supposed to quantitatively interpret intermediate colours to pink and purple? It's so obvious I'm certain there are good reasons why this isn't already done, but it does seem like a continuous colour scale is what should be used here. Similar to the scale in Figure 2 (grey for no data, shades of green nicely spaced and annotated for different values of a continuous variable).
2. For the widespread lineages like B.1.1.7, there's a lot of overplotting on Figures 4 and 5, which make the counts and the country names very difficult to read. This could be addressed by just making the figures larger.
3. The table of links to news reports is absolutely wonderful. Would it be possible to include a button here to allow users to suggest additional news links? (I assume there's an existing mechanism for doing this, but I couldn't find one, so if not maybe just a link to a github issue with (potentially) a pre-filled title and required information would help?)
Really minor comments about the manuscript:
1. The first use of IATA (first para of the methods) is missing "International", i.e. it says "Using Air Transport…".
2. The second use of IATA (second para of results) does not need to be spelled out.
3. Figure 1A seems like it is missing a second Y axis for the number of GISAID genomes reported.
4. In the PDF version and the HTML version it seems that new lines were added wherever

Is the rationale for developing the new software tool clearly explained? Yes
Is the description of the software tool technically sound? Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes
Competing Interests: I am a paid consultant to GISAID, the database on which much of the data analysed in this article is hosted.
Reviewer Expertise: Phylogenetics, molecular evolution, bioinformatics. I have a passing familiarity with SARS-CoV-2 data analysis.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
health report but this didn't fit with the Wellcome Open Research journal remit, so resubmitted under Software Tool. I have changed the title as suggested to end with "using grinch".
Similar to point 1, the abstract doesn't actually mention 'grinch' or https://cov-lineages.org/global_report.html. It would seem clearer to me to incorporate in the abstract the framing that this article presents a generally applicable software tool, demonstrated on two lineages of concern.
Our abstract and introduction now both contain reference to grinch and the reports at cov-lineages.org.
I would like to see some mention of related efforts somewhere in the report. A full detailed comparison is neither warranted nor useful here because all such websites can and should change regularly, but a couple of sentences comparing cov-lineages.org to sites like outbreak.info and covariants.org would be very useful. At a minimum, it seems useful to list the similar sites the authors are aware of, if only because the fact one can see similar patterns presented on those sites serves as a useful validation of the software presented in this paper.
We have added in a short paragraph about these resources and a link to a more extensive (but definitely non-exhaustive) list of resources.
Given the situation, this is a desirable, not a requirement, but I'd love to see some unit tests on the GitHub repo. It seems potentially important to have this when the intention is to produce daily updates for public health. (Though I note that getting the same end result from completely independent implementations on other sites is probably worth more than a lot of unit tests).
Since publication, we've re-worked the back-end analysis pipeline and the GISAID data is now processed with the datapipe pipeline written by Rachel Colquhoun (https://github.com/COG-UK/datapipe). The reports and webpages are still generated with grinch; however, the main data processing steps are now done with the robust datapipe pipeline.
It's stated that there is no correlation between the numbers of sequences and flight numbers. It would be nice to see the scatter plot for this (maybe as an inset to figure 1D?), as well as the effect size and p-value of a suitable model. Following from the previous point, the explanation for the lack of a correlation with absolute numbers seems reasonable. But it still seems to me that flight numbers could correlate with the frequency of B117 at a fixed time interval from the first detected case in a given locality (thus somewhat factoring out sequencing effort in the locality). Is it possible to add this analysis?
We have amended to state there is no clear relationship, rather than correlation.

Please add installation instructions to the GitHub repo.
We have added updated usage and install instructions, and a description of the behaviour to the repository, at https://github.com/cov-lineages/grinch/blob/main/README.md

Resources; Tumedi KA: Data Curation, Formal Analysis, Resources; Nyepetsi G: Data Curation, Resources; Kebabonye M: Resources; Matsheka M: Resources, Supervision; Mine M: Resources, Supervision; Tokajian S: Data Curation, Formal Analysis, Resources; Hassan H: Data Curation, Formal Analysis, Resources; Salloum T: Data Curation, Formal Analysis, Resources; Merhi G: Data Curation, Formal Analysis, Resources; Koweyes J: Data Curation, Formal Analysis, Resources; Geoghegan JL: Data Curation, Formal Analysis, Resources; de Ligt J: Data Curation, Formal Analysis, Resources; Ren X: Data Curation, Formal Analysis, Resources; Storey M: Data Curation, Formal Analysis, Resources; Freed NE: Data Curation, Formal Analysis, Resources; Pattabiraman C: Data Curation, Formal Analysis, Resources; Prasad P: Data Curation, Formal Analysis, Resources; Desai AS: Data Curation, Formal Analysis, Resources; Vasanthapuram R: Data Curation, Formal Analysis, Resources; Schulz TF: Data Curation, Formal Analysis, Resources; Steinbrück L: Data Curation, Formal Analysis, Resources; Stadler T: Data Curation, Formal Analysis, Resources, Supervision; Parisi A: Data Curation, Formal Analysis, Resources; Bianco A: Data Curation, Formal Analysis, Resources; García de Viedma D: Data Curation, Formal Analysis, Resources; Buenestado-Serrano S: Data Curation, Formal Analysis, Resources; Borges V: Data Curation, Formal Analysis, Resources; Isidro J: Data Curation, Formal Analysis, Resources; Duarte S: Data Curation, Formal Analysis, Resources; Gomes JP: Data Curation, Formal Analysis, Resources; Zuckerman NS: Data Curation, Formal Analysis, Resources; Mandelboim M: Data Curation, Formal Analysis, Resources; Mor O: Data Curation, Resources; Seemann T: Data Curation, Formal Analysis, Resources; Arnott A: Data Curation, Formal Analysis, Resources; Draper J: Data Curation, Formal Analysis, Resources; Gall M: Data Curation, Formal Analysis, Resources; Rawlinson W: Data Curation, Formal Analysis, Resources; Deveson I: Data Curation, Formal Analysis, Resources; Schlebusch S: Data Curation, Formal Analysis, Resources; McMahon J: Data Curation, Resources; Leong L: Data Curation, Formal Analysis, Resources; Lim CK: Data Curation, Formal Analysis, Resources; Chironna M: Data Curation, Formal Analysis, Resources; Loconsole D: Data Curation, Formal Analysis, Resources; Bal A: Data Curation, Formal Analysis, Resources; Josset L: Data Curation, Formal Analysis, Resources; Holmes E: Investigation, Writing - Original Draft Preparation, Writing - Review & Editing; St. George K: Data Curation, Formal Analysis, Resources; Lasek-Nesselquist E: Data Curation, Formal Analysis, Resources; Sikkema RS: Data Curation, Formal Analysis, Resources; Oude Munnink B: Data Curation, Formal Analysis, Resources; Koopmans M: Data Curation, Formal Analysis, Resources; Brytting M: Data Curation, Formal Analysis, Resources; Sudha rani V: Data Curation, Resources; Pavani S: Data Curation, Resources; Smura T: Data Curation, Formal Analysis, Resources; Heim A: Data Curation, Formal Analysis, Resources; Kurkela S: Data Curation, Formal Analysis, Resources; Umair M: Data Curation, Formal Analysis, Resources; Salman M: Data Curation, Formal Analysis, Resources; Bartolini B: Data Curation, Formal Analysis, Resources; Rueca M: Data Curation, Formal Analysis, Resources; Drosten C: Data Curation, Formal Analysis, Resources; Wolff T: Data Curation, Formal Analysis, Resources; Silander O: Data Curation, Formal Analysis, Resources; Eggink D: Data Curation, Resources; Reusken C: Data Curation, Resources; Vennema H: Data Curation, Formal Analysis, Resources; Park A: Data Curation, Formal Analysis, Resources; Carrington C: Data Curation, Formal Analysis, Resources; Sahadeo N: Data Curation, Formal Analysis, Resources; Carr M: Data Curation, Formal Analysis, Resources; Gonzalez G: Data Curation, Formal Analysis, Resources; de Oliveira T: Data Curation, Formal Analysis, Resources; Faria N: Data Curation, Formal Analysis, Investigation, Resources; Rambaut A: Conceptualization, Formal Analysis, Funding Acquisition, Supervision, Validation, Writing - Original Draft Preparation, Writing - Review & Editing; Kraemer MUG: Conceptualization, Data Curation, Formal Analysis, Investigation, Project Administration, Resources, Supervision, Visualization, Writing - Original Draft Preparation, Writing - Review & Editing
Competing interests: No competing interests were disclosed.
Grant information: I.I.B. is supported by the Canadian Institutes of Health Research, COVID-19 Rapid Research Funding Opportunity (02179-000). K.K. is the founder of BlueDot, a social enterprise that develops digital technologies for public health. K.K., A.W., A.T.B. and C.H. are employed at BlueDot. I.I.B. has consulted for BlueDot. T.d.O. and the NGS-SA is funded by the South African Medical Research Council (SAMRC), MRC SHIP and the Department of Science and Innovation (DSI) of South Africa. N.R.F. acknowledges support from a Wellcome Trust and Royal Society Sir Henry Dale Fellowship (204311/Z/16/Z) and a Medical Research Council-São Paulo Research Foundation CADDE partnership award (MR/S0195/1 and FAPESP 18/14389-0). VH was supported by the Biotechnology and Biological Sciences Research Council (BBSRC) [grant number BB/M010996/1]. M.U.G.K. acknowledges support from the Branco Weiss Fellowship and EU grant 874850 MOOD. The contents of this publication are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission. O.G.P., J.P.M. and M.U.G.K. acknowledge support from the Oxford Martin School. AR acknowledges the support of the Wellcome Trust (Collaborators Award 206298/Z/

Figure 1. a) The cumulative number of countries with reports of lineage B.1.1.7 (grey line) and cumulative number of genomes of B.1.1.7 deposited in GISAID. b) Rolling seven-day average of the proportion of B.1.1.7 genomes in countries with more than ten sequences of the variant, and with more than ten days between the first B.1.1.7 sequence and the most recent one compared to all sampled genomes in that country. c) Number of sequences (log10) per country. Colour indicates the proportion of sequences that are classified as lineage B.1.1.7. d) Number of air travellers from major international London airports (Heathrow, Gatwick, Luton, City, Stansted, Southend) during October 2020. Colour indicates the number of sampled genomes of lineage B.1.1.7. Reported refers to countries that we found media reports stating there had been sequences of that particular lineage, but for which there were no sequences on GISAID. This is distinct from 'not reported' where there were no records found of that lineage in a given country. e) Map of international flights from major international London airports to countries with B.1.1.7 sequences. Colours indicate the date of earliest detection of B.1.1.7 in each country. The width of the lines indicates the number of flights. International Air Transport Association data used here account for ~90% of passenger travel itineraries on commercial flights, excluding transportation via unscheduled charter flights (the remainder is modelled using market intelligence). Data shown represents origin-destination journeys during October 2020. Routes to countries that have not yet detected B.1.1.7 and deposited data on GISAID are not included.

Figure 2. a) The cumulative number of countries with reports of lineage B.1.351 (black line) and cumulative number of genomes of B.1.351 deposited in GISAID. b) Rolling seven-day average of the proportion of B.1.351 genomes in countries with more than ten sequences of the variant, and with more than ten days between the first B.1.351 sequence and the most recent one compared to all sampled genomes in that country. c) Number of sequences (log10) per country. Colour indicates the proportion of sequences that are classified as lineage B.1.351. d) Number of air travellers from South Africa during October 2020. Colour indicates the number of sampled genomes of lineage B.1.351. Not reported refers to a given country having no record of B.1.351, and reported refers to countries for which we found media reports but which had no SARS-CoV-2 genomes shared on GISAID at that time. e) Map of international flights to countries with B.1.351 sequences. Colours indicate the date of earliest detection of B.1.351 in each country. The width of the lines indicates the number of flights. International Air Transport Association data used here account for ~90% of passenger travel itineraries on commercial flights, excluding transportation via unscheduled charter flights (the remainder is modelled using market intelligence). Data shown represents origin-destination journeys during October 2020. Routes to countries that have not yet detected B.1.351 and deposited data on GISAID are not included.
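The rolling seven-day average in panel b of the figures could be computed along the lines of the sketch below. It takes one literal reading of the caption, averaging daily proportions over a trailing window and skipping days with no sampled genomes; that reading, the function name `rolling_mean_proportion` and the toy counts are assumptions, and grinch may compute the statistic differently.

```python
def rolling_mean_proportion(lineage_counts, total_counts, window=7):
    """Trailing-window mean of daily lineage proportions.

    Days with no sampled genomes are skipped; a window with no sampled
    days yields None.
    """
    proportions = [
        (hits / total) if total > 0 else None
        for hits, total in zip(lineage_counts, total_counts)
    ]
    averages = []
    for end in range(window, len(proportions) + 1):
        values = [p for p in proportions[end - window:end] if p is not None]
        averages.append(sum(values) / len(values) if values else None)
    return averages


# Toy daily counts with a two-day window so the arithmetic is easy to follow.
lineage = [0, 1, 1, 2, 0]
totals = [10, 10, 5, 10, 0]
print(rolling_mean_proportion(lineage, totals, window=2))
```

An alternative design would pool counts across the window before dividing, which weights days by their sample size rather than equally.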
and Demography Department, Kenya Medical Research Institute (KEMRI) - Wellcome Trust Research Programme, Kilifi, Kenya
The article by O'Toole et al. (2021) describes a bioinformatics tool for the analysis of SARS-CoV-2 sequence data. The article is concise, and the relevant details have been considered. For example, the software and source code are available and well documented. The tool has shown great utility in public health based on its application in tracking and describing two SARS-CoV-2 variants of global concern.

○ About the flight data, why only flight counts from October are shown? Are these data only used for tracking the potential spread in early stages of viral emergence, or do you see other uses for such data?
Concerning the manuscript, a few minor points:
○ The colour gradient in the legend of Figure 1 is incomplete and does not go from 1 to 76. I think it must be just a formatting issue.
○ How was the "reported" cases shown in Figures 1 and 2 detected? By differential PCR? I know that applies to B.1.1.7, but what about B.1.351?
○ The legend in Figure 2 refers to "B.1.1.7" sequences, while the figure shows "B.1.351" sequences. It must be a typo.
Is the rationale for developing the new software tool clearly explained? Yes
Is the description of the software tool technically sound? Partly
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Partly
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes
Competing Interests: No competing interests were disclosed.

17/Z - ARTIC network) and the European Research Council (grant agreement no. 725422 - ReservoirDOCS). A.OT is supported by the Wellcome Trust Hosts, Pathogens & Global Health Programme [grant number: grant.203783/Z/16/Z] and Fast Grants [award number: 2236]. COG-UK is supported by funding from the Medical Research Council (MRC) part of UK Research & Innovation (UKRI), the National Institute of Health Research (NIHR) and Genome Research Limited, operating as the Wellcome Sanger Institute. TFS acknowledges support from the Deutsche Forschungsgemeinschaft (SFB900, EXC2155 RESIST). SeqCOVID-SPAIN is supported by a grant from the Instituto de Salud Carlos III COV0020/00140.
How to cite this article: O'Toole Á, Hill V, Pybus OG et al.

Open Peer Review
Current Peer Review Status: Version 1
https://doi.org/10.21956/wellcomeopenres.18372.r43967
© 2021 Githinji G. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.