Context-specific emergence and growth of the SARS-CoV-2 Delta variant

Summary The Delta variant of concern of SARS-CoV-2 has spread globally causing large outbreaks and resurgences of COVID-19 cases1-3. The emergence of Delta in the UK occurred on the background of a heterogeneous landscape of immunity and relaxation of non-pharmaceutical interventions4,5. Here we analyse 52,992 Delta genomes from England in combination with 93,649 global genomes to reconstruct the emergence of Delta, and quantify its introduction to and regional dissemination across England, in the context of changing travel and social restrictions. Through analysis of human movement, contact tracing, and virus genomic data, we find that the focus of geographic expansion of Delta shifted from India to a more global pattern in early May 2021. In England, Delta lineages were introduced >1,000 times and spread nationally as non-pharmaceutical interventions were relaxed. We find that hotel quarantine for travellers from India reduced onward transmission from importations; however the transmission chains that later dominated the Delta wave in England had been already seeded before restrictions were introduced. In England, increasing inter-regional travel drove Delta's nationwide dissemination, with some cities receiving >2,000 observable lineage introductions from other regions. Subsequently, increased levels of local population mixing, not the number of importations, was associated with faster relative growth of Delta. Among US states, we find that regions that previously experienced large waves also had faster Delta growth rates, and a model including interactions between immunity and human behaviour could accurately predict the rise of Delta there. Delta’s invasion dynamics depended on fine scale spatial heterogeneity in immunity and contact patterns and our findings will inform optimal spatial interventions to reduce transmission of current and future VOCs such as Omicron.

examine virus genomes generated from random samples collected during community-based COVID-19 testing in England, between March 12 and June 15, 2021. Our data include 52,992 Delta VOC genomes from England with known dates and locations of sampling, representing >40% of all positive tests in England during the study period (lateral flow and PCR tests; see Methods and details in 29 ). Using these data we evaluate the effectiveness of policies in reducing international importations and how they contributed to the establishment and local transmission dynamics of Delta in England. We then investigate, at a high spatial resolution, how immunity and human mobility contributed to context-specific growth of Delta in England and the United States.

Reconstruction of international importation and national spread of Delta in England
To provide global context for the emergence of Delta in the UK, we first conducted a phylodynamic analysis of 975 Delta SARS-CoV-2 genome sequences sampled evenly by collection date between March 4, 2021 and June 15, 2021. Details of the origin and spread of Delta within India are still uncertain but coincided with a substantial increase in genomic surveillance across the country which will likely facilitate the study of these important events, but is outside the scope of this work. However, to put the UK epidemic into context we estimate the time of the most common recent ancestor (TMRCA) of Delta globally to be October 19, 2020 (95% highest posterior density [HPD] interval: 2020-09-06 -2020- [11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27][28][29]. The frequency of Delta in India does not appear to increase substantially until March 2021 (Fig. 1), coinciding with a rapid expansion in case numbers there (Extended Data Fig. 1) and a decline in the relative frequency of genomes assigned to B.1.617.1 (Kappa, a sibling lineage of Delta) 1 . Genomic surveillance in India revealed that several sub-lineages of Delta existed prior to its expansion in March (Fig. 1a,b) 5 . This standing diversity is consistent with undetected transmission of Delta in India between late 2020 and March 2021.
We evaluated the global dissemination of Delta from March 2021 by multiplying, for each country, estimated numbers of SARS-CoV-2 cases, relative frequencies of Delta, and relative numbers of outward international passengers (Estimated Exportation Intensity, EEI, see Methods). The EEI of Delta climbed rapidly during March and was highest around late April, coinciding with peak incidence in India (Extended Data Fig. 2). Subsequent rapid growth of Delta in the USA, Russia, UK and Mexico, and its decline in India resulted in the former locations becoming the main exporters of Delta by June 2021 (Extended Data Fig. 2), corroborating global trends in Delta phylogeography (Fig 1a) and reported cases (Fig 1b). Similar patterns of rapidly changing foci of international dissemination were observed for the initial wave of SARS-CoV-2 in 2020 30,31 .
To evaluate the temporal dynamics of Delta importation into England and to reconstruct its subsequent local spread, we conducted a travel history-aware Bayesian phylogeographic analysis 32 of 93,649 Delta sequences, from GISAID and COG-UK, which accounts in part for the phylogenetic uncertainty inherent in SARS-CoV-2 phylogenies 31 . To render the analysis tractable we split the full tree into three independent subtrees (Fig. 1a) prior to phylogeographic analysis. Virus genomes were generated from ~40-60% of all positive cases in England during the emergence of Delta between March and May 2021 ( Fig. 1c) 33 , providing a unique opportunity to characterize the virus' spread at a high spatio-temporal resolution 33 . We estimate a minimum of 1,458 (95% HPD: 1398-1513) separate international introductions of Delta into England, with approximately half inferred to have originated from India (posterior mean 56.5% ; 95% HPD 53.7%-59.1%). We find the majority of English Delta genomes can be traced back to introductions that are inferred to have occurred prior to the implementation of a mandatory hotel quarantine for people arriving from India (posterior mean 84.3%; 95% HPD: 77.8-90.4%). During this period 90.0% of introductions are inferred to have originated from India (95% HPD: 86.5-93.1%). These inferred importation dynamics closely follow individual-level travel histories from infected incoming international passengers (Fig. 2b).
High variation in sampling intensity among countries means the true number of importations into England is likely much larger than that inferred from phylogeographic analysis alone (Fig. 1b & c, see related discussion in the context of the UK's first wave 31 ). For example, the AY.4 lineage (Fig 1a) comprises 42,445 sequences and was likely imported to England many times. We investigated AY.4 by pairing genomic data with contact tracing data collated by Public Health England. During the study period we found 61 AY.4 sequenced cases had a travel history from India and 140 had a travel history from elsewhere; similar to the time-varying importation dynamics seen across the entire dataset ( Fig. 2a; Extended Data Figure 3). Hence sampling heterogeneity means that the number of importations estimated from phylogenetic analysis represents a lower bound on the true number 31 .
To investigate the importation of Delta into England specifically, and to cross-validate the results above using independent data, we estimate the Estimated Importation Intensity (EII) of Delta to England through time 31,34 . The EII is a metric of Delta importation that represents trends in the number of Delta cases arriving in the country, irrespective of whether or not those cases result in local transmission. Contrastingly, the phylogenetic analysis above better captures trends in the number of Delta introductions that did lead to forward transmission in England. The EII combines (i) weekly reported cases, (ii) weekly prevalence of Delta genomes, and (iii) weekly aggregate human mobility (inferred from mobile phones) into England via direct connections ( Fig. 2a; see 31,34 for related approaches). The EII from India increased rapidly in April 2021 following the rise in cases in India and remained high until the end of May 2021. However, we observe that the correlation between the EII and the numbers of importations inferred from phylogenetic analysis declined significantly after the implementation of hotel quarantine for travellers from India (Fig. 2c, mean R 2,before = 0.95 and R 2,after = 0.15), indicating that this intervention reduced the number transmission chains established locally per infected incoming traveller. From late May importations from countries other than India dominated Delta importation into England (Fig. 2a), a trend also visible in contact tracing data carried out by Public Health England (Fig. 2b, R 2 between non-India importations and EII = 0.95, Fig. 2c). Even though we observe that the implementation of hotel quarantine was effective in reducing onward transmission, substantial importation had already occurred before its implementation and additional introductions from other countries likely further accelerated the spread of Delta in England from May onwards.
There are several reasons why some importations led to onward transmission within England after the implementation of hotel quarantine for arriving travellers: (i) a separate terminal for arrivals from mandatory quarantine countries was not opened at the UK's largest airport (London Heathrow) until 1st June 35 , so arriving passengers may have mixed with others prior to initiating mandatory quarantine; (ii) individuals may have become infectious and transmitted only after leaving quarantine, either due to an unusually long latent period or within-group transmission during the quarantine period, although we do not consider this probable 36 ; (iii) individuals may infect others on a connecting flight where the connecting airport did not require hotel quarantine; (iv) there were exemptions to hotel quarantine that may have led to onward transmission in the community 36 .

Transmission lineage dynamics, dissemination, and establishment of Delta in England
Importations of Delta occurred on a background of relaxation of social distancing in England: on April 12th outdoor dining and non-essential retail reopened, and on May 17th restrictions on indoor dining and international travel were relaxed 37 . The relative frequency of Delta genomes in England increased rapidly during May and reported COVID-19 cases subsequently increased 38 (Fig. 1c). Initially, Delta transmission clusters were concentrated in the North West of England and were commonly associated with returning travellers 17,39,40 . We sought to reconstruct the internal dispersal dynamics of independently-imported Delta transmission lineages in England, in the context of changing non-pharmaceutical interventions.
We analysed all identified Delta transmission lineages in England using continuous phylogeography, thereby inferring their history of dissemination among subnational regions (UTLAs; upper tier local authorities). We observe high heterogeneity among ULTAs in the numbers of Delta introductions from other English regions (Fig. 3a), with Lancashire and Greater Manchester each receiving >2000 estimated independent introductions and Torbay only 9. The majority (n = 11,960) of Delta sequences in England belong to a single transmission lineage (lineage I, Fig. 3d), which was sampled mostly in Greater Manchester and Lancashire, and we observe many short-range lineage movements among UTLAs in these areas (Fig. 3a). Greater London also received many Delta cases from elsewhere in England (Fig.  3a), as expected, given its population size and connectedness to other metropolitan areas 34 . Transmission lineages II and III each comprise 3000-4000 genomes; the former is distributed across multiple urban areas (especially in the North West) whilst the latter is focussed in Greater London and the South East (Fig. 3d). We also highlight transmission lineage V (Fig. 3d), originally centered in Bedfordshire, the location of one of the first Delta outbreaks in England and was subjected to surge testing 41 (Extended Data Fig. 5).
In early May, the number of virus lineage movements among locations accelerated (Fig. 3b, Extended Data Fig. 6), showing that growth in Delta frequency (Fig. 1c) was associated with regional dissemination. This spread occurred on the background of relaxing NPIs and increased mixing (between mid-January and June 2021, mobility in England increased from 20% to 70% of its pre-pandemic level and estimated mean daily contacts rose from ~2 to ~5, 42 ). In contrast, the initial wave of SARS-CoV-2 introductions to the UK, in spring 2020, occurred during a period of increasing travel and social restrictions 31 . In general we find that, as NPIs were progressively relaxed through time, long-range viral lineage movements comprised an increasing proportion of all movements (Fig. 3c).
For the seven largest Delta transmission lineages in England (I-VII) we observed ~3 times more exports from Greater Manchester than from Greater London. This difference matches early epidemiological data: the largest and earliest Delta outbreaks were located in the North West (on May 21 Bolton had 452 cases per 100,000 whilst Greater London had 21.6 43 , see Methods). Introductions of Delta into other, smaller urban areas also spread rapidly (e.g. transmission lineage V, Fig. 3d) and were important for the propagation of the variant across England. We observe spatial structure of the seven largest lineages where the frequency of viral movements decline by distance away from the origin location but we also observe a second peak at ~260km (similar to the distance between Greater London and Greater Manchester, Extended Data Fig. 7). Although North West England was a focus of early Delta transmission, the Delta epidemic in England derived from many successful independent international importations. Each of the main Delta transmission lineages in England grew at a similar rate (Extended Data Fig. 4). In contrast, the Alpha variant (Pango lineage B.1.1.7) expanded across the UK from a single origin in South East England 34 . The spatial expansion of Delta transmission lineages plateaued after early June, when most UTLAs had established Delta transmission and the relative frequency of Delta genomes in England had exceeded 90% 44 . Although Scotland, Wales or Northern Ireland are not included here, case count data suggests that cities in England 45 were the main source of the expanding Delta epidemic in the UK; due to this source-sink structure we do not anticipate that omitting these countries substantially affects our reconstruction of epidemic dynamics in England (of the Delta genomes available before 15th June 2021, 57,592 were from England, 9738 from Scotland, 1067 from Wales and 325 from Northern Ireland). Investigating the factors contributing to accelerated growth of Delta Regional and international heterogeneity in incidence, vaccination, and human mobility have been shown to determine the dynamics of infectious diseases 46 , including those of SARS-CoV-2 31,47-52 . We use a combination of epidemiological, aggregate human mobility, and genomic data to test the hypothesis that (i) relaxation of NPIs impacted Delta local growth rates in England, and (ii) immunity from infection and vaccination affected Delta growth in the US. To do so we develop a hierarchical Bayesian model to estimate the impact of these factors on the weekly relative growth of Delta (i.e., the weekly change in the observed proportion of Delta genomes on a log odds scale) 53 at the UTLA level for England and the state level for the US. Models for estimating the increase in transmissibility of new variants are typically based on increases in relative frequency 3,16,53-55 but rarely take into account other potential confounding factors, such as population immunity 56 .
In general, growth rates varied widely across locations and weeks in England (Fig. 4). This variation may be explained in some cases by specific events, such as the beginning of university holidays in May and June 2021 (e.g. Oxfordshire, Fig. 4a, b). Our model estimates that the most important tested predictor of the variation in growth of Delta (relative to Alpha) across UTLAs in England was within-UTLA mixing (i.e., relative changes in weekly within-UTLA human mobility compared to the pre-pandemic period, Figure 4a, Table S4). The importance of this factor is unsurprising, as preemptive restrictions on movement and social mixing slow the emergence of new pathogens or variants 57 (see counterfactual scenarios in Extended Data Fig. 9); the cost/benefit ratio of such restrictions will of course depend on the specific context of variant emergence. The relaxation of NPIs therefore increased both within-and among-region transmission (see Fig. 3c). Other European countries did not observe such a rapid increase in Delta relative frequency during May 2021 1 ; possible reasons for this difference are (i) during that time levels of mobility and mixing (both local and regional) were lower in those countries and/or (ii) those countries potentially received fewer international importations of Delta (86,489 passengers flew from India to the UK between March and June, whilst 43,515 flew to Germany, and 16,688 to France, during the same time). Vaccination rates did not explain local variation in Delta growth rates in England, possibly because there was insufficient heterogeneity in vaccination rates among UTLAs to detect any effect 58 .
Among US states, levels of immunity through infection and vaccination (as measured by fraction of people with two vaccine doses) varied considerably (from 15.2% at baseline, 14th March 2021, to 56.4% at week ending 26th June 2021 59 ). Using our model while accounting for local mixing patterns, we find that higher baseline local immunity levels were associated with higher overall growth of Delta relative to other lineages (Fig. 4c, d, Table S4). This observation is superficially counter-intuitive but has several possible explanations: (i) due to social and demographic variation, pathogens can exhibit different R0 values in different locations, hence locations with high levels of previous exposure are more likely to support faster transmission of a newly introduced VOCs (provided that sufficient numbers of local susceptibles remain); (ii) Delta is better able to evade neutralising antibodies than other co-circulating variants, specifically Alpha 5,60 . Whilst this hypothesis cannot be excluded, it cannot explain the replacement of Beta by Delta in South Africa 61 and Delta's success is better explained by its increased intrinsic transmissibility than by its ability to evade immunity 5,10,62-64 ; (iii) aggregating data to the US state level may obscure inference of epidemiological dynamics, which may vary substantially at local scales due to variation in vaccination or behaviour 65,66 . In a sensitivity analysis (Table S5) we consider only immunity from prior exposure 67 (not vaccination) and find similar trends. The magnitude of the effect of prior immunity and human mobility can be seen in counterfactual scenarios in Extended Data Figs. 9 and 12.
Using model comparison and out-of-sample prediction (withholding data from the final few weeks), we find that models that included predictors such as baseline immunity and vaccination (US) and within-UTLA mobility (England) fit the observed trajectory of Delta relative growth better than a model without covariates (Methods and validation, Supplementary Tables 6 & 7). We refrained from translating estimates of the growth rate of Delta relative frequency into differences in the reproduction number, as this is sensitive to assumptions about the generation time of the variant, which is also influenced by NPIs and immunity 68 . At the time of analysis, there was no consensus on the generation time of Delta. Further studies should consider estimating the generation times of VOCs in specific contexts of immunity, NPIs and household structure to accurately translate relative growth rates into Rt 69 .

Discussion, limitations and future work
We find that growing epidemics of SARS-CoV-2 Delta worldwide led to a wave of importations of the VOC into England, initially from India, and later from other countries. These importations found fertile ground as they arrived in a context of easing social restrictions, and consequently expanded rapidly across England. Much transmission occurred in unvaccinated and younger populations 70 , and high levels of Delta transmission within the UK led to onward dissemination of the variant to other countries (e.g. 71 ). By pairing the phylogenetic results with contact tracing data we conclude that hotel quarantine measures were effective in reducing onward transmission of imported Delta cases in England. However, after May 21, we found that levels of local social mixing in England, not the number of importations, was associated with faster relative growth of Delta. At that point the independently introduced transmission lineages grew at a similar pace; details of their geographic distribution and expansion will support future work defining the optimal spatial interventions to reduce transmission of VOCs in England.
Undetected genetic diversity and uneven sampling of Delta in India make precise estimation of the number of importations to England difficult from genetic data alone 72 (Extended Data Fig. 8). However, our phylogenetic estimates strongly correlate with estimates derived from independent data on case incidence, Delta prevalence, and arriving travellers (EII, Methods, Fig. 2c) during the period before quarantine policies were announced. Fortunately, additional contact tracing data from public health agencies allowed us to overcome limitations inherent in the unevenly sampled global virus genomic data set, and provide additional confidence in our findings.
Our statistical analysis shows that higher Delta growth rates were positively associated with levels of population immunity and vaccination in the United States and with levels of local mixing in England. In the future, the existence or magnitude of NPIs needed to reduce the healthcare burden of Delta (or future VOCs) to sustainable levels will depend on local levels of population immunity (through vaccination and prior infection). Future work should focus on which factors are most conducive to spread in particular contexts (e.g., high vs. low NPI regimes and across levels of population immunity) so that responses can be planned accordingly. This requires a better characterisation of the distribution and variation of infectiousness through time, and an understanding of virus generation time in different behavioural contexts 73 , for example amongst individuals who are vaccinated, unvaccinated and/or had previous exposure to SARS-CoV-2 (including with which lineage). To do so effectively will require investments in large-scale and coordinated serological studies 74 especially for VOCs with ability to evade immunity.
Even though global reporting of case numbers, virus genomic surveillance, sampling strategies and mobile phone penetration differ across the world, our estimates can still provide qualitative insights into the trends in the source locations and rates of international importation. Including estimates of likely importations in disease surveillance programmes may help support public health decision making 75 and further improvements on these estimates can be achieved when global health surveillance systems are more integrated, and investments in data generation and capacity are linked directly into paired genomicepidemiological analytical pipelines 76 .
The detail with which we document the spatial invasion process of Delta in England provides an opportunity to re-examine how more spatially targeted interventions can support COVID-19 control in the future. Globally coordinated data and analytical pipelines that capture heterogeneity in virus circulation, immunity and policy responses will be necessary to produce the insights necessary to curb the spread of emerging infectious diseases and new variants. However they can only be successful when integrated into a public health framework that can respond and rapidly adapt to public health threats during their emergence 4

Genomic data
International (non-UK) sequences were downloaded from GISAID on September 15, 2021 and combined with English sequences taken as part of community surveillance (pillar 2) available in COG-UK as of September 2021. Sequences were processed and aligned as part of the daily datapipe analysis managed by CLIMB on behalf of COG-UK. Duplicate and environmental sequences as well as those with impossible or incomplete collection dates were removed. All sequences were aligned to the reference Wuhan-Hu-1 (genbank accession MN908947.3) with minimap2 and samples with less than 93% coverage were discarded. Scorpio (https://github.com/cov-lineages/scorpio) was run as part of Pangolin 78 , and sequences containing the Delta VOC constellation of mutations were kept for further analysis.
Problematic sites in the resulting alignment were masked prior to phylogenetic inference and isolates with known sequence artifacts removed (see https://github.com/COG-UK/Delta-analysis for details). Additionally, mutations in the Delta VOC have caused widespread amplicon drop out of amplicon 72 in the commonly-used ARTIC primer scheme (https://www.protocols.io/view/ncov-2019-sequencing-protocol-v3-locost-bh42j8ye) before the introduction of version 4 of the primer scheme. To avoid spurious phylogenetic associations based on differential treatment of amplicon dropout with COG-UK and across the globe, we masked sites 2142-21990 which represent the region solely covered by amplicon 72 and are not overlapped by neighboring amplicons.

Phylogenetic analyses
To provide an overview of the global expansion of Delta (Fig. 1a), we analysed a subset of 1,000 Delta genomes sampled evenly through time. To minimize the effect of incorrectly reported collection dates, we restricted our analysis to samples where the lag between sample collection date and GISAID submission date is less than four weeks. To further ensure only the highest quality samples were included, we built an maximum likelihood tree using iqtree2 79 , rooted with Wuhan-Hu-1 (genbank accession MN908947.3) as an outgroup, and used Treetime 80 to remove tips lying beyond two interquartile ranges from the regression of time against root-to-tip distance. This analysis resulted in a final dataset of 975 samples. The temporal tree estimated by treetime was used as a starting tree in the following Bayesian analysis with slight modifications to randomly resolve polytomies. Two chains of 100 million states were run using BEAST v1.10.4 81 with sampling every 20,000 states. Both chains were combined with the first 10 million states removed for burnin. We used a HKY+ substitution model 82 , a flexible Skygrid coalescent prior 83 with grid points every two weeks 80 , and an asymmetric, discrete phylogeographic model with samples assigned to Indian, English, and Global locales. Preliminary analysis showed very little temporal signal in the data, which is unsurprising given the relatively slow evolutionary rate of SARS-CoV-2 and the short study period. Therefore, in all analyses the evolutionary rate was fixed to 7.5x10-4 substitutions / site as estimated in 31 . Convergence was assessed using Tracer v1.7 84 .
The goal of our phylogenetic analysis was to accurately and efficiently describe importation dynamics into England, without sacrificing the dense sampling needed to reconstruct internal spread at a high resolution. Due to the large size of the required dataset, we followed a similar phylogenetic approach to that used in 31 . First, an approximately maximum likelihood phylogeny was built using a JC69 substitution model in FastTree 85 , and rooted on Wuhan-Hu-1 (genbank accession MN908947.3), a high quality Pango lineage B sample from 2019-12-26, as an outgroup. Internal branches representing less than one substitution were collapsed to polytomies. This tree was then split into three subtrees of roughly equal size (Fig. 1a) (28,783, 28,715, and 36,151 tips). As above, Treetime 80 was then used to remove temporal outliers, generate a starting time tree, and estimate the number of mutations along each branch. For subtree an empirical distribution of time trees was estimated independently using a recently implemented model in BEAST v1.10 81 (commit:d1a45) which replaces the substitution model in classical analyses. Briefly, in this approach the likelihood of the number of mutations along each branch was calculated from a Poisson distribution with mean equal to the evolutionary rate multiplied by the length of the branch in time 86 . In this approach, the standard topological tree search is constrained to operators that sample node heights and resolutions of polytomies present in the substitution tree.
For each subtree, 50 MCMC chains of 40 million iterations were run, sampling trees every 2 million states with the first 20 million states removed as burnin, resulting in datasets of 514-520 empirical trees. The analyses were run using a flexible Skygrid coalescent prior 83 with grid points every two weeks 80 . Model convergence and proper statistical mixing were verified in Tracer v1.7 84 .
The empirical trees sets estimated above were used to reconstruct importations into England under an asymmetric discrete phylogeographic model. Taxa were split into three locations: England, India and Global, with the Global state representing all countries other than England and India. We used the recently developed travel aware phylogenetic model available in BEASTv1.10 32 to better inform the transition rates in the reconstructed phylogeography. "Travel history" nodes were placed 1 week before isolates from England with known travel history. Where such travel included both India and other countries, ambiguous non-UK states were used. We ran eight chains of 625,000 states, sampling every 2,250 states and with the first 62,500 states removed as burnin, resulting in a total of 1,998 or 1,999 trees sampled from the posterior distribution.
Introductions were defined as nodes inferred to be in England with parents in either India or the catch-all Global location. The date of importation was assumed to be half-way between such a node and its parent. Five trees in the posterior set were excluded as they placed the root node of subtree 3 in England; this event was deemed highly unlikely as this node lies at least three months prior to the first sample from England during a time at which sequence coverage was above 50% in England. In Following the importation analysis, the seven largest importations (those with >1500 sequences, n = 25,983) were selected, as well as all importations with five or more sequences, from a representative tree from the posterior set with the same number of total importations as the posterior median. Within this analysis, only sequences with unambiguous postcode districts were used, resulting in a dataset of 25,139 sequences for the seven largest transmission lineages and 24,411 across 280 smaller lineages, which were extracted from the master COG-UK alignment, described in "Genomic Data" above. Within those postcode districts, we assigned random coordinates to each sequence, as the continuous phylogeographic analysis does not permit identical values. This was achieved using geographical data from 87 . We then reconstructed the geographic movement of nodes on a fixed tree (pruned from the overall MCC tree) in BEAST v.1.10 81 , using a relaxed random walk (RRW) model 88 , and a Cauchy distribution to account for among-branch heterogeneity in dispersal velocity. Large lineages were inferred independently, and all small lineages were inferred in a single run, with the shared parameters for likelihood, precision, and covariance of coordinates, but independent estimates of diffusion rate and trait likelihood. Following this run, 22 small introductions were removed due to their chains not converging to the same posterior. An MCC tree was then generated using TreeAnnotator 81 to summarise the posterior tree distribution for all lineages. Visualisations were made using a custom Python script. XML files were generated using beastgen.py (https://github.com/ViralVerity/beastgenpy), and can be found along with data processing and visualisation scripts on GitHub.
For the export analyses we compare Greater London to Greater Manchester which consists of the UTLAs Salford, Trafford, Stockport, Oldham, Bolton, Tameside, Bury, Rochdale, Wigan and Manchester.
State level incidence data from India: State level COVID-19 case count data were extracted from https://api.covid19india.org/csv/latest/states.csv.
Incidence data from England: COVID-19 case count data for each Local Tier Local Authority were downloaded via https://coronavirus.data.gov.uk/details/download.

Travel history data
Four sources of data were compiled to provide the travel history for laboratory confirmed cases, depending on availability for each individual case: (1) public health passenger locator forms are required for entry into the UK; (2) routine public health contact tracing data including UK Health Security Agency Second Generation Surveillance System (SGSS) 89 , (3) COVID-19 test requests with reported travel associations and (4) responses to additional telephone interviews for cases.

Covariate processing for statistical analyses
Country-level COVID-19 case count and vaccination data from 1st January 2020 to 9th July 2021 were downloaded via Our World in Data https://github.com/owid/covid-19-data/blob/master/public/data/owidcovid-data.csv. The number of individuals who have received a partial course of the vaccine per day by country was obtained from the difference in partially vaccinated individuals from consecutive days. The same operation was used to obtain the number of new fully vaccinated individuals per day by country . To deal with missing values, we assumed the vaccination rate to be constant between any two closest dates with vaccination data. This assumption was only applied when the time period between successive vaccination data entries was less than 7 days. Missing vaccination data for more than 6 consecutive days resulted in all of the new vaccinations administered from the last entry date to the next entry date to have been administered on the next entry date.

COVID-19 case count and vaccination data for the United Kingdom
: COVID-19 case count data and vaccination data were downloaded by UTLA from 30th January 2020 to 28th July 2021 by specimen and dosage date respectively via https://coronavirus.data.gov.uk/details/download. These data include positive lab-based polymerase chain reaction (PCR) tests and positive LFT tests, but do not include tests where the LFT was positive and PCR follow up tests were negative (see more details here 90 ). COVID-19 case count at the United Kingdom country level was calculated by aggregating case data on the UTLA-level. Additionally, to match the genomic data, the COVID-19 case count and vaccination data for some UTLAs were aggregated under an area code made up of these multiple UTLAs (see Table S3). All entries with the recently discontinued area code 'E10000002' were assigned the new area code 'E06000060'.
State level COVID-19 case count data from the U.S.: For U.S. states, COVID-19 case count data from 22nd January 2020 to 12th July 2021 were downloaded via https://data.cdc.gov/Case-Surveillance/United-States-COVID-19-Cases-and-Deaths-by-State-o/9mfq-cb36. Vaccination data from 14th December 2020 to 12th July 2021 were downloaded via https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-Jurisdi/unsk-b7f. The number of new partially vaccinated individuals per day by state was calculated from the difference in total partially vaccinated individuals from consecutive days. The same operation was used to obtain the number of new fully vaccinated individuals per day by state .
U.S. states population level immunity: Daily population immunity estimates for COVID-19 was downloaded by the U.S. state from 26th January 2021 to 9th June 2021 via https://popimmunity.biosci.gatech.edu/. A sampling bias of four was selected for (i.e. a sampling fraction of 25% is assumed) using fully vaccinated individuals for the calculation of the estimate 91 . For our analysis at the weekly level, the mean of the week's daily estimated population immunity was calculated for each state.
State level population data from U.S.: The most recent population size estimate for each US state for the year 2019 was downloaded via https://www.census.gov/data/datasets/time-series/demo/popest/2010sstate-total.html.
Aggregated and anonymised human mobility data: We used the Google COVID-19 Aggregated Mobility Research Dataset described in detail in 47,92 , which contains anonymized relative mobility flows aggregated over users who have turned on the Location History setting, which is turned off by default. This is similar to the data used to show how busy certain types of places are in Google Maps -helping identify when a local business tends to be the most crowded. The mobility flux is aggregated per week, between pairs of approximately 5km 2 cells worldwide, and for the purpose of this study further aggregated for LTLAs in the United Kingdom (https://geoportal.statistics.gov.uk/datasets/lower-tierlocal-authority-to-upper-tier-local-authority-december-2016-lookup-in-england-and-wales/explore), U.S. states (https://gadm.org/), and country level (https://gadm.org/) for all other countries for the time period of October 29th, 2020 to June 6th, 2021. To produce this dataset, machine learning is applied to log data to automatically segment it into semantic trips. To provide strong privacy guarantees 93 , all trips were anonymized and aggregated using a differentially private mechanism to aggregate flows over time (see https://policies.google.com/technologies/anonymization). This research is done on the resulting heavily aggregated and differentially private data. No individual user data was ever manually inspected, only heavily aggregated flows of large populations were handled. All anonymized trips are processed in aggregate to extract their origin and destination location and time. For example, if users travelled from location a to location b within time interval t, the corresponding cell (a,b,t) in the tensor would be n∓err, where err is Laplacian noise. The automated Laplace mechanism adds random noise drawn from a zero mean Laplacian distribution and yields ( , δ)-differential privacy guarantee of = 0.66 and δ = 2.1 × 10−29 per metric. Specifically, for each week W and each location pair (A,B), we compute the number of unique users who took a trip from location A to location B during week W. To each of these metrics, we add Laplace noise from a zero-mean distribution of scale 1/0.66. We then remove all metrics for which the noisy number of users is lower than 100, following the process described in 93 , and publish the rest. This yields that each metric we publish satisfies (ε,δ)-differential privacy with values defined above. The parameter controls the noise intensity in terms of its variance, while δ represents the deviation from pure -privacy. The closer they are to zero, the stronger the privacy guarantees.
These results should be interpreted in light of several important limitations. First, the Google mobility data is limited to smartphone users who have opted in to Google's Location History feature, which is off by default. These data may not be representative of the population as whole, and furthermore their representativeness may vary by location. Importantly, these limited data are only viewed through the lens of differential privacy algorithms, specifically designed to protect user anonymity and obscure fine detail. Moreover, comparisons across rather than within locations are only descriptive since these regions can differ in substantial ways.
Flight data: We used data from the International Air Transport Association (IATA) 94 on the monthly number of confirmed passengers on flights (direct and indirect) from India to all other countries from January 2021 to June 2021.

Estimated Importation Intensity (EII):
We estimated the weekly importation intensity of the Delta variant for each destination location at the weekly level using the human mobility, GISAID and COG-UK genomic data and COVID-19 case data. An importation intensity value was calculated for each international movement by multiplying the proportion of Delta in the location of origin, the total number of new weekly reported COVID-19 cases and the movement intensity between each origin location and the destination location. We then aggregated all importation intensity values by week and destination location to obtain the EII.

Estimated Exportation Intensity (EEI):
We estimated the exportation intensity of the Delta variant for each location of origin at the weekly level using aggregated human mobility, genomic and case count data. An exportation intensity value was calculated for each international movement by multiplying the proportion of Delta in the country of origin, the total number of new weekly reported cases and the movement intensity between the country of origin and the destination country. We then aggregated all importation intensity values by week and origin location to obtain the EEI.

Estimated local human mobility intensity:
To obtain an estimate of the intensity of human mobility within a location, we calculated a 'relative self-mobility' value indicating the intensity of mobility within a location (where the origin and destination of the trips are the same) as a percent of the highest recorded of movement within this location in our mobility data during the time period from 2020-03-22 to 2021-06-06 using the human mobility data described above.
New Delta lineage introductions: Daily new lineage introductions into the United Kingdom by UTLA were obtained from the continuous phylogenetic analysis described above. The data were aggregated by week and UTLA.

Statistical modelling of Delta growth
Data pre-processing: we kept data starting from the 13th (week commencing 28th March 2021) and 11th (week commencing 14th March 2021) epidemiological weeks for England and the USA, respectively. These dates are referred to as baseline elsewhere in the main text. We excluded weeks after the first time 95% of samples were observed to be Delta in each UTLA (England) or state (USA) because after this point we can no longer estimate the relative growth rates reliably since Delta saturation has been reached. Therefore, each UTLA or state potentially had a different number of time points (Extended Data Table 8). Finally, we kept only those UTLAs or states which have data on Delta for at least three weeks (which are not required to be consecutive). In the final dataset for England and the USA, we had 590 (66 UTLAs and 8 weeks on average) and 735 (51 states and 14 weeks on average) and observations, respectively (Extended Data Fig. 8).

Model:
In what follows, we model the dynamics of Delta penetration within a UTLA or state: we refer to these levels of spatial unit as subregions. Here, we model how the number of Delta samples per subregion, i, varies over time, t (here, measured in weeks). The background transmission conditions driving the observed number of delta samples in a given subregion may be similar to the subregions within the same region. As such, we model this variation hierarchically and index variables at the subregional level by [ ] to indicate that subregion is nested within region : in England, regions correspond to NUTS1 units and, in the USA, to units also named regions. We use a binomial sampling distribution to model the number of A key quantity of interest is the relative growth in the proportion of Delta on the logit (i.e. log-odds) scale, which we estimate weekly and is denoted by Relative growth for each subregion, , is modelled spatially as depending hierarchically on its containing region, . It is also assumed to depend on subregion-specific covariates: where ρ is a region-level growth trend, [ ] is a vector of covariates, and [ ] is a subregion-and weekspecific term representing the deviation from the region-level growth. To account for temporal autocorrelation in the relative growth rate, a given region's relative growth is assumed to follow a random walk centred around its relative growth in the previous week: We chose to use different sets of covariates in our chosen "best" models for England and the USA. These covariates were chosen as important predictors if including them in the model improved the model fit (as indicated by higher log likelihood; Extended Data Table 4), gave better out of sample prediction (Extended Data Table 6), and if they were confounding variables. For England, the covariates included relative self-mobility and time since baseline (in weeks, standardized by subtracting the mean and dividing by the standard deviation); for the USA, it included baseline immunity (natural infection or vaccination induced immunity), relative self-mobility, and time since baseline. Including data on importations decreased the number of observations due to missing data on importations from 735 to 387 in the USA (using Estimated Importation Intensity, which was standardized before including in the regression) and from 590 to 299 in England (using New Delta lineage introductions, which was square root transformed because of skewed positive data). The effect size (95% credible interval) of importations was negligible when added to the "best" model: 0.00(-0.14, 0.14) for USA and -0.04(-0.07, -0.01) for England, and hence importation was not included in the final models.
We estimated our model in a Bayesian framework and chose priors (Extended Data Table 9) so that a wide range of possible Delta proportions were possible yet were centred on low values in the absence of further information: our prior predictive distributions in Figure S11b and Figure S14b illustrate these characteristics.
The computations were done using R and Stan using four parallel chains with 20,000 to 40,000 iterations (depending on the model), half of which were discarded as warmup iterations; the chains were subsequently thinned by a factor of 10 and the estimated "effect" of a variable on Delta growth (β). To determine the implications of the effect sizes, we computed the estimated proportion of Delta samples when the covariates took factual versus counterfactual values. We considered counterfactual scenarios for relative self-mobility in England and baseline immunity in the USA, holding all other covariates at their factual values. The counterfactual scenarios we considered were: o England: "Minimum mobility" (relative self-mobility = 0), "Maximum mobility" (relative self-mobility = 1) o USA: No prior immunity (baseline immunity = 0), 90% people immune at baseline (baseline immunity = 0.90) Simulation and model robustness: To test model parameter identifiability, we performed inference on simulated data. We fixed the parameters and simulated from the model to create hypothetical data (with 5 regions, each with 6 sub-regions (i.e. UTLA or state) and 15 time points). We then used these data to estimate the known parameters. We were reasonably able to recover our parameters and the model converged with R <1.01, bulk and tail effective sample sizes >400 after 20,000 iterations, discarding 10,00 warm-up iterations and thinning by a factor of 10 (Table S7 and Figure S15).  Fig. S11). 0.008 *p-value calculated for the difference between the log pointwise predictive density of the models with and without covariates 95 . The null hypothesis is that there is no difference in the out of sample prediction between the two models.