Identification of SARS-CoV-2 P.1-related lineages in Brazil provides new insights about the mechanisms of emergence of Variants of Concern

One of the most remarkable features of the SARS-CoV-2 Variants of Concern (VOCs) is the unusually large number of mutations they carry. However, the specific factors that drove the emergence of such variants since the second half of 2020 are not fully resolved. In this study, we describe a new SARS-CoV-2 lineage, provisionally designated P.1-like-II, that, like the previously described lineage P.1-like-I, shares several lineage-defining mutations with the VOC P.1 circulating in Brazil. Reconstructions of P.1 ancestor sequences demonstrate that the entire constellation of mutations that defines the VOC P.1 did not accumulate within a single long-term infected individual, but was acquired by sequential addition during interhost transmissions. Our evolutionary analyses further estimate that P.1 ancestor strains carrying half of the P.1 lineage-defining mutations, including those in the receptor-binding domain of the Spike protein, had circulated cryptically in the Amazonas state since August 2020. This evolutionary pattern is consistent with the hypothesis that partial population immunity acquired from natural SARS-CoV-2 infections during the first half of 2020 might have been the major driving force behind the natural selection that allowed VOCs to emerge and spread worldwide. These findings also support a long lag time between the emergence of variants carrying key mutations of concern and the expansion of the VOC P.1 in Brazil.

The most widely accepted hypothesis to explain such a high number of lineage-defining mutations is that VOCs result from selective pressures and adaptation of the virus during prolonged individual infections and subsequent transmission 3. This hypothesis, however, was challenged by the early discovery of four P.1-like genomes, most of them sampled in the capital city of Amazonas state, that branched as a sister monophyletic clade of lineage P.1 1,4. The P.1-like clade also accumulated an unusually high number of genetic changes, including several P.1 lineage-defining mutations in the S (L18F, P26S, D138Y, K417T, E484K, N501Y), NSP3 (K977Q), and N (P80R) proteins, and unique mutations in the NSP2 (K456R), NSP3 (T1189I), NSP6 (V149A), NSP13 (S74L), S (ins214 and D1139H), and NS8 (K2stop) proteins. This early finding supports the hypothesis that P.1 lineage-defining mutations did not accumulate in a single long-term individual infection, but were acquired in sequential steps during the evolution of lineage B.1.1.28 in Amazonas.
In this study, we describe a second P.1-related variant that is spreading in several states across different Brazilian regions and harbors 15 P.1 lineage-defining mutations and six unique mutations. The description of this new P.1-related variant allowed us to trace more precisely the evolutionary steps that resulted in the emergence of the VOC P.1. Moreover, these results confirm our previous hypothesis that some of the P.1 lineage-defining mutations were sequentially fixed over several months during the second half of 2020. Our analyses also revealed that, despite sharing crucial mutations in the RBD of the S protein, the P.1-like variants spread much less efficiently in Brazil than the VOC P.1.
Maximum likelihood phylogenetic analyses
SARS-CoV-2 P.1-related sequences obtained here were aligned with high-quality (<5% N) and complete (>29 kb) sequences that were available in the EpiCoV database of GISAID (https://www.gisaid.org/) on March 31st, 2021 and that belong to three different clades: 1) B.1.1.28 sequences from Amazonas state, 2) P.1 sequences, and 3) previously described P.1-like sequences 1,4. This dataset was aligned using MAFFT v7.475 7 and subjected to maximum likelihood (ML) phylogenetic analysis using IQ-TREE v2.1.2 8 under the GTR+F+R4 nucleotide substitution model, as selected by the ModelFinder application 9. Branch support was assessed by the approximate likelihood-ratio test based on the Shimodaira-Hasegawa procedure (SH-aLRT) with 1,000 replicates. The sequences of ancestral nodes were reconstructed using Time-tree 10, and their mutational profiles were investigated using the Nextclade tool (https://clades.nextstrain.org). The temporal signal was assessed by regressing the root-to-tip genetic distance, estimated from the ML phylogenetic tree, against sampling dates using the program TempEst 11.
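The two screening computations described above, the sequence quality filter and the root-to-tip regression, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names are hypothetical, the thresholds (<5% N, >29 kb) are taken from the text, and the regression is a plain least-squares fit on a fixed root rather than TempEst's own procedure, which can also optimize the root position.

```python
from datetime import date


def passes_quality(seq: str, min_length: int = 29_000, max_n_fraction: float = 0.05) -> bool:
    """Return True for complete (> min_length nt) genomes with < max_n_fraction N bases."""
    seq = seq.upper()
    if len(seq) <= min_length:
        return False  # incomplete genome
    # only 'N' is counted as ambiguous here (other IUPAC codes are ignored)
    return seq.count("N") / len(seq) < max_n_fraction


def decimal_year(d: date) -> float:
    """Convert a sampling date to a decimal year (2020-07-02 -> 2020.5)."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days
    return d.year + (d - start).days / days_in_year


def root_to_tip_regression(times: list[float], distances: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit of root-to-tip distance against sampling time.

    Returns (slope, intercept, x_intercept). The slope approximates the
    evolutionary rate (substitutions/site/year) and the x-intercept is the
    time at which the fitted line reaches zero divergence (a root-date estimate).
    """
    n = len(times)
    if n < 2:
        raise ValueError("need at least two dated tips")
    mean_t = sum(times) / n
    mean_d = sum(distances) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, distances))
    slope = s_td / s_tt
    intercept = mean_d - slope * mean_t
    return slope, intercept, -intercept / slope
```

In practice the distances would come from the tips of the ML tree and the times from `decimal_year` applied to sampling dates; a clearly positive slope with a reasonable fit indicates that the dataset carries enough temporal signal for molecular clock analyses.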

Results
Mutation profile analysis of SARS-CoV-2-positive samples detected in different Brazilian states between 12th March 2020 and 31st March 2021 revealed 44 sequences (Table S1) that harbor 15 of the 22 P.1 lineage-defining mutations, including the three mutations of concern in the receptor-binding domain (RBD) of the S protein (K417T, E484K, and N501Y), the deletion in NSP6 (S106del, G107del, F108del), and the four-nucleotide insertion in the ORF8/N intergenic region (ins28263) (Figure 1). These P.1-related sequences, here designated P.1-like-II, lack some of the P.1 lineage-defining mutations in ORF1ab (C2749T, C12778T, and C13860T), NSP13 (E341D), S (T20N), and NS8 (E92K), and further display six unique substitutions in ORF1ab (C8905T, C16954T, and A20931G), NSP4 (D217H), the E/M intergenic region (A26492T), and N (P383L). The P.1-like-II sequences also share nine P.1 lineage-defining mutations with the previously characterized P.1-like clade (now designated P.1-like-I) (Figure 1). ML phylogenetic analysis revealed that P.1-like-II sequences branched in a highly supported (SH-aLRT = 96.6%) monophyletic clade together with seven sequences retrieved from the EpiCoV database (https://www.gisaid.org/) that displayed the same mutation profile and were classified as P.1 in that database (Figure 2a). Clades P.1-like-I and P.1-like-II are not nested within the diversity of the VOC P.1, but branch as sister monophyletic clades that evolved from a common ancestor. Although clades P.1, P.1-like-I, and P.1-like-II do not share the same set of lineage-defining mutations, they were all designated as lineage P.1 under the PANGO rules. This classification is based on the mutations of concern (K417T, E484K, and N501Y) having been acquired in the same evolutionary event (https://github.com/cov-lineages/pango-designation/issues/77). We therefore use lineage P.1 to designate the entire clade comprising the original P.1 and the new P.1-like sub-lineages, and VOC (or clade) P.1 to designate only the first P.1 sub-lineage identified, which dominated the Brazilian epidemic in 2021.
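The comparison of mutation profiles described above reduces to set operations: mutations shared with the P.1 definition, P.1 mutations that are missing, and mutations unique to the new clade. The Python sketch below illustrates this under clear assumptions: `compare_profiles` is a hypothetical helper, the "gene:change" labels are an illustrative convention, and the example sets are deliberately partial, built only from mutations named in the text rather than the full list of 22 P.1 lineage-defining mutations.

```python
def compare_profiles(query: set[str], reference: set[str]) -> dict[str, set[str]]:
    """Partition two mutation profiles into shared, missing and unique sets."""
    return {
        # mutations present in both profiles
        "shared": query & reference,
        # reference (e.g. P.1 lineage-defining) mutations absent from the query
        "missing": reference - query,
        # mutations found only in the query
        "unique": query - reference,
    }


# Hypothetical, partial profiles using mutations named in the text; they are
# NOT the complete set of 22 P.1 lineage-defining mutations.
p1_subset = {"S:K417T", "S:E484K", "S:N501Y", "S:T20N", "NSP13:E341D", "NS8:E92K"}
p1_like_ii_subset = {"S:K417T", "S:E484K", "S:N501Y", "NSP4:D217H", "N:P383L"}

result = compare_profiles(p1_like_ii_subset, p1_subset)
for key in ("shared", "missing", "unique"):
    print(key, sorted(result[key]))
```

Applied to complete profiles, the sizes of the "shared" and "missing" sets would correspond to the 15-of-22 count reported for P.1-like-II.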
The P.1-like-II genomes were sampled in nine different Brazilian states, mainly in the South and Southeast regions (Figure 2b). The oldest one was detected in Rio de Janeiro state on 19th January 2021 17, and the most recent one was identified in this study in Amazonas state on 25th March 2021. The Brazilian state with the most P.1-like-II sequences identified so far was Santa Catarina (59%), followed by Rio de Janeiro (10%), Rio Grande do Sul (8%), and São Paulo (8%). Thus, unlike clade P.1, which was efficiently disseminated both within and outside Amazonas state, clade P.1-like-II was more efficiently disseminated outside Amazonas. It is also important to note that while the VOC P.1 comprises a substantial fraction (66%) (http://www.genomahcov.fiocruz.br) of SARS-CoV-2 sequences sampled in different Brazilian states during 2021, clades P.1-like-I and P.1-like-II each comprise less than 1% of genotyped samples, supporting a more successful dissemination of clade P.1 relative to the P.1-like clades in Brazil.
Analysis of the temporal structure revealed that clades P.1, P.1-like-I, and P.1-like-II accumulated more mutations than B.1.1.28 sequences and evolved at a similar rate over time (Figure 2c). Reconstruction of sequences at ancestral nodes provides a clear picture of the evolutionary steps that resulted in the different P.1 and P.1-related variants (Figure 3). Three mutations were fixed in the basal B.1.1.28 Amazonian clade (previously named 28-AM-II) (1) from which all P.1 clades evolved. Nine mutations were fixed in the following evolutionary step, which gave rise to the most recent common ancestor (MRCA) of lineage P.1 (designated P.1 MRCA1). Six additional mutations were fixed in the evolutionary step that gave rise to the MRCA of clades P.1 and P.1-like-II (designated P.1 MRCA2), and 6-12 mutations were fixed in the branches leading to the MRCA of each clade. Six of the nine (67%) mutations in P.1 MRCA1 were in the S protein (including the three mutations of concern in the RBD), whereas only seven of the 32 (22%) mutations fixed in the subsequent steps were located in the S gene. It is also interesting to note that the total number of lineage-defining mutations accumulated by clades P.1 (n = 12), P.1-like-I (n = 14), and P.1-like-II (n = 12) since their divergence from P.1 MRCA1 was almost the same.
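Conceptually, the mutations fixed at each evolutionary step are the differences between the reconstructed sequence of an ancestral node and that of its parent node. The Python sketch below shows that comparison for two already aligned sequences, together with the percentage arithmetic reported above. The helper names are hypothetical and are not part of Time-tree or Nextclade; positions are alignment coordinates, which match reference-genome numbering only if the alignment is anchored to the reference without insertions (an assumption here).

```python
from fractions import Fraction


def branch_mutations(parent: str, child: str) -> list[str]:
    """List nucleotide substitutions between two aligned ancestral sequences.

    Labels follow the REF-position-ALT style used in the text (e.g. C2749T),
    with 1-based alignment positions. Gaps and ambiguous bases are skipped.
    """
    if len(parent) != len(child):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for pos, (ref, alt) in enumerate(zip(parent.upper(), child.upper()), start=1):
        if ref != alt and ref in "ACGT" and alt in "ACGT":
            changes.append(f"{ref}{pos}{alt}")
    return changes


def share_percent(part: int, whole: int) -> int:
    """Percentage of mutations in a given gene, rounded to the nearest integer."""
    return round(100 * Fraction(part, whole))
```

For example, `share_percent(6, 9)` returns 67 and `share_percent(7, 32)` returns 22, matching the S-gene fractions for the P.1 MRCA1 step and for the subsequent steps.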
Bayesian phylogeographic analysis was next conducted combining all B.1.1.28 sequences from Amazonas (including clade 28-AM-II), early VOC P.1 viruses sampled in December 2020, and all P.1-like sequences. This analysis supports that most ancestors during the diversification of lineage P.1 were probably located in the state of Amazonas (Posterior State Probability [PSP] = 1). The only exception was the P.1-like-II ancestor, whose posterior probability was divided between Amazonas (PSP = 0.40) and Santa Catarina (PSP = 0.31) (Figure 4). The large uncertainty in the location of the P.1-like-II ancestor probably reflects the low number of sequences from this clade detected in Amazonas state so far, which makes it difficult to trace its origin to that northern state. This analysis estimated that Santa Catarina was the main hub of dissemination of lineage P.1-like-II to other Brazilian states. It is also noteworthy that P.1-like-II genomes from Rio de Janeiro formed an independent basal cluster, supporting local transmission of this lineage in that state. The different molecular clock models used consistently traced the median time of the P.1 MRCA1 to mid-August 2020, the median time of the P.1 MRCA2 to late […]
Discussion
Our genomic surveillance identified a new P.1-related genetic variant, designated clade P.1-like-II, derived from the Amazonian diversity of lineage B.1.1.28. It shares a common ancestor and several lineage-defining mutations, including the mutations of concern in the RBD of the S protein (K417T, E484K, N501Y), with the VOC P.1 and the clade P.1-like-I previously identified by our group 1. The new clade P.1-like-II displayed an overall low prevalence (<1%), but is geographically dispersed in Brazil, particularly in the South and Southeast regions of the country.
The most widely accepted hypothesis suggests that the mutations in VOCs arose during long-standing single SARS-CoV-2 infections, like those observed in immunosuppressed subjects 3,18. Our findings, however, reveal that the final constellation of mutations observed in the VOC P.1 was acquired through multiple interhost transmissions. During this evolutionary process, which probably took several months, the stepwise acquisition of mutations was not uniformly distributed along the viral genome. Most VOC P.1-defining mutations located in the amino (N)-terminal domain (NTD; L18F, P26S, D138Y) and in the RBD (K417T, E484K, N501Y) of the S protein were fixed in the first evolutionary step, while most mutations located outside the S gene were fixed in subsequent steps. Additional intermediate evolutionary steps may exist between clade 28-AM-II and the VOC P.1. However, the limited number of available genomes sampled in Amazonas between August and November 2020 (n = 87) restricts the resolution of the evolutionary history reconstructed here.
The stepwise diversification of lineage P.1 in Brazil resembles the evolutionary pattern of the VOCs B.1.351 and B.1.617, which were first detected in South Africa and India, respectively. Like the P.1 family of clades described in Brazil, the VOCs B.1.351 and B.1.617 also comprise families of related clades with partially overlapping mutations. The mutation profile of lineage B.1.351 suggests that five nonsynonymous mutations in the S protein (D80A, D215G, E484K, N501Y, and A701V) were fixed in the first progenitor, and further S mutations (L18F, 242-244del, R246I, and K417N) were fixed at later steps in different descendant sub-lineages 19. Lineage B.1.617 was initially defined as a double S mutant (L452R and E484Q), but subsequent phylogenetic analysis revealed high within-lineage diversity, with at least four sub-clusters (PANGO lineages B.1.617, B.1.617.1, B.1.617.2, and B.1.617.3) that could be linked to partially overlapping constellations of S mutations 20,21.
Although this stepwise evolutionary pattern does not exclude the possibility that at least a subset of mutations originated in a long-term infected individual, sequential infections of such patients are very unlikely. We propose that mutations of concern were naturally selected during acute reinfections of partially protected immunocompetent individuals. According to this hypothesis, the partial immunity that human populations acquired through natural SARS-CoV-2 infections during early 2020 was a major selective force that drove the sequential emergence of mutations of concern in the second half of 2020. This model is consistent with a recent study that revealed a major change in the selective pressures acting on SARS-CoV-2 variants circulating worldwide after October 2020, coinciding with the simultaneous expansion of different VOCs with convergent S mutations 22. It is also consistent with the ongoing evolution of the VOC P.1 in Brazil, revealed by the recurrent acquisition of indels in the NTD of the S protein 4.
The presence of key mutations of concern in the RBD of the S protein (K417T, E484K, N501Y) in the VOC P.1 can explain the higher transmissibility and successful dissemination of this VOC relative to the B.1.1.28 lineage previously circulating in Amazonas 1. Our analysis, however, suggests that RBD mutations were not the only driver of the P.1 expansion. First, our evolutionary reconstruction suggests that the ancestors P.1 MRCA1 and P.1 MRCA2, which harbor the three key mutations of concern in the RBD, had circulated cryptically in Amazonas state since August-September 2020 without fueling a large outbreak. Second, although all P.1 sub-lineages share the same key mutations of concern, the estimated prevalence of the VOC P.1 (69%) in 2021 was much higher than that of clades P.1-like-I and P.1-like-II (<1% each). Together, these observations suggest that viral mutations combined with human factors, such as the lack of social distancing measures and mass gathering events, may have contributed to the remarkable dissemination of the VOC P.1 in Amazonas state and, afterward, throughout Brazil.
The time lag between the emergence of variant progenitors carrying key mutations of concern and the start of the epidemic wave in Amazonas was also observed in South Africa and India. The emergence of the B.1.351 progenitor, which harbors key RBD mutations (K417N, E484K, N501Y), was traced in South Africa to around late August 2020, while the country's second COVID-19 epidemic wave only began at the end of October 2020 19. Similarly, the B.1.617 progenitor with key RBD mutations (E484Q, L452R) probably dates back to before October 2020, while the second COVID-19 epidemic wave in India only began in February 2021 23,24. Moreover, sub-lineages B.1.617.1 (which dominates in India), B.1.617.2 (which is spreading in India and the United Kingdom), and B.1.617.3 (which has remained uncommon in India and elsewhere) displayed quite divergent epidemic trajectories 21,25, supporting a complex interplay between the presence of mutations of concern and the epidemic dynamics of SARS-CoV-2 lineages.
In summary, our findings reveal that the VOC P.1 is part of a more diverse family of P.1-related variants that evolved from a common ancestor, which carried key mutations of concern and circulated in Amazonas months before the abrupt resurgence of COVID-19 in the state in late 2020. The entire constellation of mutations that defines the VOC P.1 was acquired in a stepwise process during multiple interhost transmissions. This stepwise interhost model, as opposed to the single long-term intrahost infection hypothesis, seems to be the most likely evolutionary mechanism to explain the emergence of VOCs in Brazil (P.1), South Africa (B.1.351), and India (B.1.617). The divergent epidemic trajectories of the different P.1 sub-lineages further suggest that mutations of concern combined with human-behavior factors were responsible for the successful spread of the VOC P.1 in Brazil.