The diversity of SARS-CoV-2 has expanded considerably since the first detection of this virus in December 2019 in China. Several new SARS-CoV-2 variants and mutants have been classified as variants of concern regarding their transmissibility, pathogenicity, and potential to escape immune responses elicited by infection or vaccination [1, 2]. These include viruses of PANGO lineages B.1.351 (WHO Beta variant) and B.1.1.28/P.1 (Gamma variant) that have spread worldwide, as well as the less-prevalent lineages B.1.525 (Eta variant) and B.1.1.345, all of which harbor the substitution E484K in their spike protein [3,4,5]. This substitution, located in the receptor-binding domain (RBD) of the spike, has been reported to decrease sensitivity to neutralizing monoclonal antibodies and convalescent plasma in several studies [5,6,7,8]. Here, we describe three infections with rare E484K-harboring SARS-CoV-2 containing three other amino acid substitutions in the spike, including W152L, which is located in the N-terminal domain (NTD) and possibly reduces sensitivity to neutralizing antibodies [9], and D614G and G679V, located in the S2 subunit. Mutations causing 10 additional amino acid substitutions were found in these three genomes, including six located in the nucleocapsid, one in the membrane protein, one in the nsp12 gene product (RNA-dependent RNA polymerase), one in the nsp13 gene product (RNA helicase), and one in the nsp14 gene product (3’-5’ exonuclease) (Fig. 1A).

Fig. 1
figure 1

Map of the SARS-CoV-2 genome (A) and number of sequences classified in PANGO lineage R.1 in the GISAID database (B). The numbers of sequences classified in PANGO lineage R.1 were obtained from the GISAID database (https://www.gisaid.org/) [10] on June 17, 2021.

We classified SARS-CoV-2 from these three patients as Marseille-484K.V1 according to the nomenclature we have implemented locally to facilitate monitoring and analysis of the epidemic. They were also classified into GISAID [10] clade GR, Nextstrain clade 20B [11], and PANGO lineage R.1 [12]. The three Marseille-484K.V1 genome sequences were obtained at our institute from respiratory samples collected between February 11 and April 19, 2021. They accounted for 0.04% of the 8,417 SARS-CoV-2 sequences we had deposited in GISAID as of June 17, 2021. As of this date, 8,831 sequences of the PANGO lineage R.1 were available from the GISAID database (https://www.gisaid.org/) [10]. They were obtained from patients sampled between February 1, 2020, and May 28, 2021, but only four sequences were from samples collected before October 24, 2020 (Fig. 1B). The 8,831 sequences originated from 34 countries on five continents, but 99% originated from 13 countries, and 92% originated from Japan (5,890 sequences, 67%) or the USA (2,250 sequences, 26%). In Japan, all but four sequences were obtained from samples collected between November 30, 2020, and May 17, 2021 [13, 14]. This SARS-CoV-2 lineage was also associated with 46 cases in skilled-nursing facilities in March 2021 in Kentucky, USA [15]. In this outbreak, the efficacy of immunization with the BioNTech mRNA vaccine against SARS-CoV-2 infection was estimated to be 66% and 76% among the residents and vaccinated health-care personnel, respectively, and four reinfections (two laboratory-confirmed infections separated by >90 days) were reported, including one in a resident who eventually died. In France, only 11 other cases were detected, and together with our three genome sequences, they accounted for 0.03% of the 48,137 sequences deposited in the GISAID database as of June 17, 2021.

Phylogenetic analysis of the three Marseille-484K.V1 sequences obtained at our institute showed that they clustered (bootstrap value, 80%) with a genome sequence originating from Guinea-Conakry that was the best BLAST hit in the GISAID database [10] (Fig. 2). The three patients were women between 24 and 55 years old. One of them reported that she had returned from a trip to Guinea-Conakry. None of the three patients were hospitalized. Cell culture isolation was performed as described previously [16] by inoculating the respiratory samples from two of the three patients onto Vero E6 cells. The cycle threshold values of qPCR were 15 and 20 for the two inoculated samples. A cytopathic effect was observed after 5 days during the first passage, and no particular phenotypic features were noted, including when compared to viruses isolated during the period of March-April 2020 [16].

Fig. 2
figure 2

Phylogenetic analysis based on the full-length genome sequences of the three Marseille-484K.V1 viruses and the 30 sequences with the highest BLAST scores obtained from the GISAID database (https://www.gisaid.org/) [10]. These sequences are indicated by "BBH" (for best BLASTn hit) at the beginning of the sequence name. Additional sequences indicated by REF (for reference) at the beginning of the sequence name include the genome sequence of the Wuhan-Hu-1 isolate and genome sequences obtained at our institute and classified as predominant SARS-CoV-2 variants. Nucleotide sequence alignments were performed using MUSCLE software (http://www.ebi.ac.uk/Tools/msa/muscle/). Evolutionary history was inferred using MEGAX software (http://www.megasoftware.net/) using the neighbor-joining method and the Kimura 2-parameter model. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1,000 replicates) is shown next to the branches. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree; the scale bars indicate the number of nucleotide substitutions per site. Bootstrap values > 50% are indicated on the tree.

Structural analysis of the Marseille-484K.V1 spike protein showed that amino acid substitutions G679V and D614G are at the same height in the spike protein, in a region considered to play a role in the conformational change required for demasking the RBD in the open state of the trimeric spike (Fig. 3A). Interestingly, the surface potential of the RBD was markedly increased by substitution E484K (Fig. 3B), suggesting a kinetic advantage for this virus for accessing the electronegative surface of the ACE-2 receptor. The electrostatic surface potential of the NTD was also increased by the mutation W152L, but by an indirect mechanism of compaction of the domain that decreases the electronegative areas in favor of an enlarged electropositive surface. Although these changes in the surface potential of the RBD and the NTD may potentiate the attraction of the viral envelope to lipid rafts containing ACE-2 [17], this kinetic advantage appeared to be tempered by a concomitant loss of affinity of the RBD for ACE-2 (∆Gmut/∆Gwt = 0.93) and of the NTD for lipid raft gangliosides (∆Gmut/∆Gwt = 0.90). Together with surface potential measurements of RBD and NTD surfaces, these affinity estimations obtained by molecular modeling approaches were used to calculate the transmissibility index (T-index) of Marseille-484K.V1 according to our published protocol [18]. The T-index was 4.53, which is twice the T-index of the Wuhan-Hu-1 virus. In comparison, the T-index of the globally spread B.1.1.7 variant (a.k.a. Alpha variant) is 3.59, suggesting that Marseille-484K.V1 viruses could have been expected to disseminate more widely than observed. However, the loss of affinity of the W152L mutant for gangliosides is not anecdotical. Indeed, the large flat surface of the ganglioside-binding domain of the NTD [19] has a better geometric complementarity in the Wuhan B.1 strain than in the Marseille-484K.V1 viruses, as illustrated in Figure 3C. This may have contributed to slowing the dissemination of these Marseille-484K.V1 viruses. However, immune neutralization and other host factors may also explain why they did not outcompete other SARS-CoV-2 strains that circulated concomitantly.

Fig. 3
figure 3

Structural analysis of the Marseille-484K.V1 virus. (A) Localization of mutations by molecular modeling of the variant spike. (B) Effect of the E484K mutation on the electrostatic surface potential of the RBD. The values indicate the estimate of the surface potential of the RBD region facing the host cell membrane, as determined using Molegro Molecular Viewer software. Blue regions are electropositive, red regions are electronegative, and white regions are neutral. (C) Effect of the W152L mutation on the electrostatic surface potential of the NTD (upper panels). Molecular models of B.1 and mutant NTD binding to lipid rafts (GM1 gangliosides) are shown in the lower panels. Region 144-149 of the NTD is indicated in yellow (B.1 NTD) and green (Marseille_184).

The Marseille-484K.V1, or R.1, lineage is another example of convergent evolution leading to the amino acid substitution E484K, as also observed for the substitutions L452R [20], N501Y [21], Q677H [22], and L18F [23], which have occurred independently in various SARS-CoV-2 lineages and geographical areas. The low level of circulation of this variant in France and worldwide, with the notable exceptions of Japan and the USA, contrasts with the dramatic spread of other E484K-harboring viruses such as lineages B.1.351 and B.1.1.28/P.1. Viral culture did not reveal particularities, while structural analysis of the spike protein provided some hints to explain the seemingly low transmissibility of this variant. Although there is currently more information about the link between amino acid substitutions in the spike protein and viral replicative capacity, transmissibility, and immune escape [1], other mutations located in the Nsp13, Nsp14, M, and N genes of this Marseille-484K.V1 variant might also affect its replicative capacity and transmissibility. The Nsp13 product has both helicase and RNA 5′-triphosphatase activities and is essential for viral replication [24]. It has been reported to interact with the RNA-dependent RNA polymerase and acts in partnership with the replication-transcription complex. The Nsp14 product is a dual-functional enzymatic protein with 3’-to-5’ exonuclease activity that is responsible for the SARS-CoV-2 RNA proofreading activity, and with guanine-N7 methyltransferase activity allowing methylation of the viral mRNA cap structure [25]. The membrane protein is the most abundant structural protein in the SARS-CoV-2 virion and has been reported to be involved in several functions, including attachment to the host cell, protein assembly, and increased glucose transport [26]. Finally, the SARS-CoV-2 nucleocapsid protein is critical for ribonucleocapsid formation and has also been implicated in viral genome replication [27]. In addition, one should also consider as a possible explanation for the limited spread of the Marseille-484K.V1 viruses the other SARS-CoV-2 that co-circulated with them, such as the predominant Alpha/B.1.1.7 variant, with which they competed.

Taken together, previous findings show that, despite a tremendous number of epidemiological, genomic, proteomic, structural, and cell culture studies carried out on SARS-CoV-2, the emergence, spread, and outcome of new mutants and variants are not well understood and therefore difficult to predict. Notably, this hampers the implementation of the most appropriate preventive measures, including the development of suitable vaccines.