Structure and function of SARS-CoV-2 polymerase

Coronaviruses use an RNA-dependent RNA polymerase (RdRp) to replicate and express their genome. The RdRp associates with additional non-structural proteins (nsps) to form a replication–transcription complex (RTC) that carries out RNA synthesis, capping and proofreading. However, the structure of the RdRp long remained elusive, thus limiting our understanding of coronavirus genome expression and replication. Recently, the cryo-electron microscopy structure of SARS-CoV-1 RdRp was reported. Driven by the ongoing COVID-19 pandemic, structural data on the SARS-CoV-2 polymerase and associated factors has since emerged at an unprecedented pace, with more than twenty structures released to date. This review provides an overview of the currently available coronavirus RdRp structures and outlines how they have, together with functional studies, led to a molecular understanding of the viral polymerase, its interactions with accessory factors and the mechanisms by which promising antivirals may inhibit coronavirus replication.

The genomes of coronaviruses are among the largest known for RNA viruses, ranging from 26 to 32 kb [11]. In addition to the structural and accessory proteins required for virion assembly, they encode 16 non-structural proteins (nsp1-16) that drive genome expression and replication. The (+)-strand genome can serve directly as template for the expression of the non-structural proteins, which are translated as large polyproteins (ORF1a or ORF1ab) and subsequently cleaved by viral proteases [12]. In contrast, structural and accessory proteins are translated from nested subgenomic mRNAs, the biogenesis of which requires template switching of the polymerase during transcription [13]. The non-structural proteins associate to form a replication-transcription complex (RTC) that carries out RNA synthesis, capping and proofreading, a unique feature of nidoviruses [12]. At the heart of the RTC is nsp12, which harbors the RNA-dependent RNA polymerase (RdRp) active site [14]. In order to be enzymatically active, it requires the accessory factors nsp7 and nsp8, which together with nsp12 form the core RdRp complex [15]. Additional subunits are thought to bind to this complex to form the RTC, including the helicase nsp13, the proofreading exonuclease and methyltransferase nsp14, the methyltransferase nsp16, as well as nsp9 and nsp10 [12]. Like cellular mRNAs, coronaviral transcripts bear an N7-methylguanosine cap at their 5′ end, which protects the RNA from degradation, facilitates translation and may mediate immune escape [12,16]. Cytosolic RNA viruses usually encode the enzymes necessary for formation of this cap, namely a triphosphatase (TPase), a guanylyltransferase (GTase) and at least one methyltransferase (MTase) [17]. While nsp13 has been suggested to act as the TPase [18] and nsp14 and nsp16 may methylate N7 of the guanosine base and the 2′-OH of the ribose, respectively [19,20], no GTase activity has been identified in coronavirus proteins so far [12].
Despite a large body of functional studies, a molecular understanding of the coronavirus RTC and how it integrates different steps of viral gene expression has been lacking, in part due to the absence of structural data.
Since the first outbreak of SARS-CoV-1 in 2002, the structures of many coronavirus proteins have been determined by X-ray crystallography or cryo-electron microscopy (cryo-EM). However, the structure of the coronavirus RNA polymerase long remained elusive. Only recently, in 2019, the first cryo-EM structure of the SARS-CoV-1 RdRp was reported [21]. Since then, the COVID-19 pandemic has fueled scientific interest in coronavirus biology, and this has led to rapid progress in the structural characterization of coronavirus replication and gene expression. Within months after the first case reports, the cryo-EM structure of the SARS-CoV-2 RdRp was reported, and this was quickly followed by further structures of polymerase complexes in different functional states (Table 1). This review provides an overview of the currently available coronavirus RdRp structures and the molecular picture of coronavirus replication that is beginning to emerge from both structural and functional data.

Structure of the coronavirus core RdRp complex
The first structural insights into the coronavirus polymerase were obtained only shortly before the current SARS-CoV-2 pandemic, when the cryo-EM structure of the SARS-CoV-1 core RdRp was reported [21]. This structure revealed the overall architecture of the polymerase subunit nsp12 and how it interacts with the accessory factors nsp7 and nsp8 (Figure 1a). Nsp12 contains a C-terminal polymerase domain homologous to other viral RdRps and an N-terminal nidovirus-specific domain, which is essential for virus replication. This domain has been proposed to harbor a nucleotidyl transferase activity, and has hence been termed nidovirus RdRp-associated nucleotidyl transferase (NiRAN) domain [22]. The NiRAN domain was only partially resolved in this original structure. It was shown to resemble the nucleotide-binding domains of kinases, but its molecular function remained unclear. The polymerase domain of nsp12 adopts the typical 'right-hand' fold of single-subunit RNA polymerases, consisting of thumb, fingers and palm subdomains [23]. These subdomains contain conserved motifs that together form the polymerase active site. Two copies of nsp8 bind to nsp12 on opposing sides of the RNA-binding cleft. These two nsp8 copies have been referred to by different designations in the literature (for example nsp8a/b [24] or nsp8-1/2 [25]). To unify the nomenclature, I here propose to refer to them according to their binding location on nsp12: nsp8T (for 'thumb') binds on one side of the cleft through an interaction of its C-terminal head domain with nsp7, which in turn binds to an extension in the fingers domain of nsp12 next to the thumb. Nsp8F (for 'fingers') binds on the other side of the cleft, and its head domain adopts a different fold and interacts with the fingers and interface domain of nsp12 directly.
The N-terminal domains of both nsp8 copies, which are highly conserved [26] and harbor residues important for RdRp activity [15], were not resolved in this structure, indicating conformational flexibility.
[Table 1: Available coronavirus RdRp cryo-EM structures (in order of publication). Columns: RdRp complex; Resolution (Å); PDB code; Reference.]
In April 2020, only months after its discovery, the first structure of the RdRp complex of the novel SARS-CoV-2 virus was determined by cryo-EM (Figure 1b) [25]. Shortly after, two further studies were published that reported similar structures [27,28]. As expected from the high degree of conservation (96.3% sequence identity in nsp12), the SARS-CoV-2 core RdRp is highly similar to that of SARS-CoV-1 and also forms a 1:1:2 complex of nsp12:nsp7:nsp8. Thus, the basic architecture of the RdRp core is conserved across these coronaviruses. The SARS-CoV-2 RdRp structures also reveal previously unresolved N-terminal parts of the NiRAN domain, which form additional interactions with the polymerase domain. However, the N-terminal domains of nsp8 are also unresolved in these structures.
Taken together, cryo-EM studies of the RdRp of both SARS-CoV-1 and SARS-CoV-2 revealed the architecture of the polymerase complex in the absence of nucleic acids (also referred to as the 'apo RdRp') and provided first insights into the interaction between nsp12 and the accessory factors nsp7 and nsp8.

Structure of replicating coronavirus polymerase
The next important milestone came from three studies that reported the cryo-EM structures of SARS-CoV-2 RdRp in complex with different template and product RNAs, thus revealing its active conformation [24,28,29]. In all three cases, this was achieved by assembling recombinantly produced nsp12, nsp7 and nsp8 with synthetic template-product RNA duplexes, which led to stable complexes that enabled structural characterization. The structures showed that, like other viral RdRps [30,31], the enzyme does not undergo drastic rearrangements upon RNA binding and revealed how the template and product strands are positioned in the active site during RNA synthesis (Figure 2a). The active site is formed by highly conserved residues in nsp12 motifs A-E, which include the catalytic aspartates D760 and D761 that coordinate metal ions required for catalysis. Nsp12 accommodates approximately one turn of duplex RNA, and the first base of the single-stranded portion of the template strand is positioned in the +1 site to direct incorporation of the next substrate nucleotide. In the absence of an incoming nucleoside triphosphate, the polymerase adopts the so-called post-translocated state in which the substrate binding site is empty, as observed previously for enteroviral and polioviral RdRps [30,31].
[Figure 1: Structure of the core SARS-CoV RdRp. Source: Current Opinion in Virology.]
The structures of actively replicating SARS-CoV-2 RdRp also revealed a surprising feature of the accessory factor nsp8. In one study, the RdRp-RNA complex was assembled using an RNA that forms a long upstream duplex region, extending 28 bp from the active site [24]. In the resulting cryo-EM structure, the previously unresolved N-terminal parts of nsp8 are visible and form long helical extensions, similar to those seen in a previous nsp7-nsp8 crystal structure [26]. In the RdRp complex, these long helical extensions contact the RNA duplex that emerges during replication, and may thus act as 'sliding poles' on the RNA (Figure 2a). This suggests that nsp8 and nsp7 may enhance the processivity of the RdRp by stabilizing its interaction with the RNA. The nsp8 extensions are apparently mobile in the absence of a longer upstream RNA duplex, explaining why other studies that employed shorter RNAs either could not resolve them [28] or observed conformational flexibility [29].
[Figure 2: (a) … [24]. Coloring and depiction as in Figure 1. The RNA is shown without surface, with the template strand in blue and the product strand in red. (b) Structure of SARS-CoV-2 RdRp in complex with nsp13 (PDB 7CXN) [50]. Depiction as in (a). Nsp13 is colored in salmon. The template strand bound to nsp13 is indicated. (c) Structure of SARS-CoV-2 RdRp in complex with nsp13 and nsp9 (PDB 7CYQ) [56]. Depiction as in (a). Nsp9 is shown in cyan.]
The structures of SARS-CoV-2 RdRp bound to RNA thus confirmed its conserved mechanism of catalysis and provided first structural insights into how the accessory factors nsp7 and nsp8 stimulate its activity.

Cryo-EM studies unravel the mechanism of antiviral compounds
The structures of active RdRp also paved the way for studies aimed at dissecting the mechanism of antiviral compounds that target coronavirus replication. In particular, the mechanism of remdesivir, an FDA-approved drug for treating COVID-19 [32], has been studied in detail by cryo-EM. Remdesivir is a nucleoside analog that inhibits SARS-CoV-2 RdRp in vitro [33,34] and viral replication in cell culture [35]. Biochemical studies showed that it does not act as an immediate chain terminator, but instead leads to delayed polymerase stalling by an unknown mechanism [34,36,37]. Two of the studies that reported cryo-EM structures of SARS-CoV-2 RdRp-RNA complexes also investigated how remdesivir is incorporated [28,29]. In both, the triphosphate form of remdesivir (RTP) was added during sample preparation to allow for incorporation by the viral enzyme. In one case, remdesivir was incorporated at the 3′ end of the product RNA but remained in the substrate binding site, thus representing a pre-translocated state [28]. In a second study, remdesivir was observed incorporated at the −1 position after the addition of one additional GTP, which remained in the pre-translocated +1 position [29]. These studies provided the structural basis for remdesivir incorporation into the product RNA and show that the polymerase can add further nucleotides to the nascent chain by translocating remdesivir. Previous data had shown that three more nucleotides can be added to the RNA chain after remdesivir incorporation before further RNA elongation is inhibited [34]. The mechanism of this delayed stalling was investigated by a study that determined cryo-EM structures of RdRp-RNA complexes assembled with RNA containing remdesivir at different positions [38].
While remdesivir could be accommodated by the RdRp at position −3, the assembly of the complex with an RNA containing remdesivir at −4 led to a pre-translocated state in which remdesivir was retained at position −3 and the 3′ end of the RNA occupied the substrate binding site. The structures suggest that translocation of remdesivir from the −3 to the −4 position would lead to clashes between its cyano group and Ser861 on the nsp12 thumb domain, as had been predicted based on modelling [34]. This was confirmed shortly after by a second cryo-EM study, which also showed that SARS-CoV-2 RdRp stalls in the pre-translocated state after the incorporation of the fourth RTP [39]. Thus, the structures show that remdesivir stalls the coronavirus polymerase by causing a translocation barrier after the addition of three more nucleotides.
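The position bookkeeping behind this delayed stalling can be followed in a minimal sketch. It is an illustrative toy model, not a description of any published analysis. It assumes that the analog has already been incorporated and translocated to position −1, that each further nucleotide addition is followed by an attempted one-register translocation, and that the barrier at −4 is absolute. The function name and its parameters are hypothetical.

```python
def nucleotides_added_after_analog(blocked_position=-4, max_steps=20):
    """Toy model of delayed stalling after analog incorporation.

    Assumptions (illustrative only): the analog starts at position -1
    (incorporated and translocated once), every translocation shifts it
    by one register, and translocation into `blocked_position` is
    impossible.
    Returns (nucleotides added after the analog, final analog position,
    final translocation state).
    """
    analog_position = -1
    added = 0
    for _ in range(max_steps):
        # The next nucleotide is incorporated; its 3' end now sits in the
        # substrate (+1) site, i.e. the complex is pre-translocated.
        added += 1
        # Translocation would move the analog one register upstream.
        next_position = analog_position - 1
        if next_position == blocked_position:
            # Translocation barrier: the complex stays pre-translocated.
            return added, analog_position, "pre-translocated"
        analog_position = next_position
    return added, analog_position, "post-translocated"


if __name__ == "__main__":
    print(nucleotides_added_after_analog())
```

With the default barrier at −4, the model returns three added nucleotides, the analog retained at −3 and a pre-translocated complex. This matches the text above, in which the 3′ end of the RNA occupies the substrate binding site.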
The mechanism of binding of another antiviral with clinical potential against COVID-19, favipiravir, was also studied by cryo-EM. Favipiravir acts as a purine analog and inhibits SARS-CoV-2 and other RNA viruses by causing mutagenesis [40–42]. Two cryo-EM structures of SARS-CoV-2 RdRp in the presence of favipiravir revealed the nucleoside bound to the substrate binding site, but not covalently linked to the 3′ end of the RNA [43,44]. These studies showed how favipiravir binds to the SARS-CoV-2 RdRp to mimic GTP, and thus suggest a mechanism for lethal mutagenesis. In addition, these snapshots of the polymerase with an incoming NTP bound provided general insights into substrate recognition by the coronavirus RdRp.
Recently, the first structure of a small molecule that is not a nucleotide analog, suramin, bound to SARS-CoV-2 RdRp was also determined by cryo-EM [45]. This drug has been shown to be effective against both parasitic and viral infections [46] and in vitro data suggest that it may also inhibit SARS-CoV-2 replication [45]. The structure shows that two suramin molecules can bind to SARS-CoV-2 nsp12 and this likely inhibits the polymerase by interfering with binding of the template and product RNA strands. This mechanism of inhibition differs from that previously proposed for norovirus RdRp [47], which can also bind two suramin molecules but at different locations on the polymerase than in the case of SARS-CoV-2.
These studies illustrate that the combination of structural, biochemical and cell culture data can provide a detailed understanding of the mechanism of antiviral compounds and will likely play a key role in the ongoing quest for therapeutic coronavirus RdRp inhibitors.

Structures of SARS-CoV-2 RdRp in complex with the helicase nsp13
Structural studies have also begun to unravel how the core RdRp interacts with other non-structural proteins to form the RTC [12]. One of them, nsp13, is a dual-function protein with both helicase and triphosphatase activity [18,48]. Nsp13 is required for coronavirus replication and has been proposed to be involved in RNA synthesis and capping, but its precise role remains unclear. Two cryo-EM structures of SARS-CoV-2 RdRp in complex with RNA and nsp13 demonstrate how the helicase interacts with the core RdRp (Figure 2b) [49,50]. Two copies of nsp13 can bind to the core nsp12-nsp7-nsp8 RdRp complex on opposing sides of the RNA binding cleft. In both cases, the interaction is mediated by the nidovirus-specific zinc binding domain (ZBD) of nsp13, which interacts with the N-terminal extensions of nsp8T or nsp8F, respectively. As for nsp8, I here propose a unifying nomenclature for these two nsp13 copies based on which copy of nsp8 they interact with: nsp13T binds on top of nsp8T and forms contacts to the nsp8T head domain via its RecA1 domain and to the nsp12 thumb. Nsp13F binds on top of nsp8F on the other side of the cleft and appears to be less stably bound, as one study reported particles lacking nsp13F [49] and another study observed conformational flexibility [50]. In addition to the interactions with the RdRp core, the two nsp13 subunits contact each other via their helicase domains, and biochemical experiments suggest that this interaction may be important for helicase activity [50].
The structures of the RdRp-nsp13 complexes also reveal how nsp13 interacts with the RNA. The single-stranded downstream template RNA binds to the nsp13T helicase domain, suggesting that nsp13T could translocate on the template strand (Figure 2b). However, nsp13 has been shown to unwind RNA and DNA in the 5′ to 3′ direction [51,52], which would oppose the direction in which the RdRp translocates on the template strand during synthesis. This has led to the suggestion that nsp13 could facilitate backtracking of the coronavirus polymerase on the RNA [49,50]. Backtracking is a well-characterized feature of the evolutionarily unrelated bacterial and eukaryotic multi-subunit RNA polymerases, which facilitates rescue of stalled elongation complexes and removal of misincorporated nucleotides [53]. The hypothesis that coronavirus RdRps may backtrack is supported by a recent preprint, in which the authors report the structure of a backtracked SARS-CoV-2 RdRp-nsp13 complex assembled on an RNA scaffold with a five-nucleotide mismatch at the 3′ end of the product strand [54]. In this structure, the mismatched segment extrudes through the substrate entry channel of nsp12, demonstrating a conserved topology between backtracked viral single-subunit and cellular multi-subunit polymerases. Biochemical data in this study further suggest that nsp13 stimulates RdRp backtracking. Nsp13-mediated backtracking of the coronavirus RdRp may thus facilitate proofreading or template switching during nested mRNA biogenesis by exposing the 3′ end of the nascent RNA from the polymerase active site [49,50,54].
Together, the structures of nsp13-bound RTCs showed that nsp7 and nsp8 provide an interaction platform to which additional factors, such as nsp13, can bind. In addition, they suggest that nsp13 may enable backtracking of the RdRp complex, which may be required for cotranscriptional processes such as proofreading or template switching.

The role of the NiRAN domain
A unique feature of the nidovirus RdRp is the NiRAN domain of nsp12. The discovery that this domain can catalyze the transfer of nucleoside monophosphates (NMP) has led to speculations that it may be involved in RNA ligation, protein-mediated priming of RNA synthesis or transcript capping [22]. However, it is not clear whether the target substrate of nucleotidyl transfer is another protein or the RNA, and the precise role of this domain thus remains enigmatic.
The recent cryo-EM structures of SARS-CoV-2 RdRp provide structural evidence for the nucleotidyl-transferase activity of the NiRAN domain. First, the NiRAN domain resembles SelO, a conserved bacterial pseudokinase that transfers AMP to protein residues [21,49,55]. Second, structures of SARS-CoV-2 RdRp determined in the presence of ADP-BeF3 [49] or GDP-BeF3 [56] showed density for ADP or GDP, respectively, bound to the NiRAN domain, demonstrating that it can indeed bind nucleotides. A recent cryo-EM structure of SARS-CoV-2 RdRp in complex with nsp13 and nsp9 further reveals that the NiRAN domain interacts with nsp9 (Figure 2c) [56]. Nsp9 is a putative RNA-binding protein that forms homodimers [57–59] and is essential for viral replication [60]. It has been suggested to interact with the RTC [61], but its molecular function remains unclear. The structure shows that nsp9 binds to the NiRAN domain as a monomer, and that its N-terminus extends into the putative active site of the NiRAN domain, where GDP is bound (Figure 2d). Biochemical data in this study indicate that the NiRAN domain may catalyze transfer of GMP to the 5′ end of RNA in an nsp13-dependent manner, which appears to be inhibited by nsp9. This led the authors to hypothesize that the NiRAN domain may act as the GTase required for transcript capping and that nsp9 could act as a regulatory or structural factor [56]. In contrast, a recent biochemical study suggests that the NiRAN domain catalyzes uridylation of the primary amine at the N-terminus of nsp9, lending support to a role for the NiRAN domain in protein-mediated RNA priming [62]. This activity is dependent on a 'NNE' motif at the nsp9 N-terminus, which is highly conserved among coronaviruses.
Even though the authors of the nsp9-bound RdRp structure did not observe nucleotidylation of nsp9, it appears plausible based on the structure, as the first three residues of nsp9 reach into the NiRAN domain and the N-terminal asparagine is positioned within a few ångströms of the bound nucleotide (Figure 2d) [56]. The observation that this activity is highly sensitive to the sequence at the nsp9 N-terminus [62] exemplifies the necessity for constructs with 'natural' termini for studying recombinant viral proteins in vitro.
Taken together, both functional and structural studies suggest that the nsp12 NiRAN domain acts as a nucleotidyl-transferase, but whether it is involved in transcript capping, protein-mediated priming or some other replication-associated process remains to be determined.