Antiviral drug design based on structural insights into the N-terminal domain and C-terminal domain of the SARS-CoV-2 nucleocapsid protein

Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a positive-sense single-stranded RNA (+ssRNA) virus with a genome of ~30 kb [1]. According to sequence alignment and bioinformatics analyses, SARS-CoV-2 is a member of the genus Betacoronavirus, which also contains SARS-CoV and MERS-CoV, whose sequence homology to SARS-CoV-2 is as high as 79% and 50%, respectively [2,3]. The genome of SARS-CoV-2 mainly encodes four structural proteins, sixteen nonstructural proteins, and several accessory proteins. The four structural proteins are the spike (S), membrane (M), envelope (E), and nucleocapsid (N) proteins [1,4]. The spike protein mediates recognition of and membrane fusion with host cells via angiotensin-converting enzyme 2 (ACE2), and is the main target of antibody and vaccine development [5][6][7]. The sixteen nonstructural proteins, nsp1-16, mainly mediate viral synthesis and processing, and play important roles in virus proliferation [8,9].
The nucleocapsid protein is one of the main structural proteins of coronaviruses. The N protein binds viral RNA to form a ribonucleoprotein (RNP) complex, which can be arranged as "beads on a string" to protect the viral genome and package the virion [10,11]. The N protein also regulates viral replication, transcription, and interaction with the membrane protein [12][13][14]. In addition, many immunological studies have shown that the N protein interacts with host proteins to induce immune responses or regulate the cell cycle of host cells after viral infection [15][16][17][18]. Many antibodies to the N protein can be detected in patients with COVID-19 and used as a basis for diagnosis and treatment [19,20]. The nucleocapsid protein of coronaviruses contains two main domains: the N-terminal domain (NTD) and the C-terminal domain (CTD) [21]. They are flanked and separated by three intrinsically disordered regions (IDRs). The middle IDR, which connects NTD and CTD, is called the central linker region (LKR) and includes a Ser/Arg-rich region [22]. Previous studies have reported that NTD is responsible for RNA binding and CTD is responsible for dimerization, both of which contribute to the formation of RNP [21]. Given the importance of the N protein, its RNA binding and dimerization domains may be crucial targets for drug development.
The structure of the full-length N protein is still unknown due to its intrinsically disordered regions, but the structures of NTD and CTD of the N protein (designated as N-NTD and N-CTD) of viruses from genus Betacoronavirus, such as SARS-CoV-2, SARS-CoV, Middle East respiratory syndrome coronavirus (MERS-CoV), and human coronavirus (HCoV)-OC43, have been reported [23][24][25][26][27][28][29][30][31]. Several key residues involved in RNA binding and interaction have also been identified. However, the crystal structure of N-NTD bound to RNA and the molecular mechanism of RNP formation remain unclear.
Here, we report the crystal structure of the NTD of the SARS-CoV-2 N protein at a resolution of 1.90 Å, the crystal structure of NTD bound to RNA at a resolution of 2.25 Å, and the crystal structure of CTD at a resolution of 3.0 Å. Through nuclear magnetic resonance (NMR) titration experiments, we identified ceftriaxone sodium, a third-generation cephalosporin and a commonly used clinical antibiotic, as an inhibitor of RNA binding to NTD. The new crystal structural information from this paper provides a basis for studying the molecular mechanism of RNP formation in SARS-CoV-2, and could contribute to the development of other antiviral drugs targeting the NTD of the N protein and its RNA binding sites.

Preparation of RNA
To examine the binding of RNA to N-NTD, dsRNA was prepared from two RNA oligonucleotides with the sequences 5′-CACUGAC-3′ and 5′-GUCAGUG-3′. To anneal the RNA duplex, the two oligonucleotides were mixed at an equimolar ratio in annealing buffer (10 mmol/L Tris-HCl pH 8.0, 25 mmol/L NaCl, 2.5 mmol/L ethylenediaminetetraacetic acid (EDTA)), denatured by heating to 94 °C for 5 min, and then slowly cooled to room temperature.
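The two oligonucleotides can only anneal into a fully paired duplex if each is the reverse complement of the other. As an illustration only, the following Python sketch checks this for the sequences given above; the helper name `reverse_complement` is hypothetical and the check considers Watson-Crick pairs only.

```python
# Watson-Crick pairing rules for RNA (G-U wobble pairs are ignored here).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


strand_1 = "CACUGAC"  # 5'-CACUGAC-3'
strand_2 = "GUCAGUG"  # 5'-GUCAGUG-3'

# The strands form a fully paired, blunt-ended duplex only if each one
# is the reverse complement of the other.
is_duplex = reverse_complement(strand_1) == strand_2
print(is_duplex)  # True
```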

Structure determination and refinement
X-ray diffraction data were collected at beamline using a wavelength of 0.979361 Å (NTD), 1.18057 Å (NTD-RNA), or 0.979150 Å (CTD). The structure was determined by molecular replacement (MR) with Phaser (STFC Daresbury Lab, Warrington, UK), using the X-ray crystal structure of SARS-CoV N-NTD (PDB ID: 2OFZ) or N-CTD (PDB ID: 2CJR) as the search model [26,28,32]. All initial models were further refined with Refmac5 (STFC Daresbury Lab, Warrington, UK) as part of CCP4 program (STFC Daresbury Lab) and interactively adjusted by COOT (STFC Daresbury Lab) [33]. The X-ray diffraction and structure refinement statistics are summarized in Table S1 (online).

Surface plasmon resonance (SPR) assay
The binding affinities of ceftriaxone sodium (MedChemExpress, Monmouth Junction, USA) and RNA for SARS-CoV-2 N-NTD were measured with an SPR assay on a Biacore S200 instrument (GE Healthcare). SARS-CoV-2 N-NTD was immobilized on a CM5 sensor chip (GE Healthcare). For the blank group, PBS buffer alone was flowed over the chip; for the experimental groups, ceftriaxone sodium (MedChemExpress) or RNA was diluted to different concentrations in PBS buffer and flowed over the chip. Equilibrium dissociation constant (K_D) values were calculated and analyzed with the Biacore S200 analysis software (GE Healthcare). In the competition experiment between ceftriaxone sodium (MedChemExpress) and RNA, 100 μmol/L ceftriaxone sodium (MedChemExpress) was added to the PBS buffer of both the blank and experimental groups.
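For readers unfamiliar with how a K_D is extracted from equilibrium SPR responses, a minimal sketch of a 1:1 steady-state fit is given here. It is not the algorithm of the Biacore software; it assumes a single-site model, R_eq = R_max · C / (K_D + C), and uses hypothetical, noise-free data. The function names and the grid search are illustrative choices.

```python
def occupancy(conc: float, kd: float) -> float:
    """Fraction of immobilized sites occupied at equilibrium (1:1 model)."""
    return conc / (kd + conc)


def fit_steady_state(concs, responses, kd_grid):
    """Grid-search K_D; for each candidate K_D the best R_max is closed-form."""
    best = None
    for kd in kd_grid:
        f = [occupancy(c, kd) for c in concs]
        # Least-squares R_max for a model that is linear in R_max.
        rmax = sum(r * fi for r, fi in zip(responses, f)) / sum(fi * fi for fi in f)
        sse = sum((r - rmax * fi) ** 2 for r, fi in zip(responses, f))
        if best is None or sse < best[2]:
            best = (kd, rmax, sse)
    return best[0], best[1]


# Hypothetical data: analyte concentrations (umol/L) and equilibrium
# responses (RU) generated with K_D = 5.0 umol/L and R_max = 50 RU.
concs = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
responses = [50.0 * occupancy(c, 5.0) for c in concs]

# Log-spaced candidate K_D values from 0.1 to 1000 umol/L.
kd_grid = [10 ** (i / 100.0) for i in range(-100, 301)]
kd_fit, rmax_fit = fit_steady_state(concs, responses, kd_grid)
print(round(kd_fit, 2), round(rmax_fit, 1))
```

The recovered K_D is limited by the grid spacing; a real analysis would also account for noise, reference subtraction, and mass-transport effects.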

NMR spectroscopy
The NMR experiments were carried out at 25 °C on a Bruker Avance 600 MHz spectrometer (Bruker, Billerica, USA) equipped with a 5 mm triple-resonance TCI cryogenic probe. The U-[¹³C, ¹⁵N] NMR samples (Cambridge Isotope Laboratories, Inc.) were prepared in the SEC buffer, 5% D₂O/90%-95% H₂O. A series of double- and triple-resonance spectra were recorded to obtain sequence-specific resonance assignments, including 2D ¹H-¹⁵N HSQC, 3D HNCACB, and 3D CBCA(CO)NH. For NMR titrations with candidate small-molecule compounds (TargetMol Chemicals Inc., Boston, USA), 2D ¹H-¹⁵N HSQC spectra were collected on the ¹⁵N-labelled N-NTD/N-CTD in the absence and in the presence of small molecules at different ratios. Small-molecule compounds are usually dissolved in DMSO, which may itself induce shifts in the NMR peaks. To eliminate the effect of DMSO, we also dissolved the small-molecule compounds in the SEC buffer of N-NTD/N-CTD and repeated the NMR titration experiments. All NMR spectra were processed and analyzed with Sparky (University of California, San Francisco, USA) and NMRPipe (National Institute of Standards and Technology (NIST), Rockville, USA). The chemical shift perturbations (CSPs) were calculated per residue from the ¹H and ¹⁵N chemical shift changes, and significantly perturbed residues are defined as those with CSP values > average + STD.
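The CSP analysis described above can be sketched in a few lines. The combined-shift formula used here, CSP = sqrt(ΔδH² + (w·ΔδN)²), is a commonly used form, but the weighting factor w = 0.2 and the use of the population standard deviation are assumptions, not values stated in this paper. The shift values are hypothetical.

```python
import math
import statistics


def csp(delta_h: float, delta_n: float, n_weight: float = 0.2) -> float:
    """Combined 1H/15N chemical shift perturbation in ppm.

    n_weight rescales the 15N shift change; 0.2 is an assumed value.
    """
    return math.sqrt(delta_h ** 2 + (n_weight * delta_n) ** 2)


def significant_residues(csps: dict) -> list:
    """Residues whose CSP exceeds mean + one standard deviation."""
    values = list(csps.values())
    cutoff = statistics.mean(values) + statistics.pstdev(values)
    return [res for res, value in csps.items() if value > cutoff]


# Hypothetical shift changes (ppm) between free and ligand-bound spectra:
# residue -> (delta 1H, delta 15N). These are not values from the paper.
shifts = {
    "T49": (0.010, 0.05),
    "A50": (0.020, 0.10),
    "R92": (0.120, 0.60),
    "G60": (0.015, 0.08),
    "Y109": (0.030, 0.10),
}
csps = {res: csp(dh, dn) for res, (dh, dn) in shifts.items()}
print(significant_residues(csps))  # ['R92']
```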

Molecular docking
The structure of the N-NTD in complex with ceftriaxone sodium (MedChemExpress) was calculated using HADDOCK (Utrecht University Bonvin Lab, Utrecht, the Netherlands) in combination with Crystallography and NMR System (CNS) [34,35]. The docking was performed using the NMR structure of N-NTD (PDB ID: 6YI3) and the 3D structure of ceftriaxone sodium (PubChem CID: 5479530) (MedChemExpress) according to a standard protocol, described as follows. The topology and parameter files of ceftriaxone sodium (MedChemExpress) were generated with ACPYPE (Vrije Universiteit Brussel Wim F Vranken Lab, Brussels, Belgium) [36]. The N-NTD residues with CSP > 0.08 ppm and at least 20% solvent accessibility were chosen as active residues (T49, A50, W52, T54, G60, A90, T91, R92, L104, S105, R107, Y109, L139, A152, and Y172), while adjacent solvent-exposed residues were additionally selected as passive residues (N48, L55, H59, K61, R89, R93, D103, P106, A138, N140, P151, N153, F171, and A173). All atoms of ceftriaxone sodium (MedChemExpress) were defined as active for the experimentally driven docking protocol. In addition, three regions within the N-NTD were defined as fully flexible segments for the advanced stages of the docking calculation (the N-terminal N48-A50, the central loop I94-M101, and the C-terminal G170-A173). The final set of 200 water-refined structures was clustered using a Fraction of Common Contacts approach with a default cut-off of 0.75. Finally, the structure of NTD-ceftriaxone sodium with the lowest energy score was selected for detailed analysis and display in PyMOL (The PyMOL Molecular Graphics System, version 2.1.1; Schrödinger, New York, USA).
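The active-residue criteria stated above (CSP > 0.08 ppm and at least 20% solvent accessibility) amount to a simple two-condition filter. The sketch below illustrates that filter only; the function name `select_active` and all per-residue values are hypothetical, and it does not reproduce HADDOCK's own input handling.

```python
def select_active(residues, csp_cutoff=0.08, min_accessibility=20.0):
    """Pick residues with CSP above the cutoff and enough solvent exposure.

    `residues` maps a residue label (e.g. "R92") to a tuple of
    (CSP in ppm, relative solvent accessibility in percent).
    """
    chosen = [
        label
        for label, (csp, accessibility) in residues.items()
        if csp > csp_cutoff and accessibility >= min_accessibility
    ]
    # Sort by residue number, i.e. the digits after the one-letter code.
    return sorted(chosen, key=lambda label: int(label[1:]))


# Hypothetical values for illustration; not measurements from the paper.
residues = {
    "R92": (0.15, 60.0),  # perturbed and exposed -> active
    "T49": (0.12, 45.0),  # perturbed and exposed -> active
    "S51": (0.10, 8.0),   # perturbed but buried -> not active
    "K61": (0.03, 55.0),  # exposed but weakly perturbed -> not active
}
print(select_active(residues))  # ['T49', 'R92']
```

A filter of this kind would explain why a residue can show a significant CSP yet be left out of the active list when it is poorly solvent-exposed.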

Crystal structure of N-NTD
We designed a SARS-CoV-2 N-NTD construct containing residues 48-173 of the N protein, which was expressed and purified as described (Fig. S1 online). The crystal structure of SARS-CoV-2 N-NTD (PDB ID: 7XX1) was determined to a resolution of 1.90 Å by molecular replacement using the structure of SARS-CoV N-NTD (PDB ID: 2OFZ) as the search model [26]. The final structure was refined to R-factor and R-free values of 0.210 and 0.258, respectively. Additional data are given in Table S1 (online). Each asymmetric unit (ASU) contains four N-NTD molecules, with overall root mean square deviation (RMSD) values of 0.43-1.06 Å between monomers, and each monomer binds one zinc ion and one MES molecule (Fig. 1a, c). In the tetramer, MES molecules, which are buffer components, form hydrogen bonds with two adjacent N-NTD monomers, so that the entire tetramer assumes a cyclic configuration (Fig. 1b). Compared with the structures of other N-NTD tetramers, our circular tetramer structure is more stable under the action of MES molecules, which may contribute to the oligomerization of the N protein. All four monomers in one asymmetric unit of N-NTD show a right-handed shape and contain a structural core and a flexible loop, similar to the N-NTDs of other coronaviruses [26,30,31]. The structural core is composed of a four-stranded antiparallel β-sheet and a short 3₁₀ helix, while the flexible loop is a protruding β-hairpin (Fig. 1c). As depicted on the electrostatic potential surface, the junction between the protruding β-hairpin and the core structure forms a positively charged pocket, serving as a potential RNA binding site (Fig. 1d).
Several structures of SARS-CoV-2 N-NTD have been reported to date [23,24]. To further explore the structural information, we compared our structure with the reported structures (PDB ID: 6M3M, 7CDZ) (Fig. 1e) [23,24]. The RMSD values between our N-NTD structure and the other two N-NTD structures are 0.630-1.149 Å (PDB ID: 6M3M) and 0.635-1.190 Å (PDB ID: 7CDZ), respectively. In addition to the differences in the flexible N- and C-termini, there are two differences in the β-hairpin region. First, compared with the other two N-NTD structures, our N-NTD structure shows an extended β-hairpin, similar to MERS-CoV N-NTD (PDB ID: 4UD1) [31]. Second, the β-hairpin shows a movement towards the RNA binding site, as described later. These differences may be caused by the flexibility of the β-hairpin and the influence of MES molecules. The MES molecule binds to R107 at the bottom of the β-hairpin region via hydrogen bond interactions (Fig. 1f).
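As a reminder of what the R-factor and R-free values quantify, the conventional residual is R = Σ| |F_obs| − |F_calc| | / Σ|F_obs|, with R-free computed in the same way over a test set of reflections excluded from refinement. The sketch below evaluates that expression on hypothetical amplitudes and assumes that F_calc has already been placed on the same scale as F_obs; it is not the Refmac5 implementation.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic residual: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes f_calc is already scaled to f_obs and both lists are aligned
    reflection by reflection.
    """
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Hypothetical structure-factor amplitudes for four reflections.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 66.0, 37.0]
print(round(r_factor(f_obs, f_calc), 4))  # 0.0857
```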

Crystal structure of N-NTD and RNA complex
The electrostatic potential surface of N-NTD reveals a potential RNA binding site [26,30,37,38]. To obtain the structure of the N-NTD and RNA complex, we selected an RNA duplex (dsRNA) formed by the sequences 5′-CACUGAC-3′ and 5′-GUCAGUG-3′, which binds to N-NTD [38]. The K_D value of N-NTD and dsRNA measured by SPR analysis was 8 nmol/L (Fig. S2a online). The crystal structure of the SARS-CoV-2 NTD-RNA complex (PDB ID: 7XWZ) was solved at a resolution of 2.25 Å. Each ASU contains two N-NTD molecules and two dsRNA fragments (Fig. 2a). Structural superimposition of N-NTD in the apo and RNA-bound forms reveals significant conformational changes within the β-hairpin region (Fig. 2b), highlighting the intrinsic flexibility of these RNA-interacting regions, which would allow each N-NTD monomer to bind dsRNA in a different way. Each of our two N-NTD monomers binds predominantly to one strand of the dsRNA. In monomer 1, the hydrogen bonds between Y109, Y111, and RNA involve OP1′ and O3′ atoms in the RNA phosphate backbone, the hydrogen bond between R88 and RNA involves only an OP1′ atom, and the hydrogen bond between T49 and RNA involves only an O2′ atom (Fig. 2c, d). In monomer 2, by contrast, the hydrogen bonds between R92, R107, and RNA involve OP1′ and OP2′ atoms in the RNA phosphate backbone (Fig. 2c, e). These binding modes also differ from that of the NMR-based docking model of the NTD-RNA complex (PDB ID: 7ACS) (Fig. 2b) [38]. In the NMR-based model, three arginine residues (R92, R107, and R149) directly bind to RNA. Although the amino acid residues involved in the three binding modes are distinct, the binding site is clearly at the same position, namely the positively charged junction between the protruding β-hairpin and the core structure. The distinct binding modes may indicate that N-NTD binds RNA in as many ways as possible to promote the formation of RNP.

Structure of N-NTD and ceftriaxone sodium complex
N-NTD and N-CTD contribute to the formation of RNP, so small-molecule drugs that bind to them and block this process could be used as inhibitors targeting the N protein. We screened a large number of small-molecule compounds from literature reports and compound libraries to identify potential drugs, and identified several candidate compounds. Because of the difficulty of growing crystals of the protein in complex with small-molecule compounds, we chose to characterize the binding of these compounds to N-NTD by NMR titration experiments. Residues of N-NTD are specifically perturbed by ceftriaxone sodium dissolved in either DMSO or SEC buffer in the 2D ¹H-¹⁵N HSQC spectra (Fig. S3 online). Meanwhile, the SPR assay gave a K_D value of 8.3 μmol/L for N-NTD and ceftriaxone sodium (Fig. S2b online).
Ceftriaxone sodium is a third-generation cephalosporin and was approved by the FDA in 1984. It is mainly used to treat bacterial infections and acts by inhibiting synthesis of the bacterial cell wall. We added ceftriaxone sodium to the N-NTD protein sample at ratios from 0:1 to 25:1. Superimposition of the 2D ¹H-¹⁵N HSQC spectra in the presence of ceftriaxone sodium at different ratios revealed remarkable resonance perturbations of many residues (Fig. 3a). Surprisingly, the same compound also induced resonance shifts in N-CTD, although relatively few residues were found to move (Fig. 3b). These results indicate that ceftriaxone sodium specifically interacts with both N-NTD and N-CTD. To locate the binding sites of ceftriaxone sodium on the surface of N-NTD, we analyzed chemical shift perturbation (CSP) values on a per-residue basis. According to the CSPs at the maximum ratio (N-NTD: ceftriaxone sodium = 1:25), a total of 17 residues (T49, A50, S51, W52, F53, T54, G60, A90, T91, R92, L104, S105, R107, Y109, L139, A152, and Y172) showed significant resonance shifts; when mapped onto the crystal structure of N-NTD, these residues constitute two binding sites on the surface of N-NTD (Fig. 3a, c). Subsequently, we used the HADDOCK program for NMR-restraint-driven docking simulations of ceftriaxone sodium. The 3D structure with PubChem CID 5479530 was chosen as the starting conformation of ceftriaxone sodium. The analysis of CSPs provided a set of "active" solvent-accessible residues on N-NTD, which was expanded with surrounding "passive" residues. The selection criteria for active residues were a CSP value higher than 0.08 ppm and at least 20% solvent accessibility. The standard docking protocol yielded a set of 200 water-refined structures, from which the structure with the lowest energy score was selected as a representative conformation for the N-NTD in complex with ceftriaxone sodium (NTD-ceftriaxone sodium) (Fig. 3d).
As shown in the resulting structural model of the NTD-ceftriaxone sodium complex, the two main binding sites are located within the junction region, the same location as the RNA binding site (Fig. 3e). Interestingly, the significantly perturbed residues include those (R92, R107, and Y109) involved in hydrogen bond interactions in the two NTD-RNA binding modes (Fig. 2d, e, Fig. 3d). In addition, we used SPR to study the competition between ceftriaxone sodium and RNA for NTD binding. The results revealed that in the presence of a 10- to 20-fold excess of ceftriaxone sodium, no RNA binding signal to N-NTD was detected, indicating that ceftriaxone sodium can compete with RNA for N-NTD binding (Fig. S2c online). Therefore, with the N protein as the drug target, our results suggest that ceftriaxone sodium acts as an inhibitor by blocking the interaction between NTD and RNA.
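To give a sense of how a competitor at a shared site is expected to affect an apparent affinity, a textbook single-site competitive model can be written as K_D,app = K_D · (1 + [I]/K_I). The sketch below applies it to the K_D values reported in this section and the 100 μmol/L competitor concentration given in the SPR method. Treating the measured ceftriaxone K_D as K_I, and assuming a single shared site, are simplifications introduced here; the paper does not fit such a model and reports two binding sites.

```python
def apparent_kd(kd_ligand: float, inhibitor_conc: float, ki: float) -> float:
    """Apparent K_D of a ligand when a competitor occupies the same site.

    kd_ligand is returned rescaled in its own units; inhibitor_conc and ki
    must share one unit (here umol/L).
    """
    return kd_ligand * (1.0 + inhibitor_conc / ki)


kd_rna_nm = 8.0          # K_D of dsRNA for N-NTD, nmol/L (SPR, this section)
ki_ceftriaxone_um = 8.3  # K_D of ceftriaxone sodium, umol/L, used as K_I
competitor_um = 100.0    # ceftriaxone sodium in the running buffer, umol/L

shifted = apparent_kd(kd_rna_nm, competitor_um, ki_ceftriaxone_um)
print(round(shifted, 1))  # 104.4 (nmol/L)
```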

Crystal structure of N-CTD
We also designed a SARS-CoV-2 N-CTD construct containing residues 250-365 of the N protein, which was expressed and purified as described (Fig. S3 online). The crystal structure of SARS-CoV-2 N-CTD (PDB ID: 7XWX) was determined by molecular replacement at a resolution of 3.0 Å. The final structure was refined to R-factor and R-free values of 0.219 and 0.268, respectively. Additional data are given in Table S1 (online). Eight N-CTD molecules are paired to form four homodimers in one ASU, with each dimer also binding one phosphate ion (Fig. 4a). Each monomer is composed of one 3₁₀ helix, three α-helices, and two β-strands (Fig. 4b, c). N-CTD molecules form stable dimers, consistent with the result of analytical gel filtration (Fig. S4c online). Extensive hydrogen bonds and hydrophobic interactions stabilize the dimerization of N-CTD. In the dimer, the two β-strands from each monomer form an antiparallel β-sheet with eight hydrogen bond interactions, including S318-Y333, S327-K338, T329-L339, L331-I337, Y333-G335, T334-T334, G335-Y333, I337-L331, and L339-T329 (the residue written first is from monomer 1) (Fig. 4d). There are also many hydrogen bonds in the interaction region between the α-helices of the two monomers, such as R277-G316, G278-R319, N285-I320, G316-R277, R319-G278, R319-E280, R319-Q283, and I320-N285. In addition, residues F286, T296, W301, A305, and Q306 from monomer 1 show hydrophobic interactions with residues A311, S312, and G316 from monomer 2 in other interaction regions (data not shown). We overlaid our N-CTD structure with reported structures (PDB ID: 7CE0, 7DE1) [23,39]. Because of the extensive hydrogen bonds and hydrophobic interactions, all structures show similar and stable dimer structures and characteristics, with RMSD values of 0.95-0.97 Å (Fig. 4e). Stable dimer structures are also essential elements in the formation of RNP.

Discussion and conclusion
As the pandemic of COVID-19 enters its third year, several vaccines and drugs have been approved. However, this public health crisis has not been completely controlled due to the rapid mutation of SARS-CoV-2, highlighting the urgent demand to find additional potential drug targets and new drugs. As one of the important structural proteins of SARS-CoV-2, N protein plays a crucial role in many physiological processes of the virus, especially when it combines with viral RNA to form an RNP complex [10,11]. In this study, we presented the crystal structures of N-NTD and N-CTD and compared them with previously published structures. The comparison results indicate that these structures are very similar to each other. NTD shows a right-handed shape, while CTD exists as a stable homo-dimer.
Based on the reported N-NTD structures of coronaviruses, it has been proposed that NTD binds to RNA through its positively charged pocket [26,30,37,38]. Our NTD-RNA complex structure, which has not been reported before, reveals that NTD binds dsRNA in specific modes that differ from the previously proposed one. The residues involved are R88, Y109, and Y111 in monomer 1 and R92 and R107 in monomer 2, which differ from R92, R107, and R149 in the previously reported docking model [38]. Residues interacting with RNA in the N-NTDs of other coronaviruses have also been reported, such as R106, R107, and R117 in HCoV-OC43 (corresponding to R92, R93, and K102 in SARS-CoV-2) [30]. Although the residues involved in the different binding modes are not exactly the same, they include positively charged arginines, which may play a major role in RNA binding. The different binding modes indicate that N-NTD could bind to RNA in many different ways.
As a cephalosporin, ceftriaxone sodium is often used to treat bacterial infections, such as wound infections, respiratory tract infections, and intra-abdominal infections. Currently, there is an ongoing clinical trial to verify the therapeutic efficacy of ceftriaxone sodium in lung infections caused by COVID-19 [40]. Nevertheless, our study is the first to confirm the inhibitory effect of this small molecule on the N protein of SARS-CoV-2. According to the process of RNP formation, there are two main antiviral strategies targeting the N protein: blocking the packaging of the viral RNA or inhibiting the oligomerization of the N protein. Previously screened drugs, such as PJ34, inhibit only the function of NTD [37,41].
In contrast, the ceftriaxone sodium identified in our screen shows inhibitory effects on both NTD and CTD. According to the docking model of the NTD-ceftriaxone sodium complex, ceftriaxone sodium mainly affects 17 residues, including those that bind dsRNA (R92, R107, and Y109). Meanwhile, the SPR assay showed that the K_D value of N-NTD and ceftriaxone sodium was 8.3 μmol/L, and that ceftriaxone sodium could compete with RNA for N-NTD binding within a similar concentration range. Therefore, ceftriaxone sodium blocks the interaction between NTD and RNA by occupying the RNA binding site, inhibiting the formation of RNP and thereby interrupting the life cycle of SARS-CoV-2. Its mechanism of inhibiting CTD remains to be explored. Recently, Omicron (B.1.1.529) has been reported as a variant of concern. However, its N protein mutations (P13L, R203K, and G204R) occur only in the IDRs, not in the conserved NTD and CTD [42,43]. Therefore, the sites that bind RNA can serve as conserved drug targets, and ceftriaxone sodium may also serve as an effective inhibitor for Omicron.
In conclusion, we reported and analyzed the crystal structures of NTD and CTD from the N protein of SARS-CoV-2, as well as the crystal structure of the NTD-RNA complex. Our structures revealed a novel tetramer configuration of NTD and identified distinct modes of RNA binding by NTD. We also identified ceftriaxone sodium as an inhibitor that blocks RNA binding by NTD and inhibits the formation of RNP, which could interrupt the life cycle of SARS-CoV-2. The structural information presented in this paper could offer new insights for follow-up studies of antiviral drug design targeting the N protein.