CryoEM and AI reveal a structure of SARS-CoV-2 Nsp2, a multifunctional protein involved in key host processes.

The SARS-CoV-2 protein Nsp2 has been implicated in a wide range of viral processes, but its exact functions, and the structural basis of those functions, remain unknown. Here, we report an atomic model for full-length Nsp2 obtained by combining cryo-electron microscopy with deep learning-based structure prediction from AlphaFold2. The resulting structure reveals a highly conserved zinc ion-binding site, suggesting a role for Nsp2 in RNA binding. Mapping emerging mutations from variants of SARS-CoV-2 onto the resulting structure shows potential host-Nsp2 interaction regions. Using structural analysis together with affinity-tagged purification mass spectrometry experiments, we identify Nsp2 mutants that are unable to interact with the actin-nucleation-promoting WASH protein complex or with GIGYF2, an inhibitor of translation initiation and modulator of ribosome-associated quality control. Our work suggests a potential role of Nsp2 in linking viral transcription within the viral replication-transcription complexes (RTC) to the translation initiation of the viral message. Collectively, the structure reported here, combined with mutant interaction mapping, provides a foundation for functional studies of this evolutionarily conserved coronavirus protein and may assist future drug design.

Having built an initial model, we then identified a number of putative zinc binding sites and repeated the sample purification and cryo-EM imaging in the presence of zinc. This yielded an improved 3.2 Å cryo-EM map, which revealed additional details and enabled improved modeling of residues 5-505 of the SARS-CoV-2 Nsp2 (Fig 1B). To our surprise, under these Zn-included conditions the density for the C-terminal 130 amino acids was completely missing. In the cryo-EM map without zinc, although the density for the flexible C-terminal domain was present, it was resolved at only 5-6 Å resolution. The closest homologous structure showed less than 10% sequence identity (PDB:3LD1) [26], and as the domain was predicted to be rich in beta sheet, this posed a significant challenge for de novo modeling based on low-resolution cryo-EM density alone.

The recent utilization of deep learning for protein structure prediction based on amino acid sequence has led to a new level of success, as demonstrated by CASP14 [27]. Specifically, the AlphaFold2 team was able to predict protein structures with unprecedented accuracy, producing results sometimes indistinguishable from the experimentally derived structures. AlphaFold2 and other teams in CASP14 also ran predictions on SARS-CoV-2 proteins, including Nsp2. Out of all the available predicted models for Nsp2, only one model has an RMSD of less than 20 Å to our experimental model: C1901TS156_4 from the AlphaFold2 team. The other 5 models from AlphaFold2 were also close to 20 Å RMSD, so we aligned all the available Nsp2 models from the AlphaFold2 team (5 from CASP14 and one updated model available on their website [28]) to our structure (Sup Fig 3). This comparison made it clear that, globally, the predictions were quite different from the experimentally derived structure.
In addition, the most updated model was missing a prediction for 93 amino acids of the protein (Fig 1C).

However, when analyzed in isolation, the individual motifs and domains of the predicted models are remarkably close to the experimentally derived structure. This observation prompted us to break the model down into 4 subregions and align them to the experimentally derived structure independently. This yielded high local similarity per domain (average RMSD values of less than 2 Å, Fig 1D). The prediction for the missing C-terminal 130 amino acids in isolation fit well within the lower resolution density for that domain in the cryo-EM map without zinc. We therefore combined the AlphaFold2 domain prediction for the C-terminal 130 amino acids with our experimentally built cryo-EM model to yield an experimentally valid and complete structure of full-length Nsp2 (Fig 1E, Sup Table 1).

Nsp2 shows low global conservation among beta-coronaviruses, but possesses a highly conserved Zn binding motif.

To better understand which regions of Nsp2 are functionally important, we performed a sequence alignment of Nsp2 across beta-coronaviruses from different species. Nsp2 shows low conservation, with the N-terminal half of the protein being marginally more conserved (Fig 2A and Sup Fig 1). Overall, SARS-CoV-2 Nsp2 is 68% identical to SARS-CoV-1 Nsp2 and only 20% identical to MERS virus Nsp2. Strikingly, the most conserved residues are a quartet of cysteines coordinating a Zn²⁺ ion in a Zn ribbon-like motif, with three of the four cysteines being invariant across all the virus sequences. A structural similarity search with this motif from Nsp2 indicates that it is similar to zinc ribbons [29] in a number of RNA binding proteins in RNA polymerases and ribosomes (Fig 2A inset, average RMSD for the region of 1.7 Å).
In some of these proteins these motifs have been explicitly implicated in RNA binding, and in one structure (PDB:1JJ2, chain 2) the zinc ribbon on the ribosomal protein L44E interacts directly with the ribosomal RNA. This motif is also similar to the tudor domains in the histone tail binding protein JMJD2A (RMSD of 1.7 Å). Although the fold is similar, the tudor domain is missing a Zn²⁺ ion in the JMJD2A structure (PDB:2QQS). Previous studies have associated Nsp2 with the viral RTCs [12,30], and during the purification from bacteria we observed strong, apparently non-specific binding to E. coli nucleic acids that required chromatographic separation. One possibility, therefore, is that this motif is important for Nsp2 interactions with nucleic acids.
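The identity and invariance statements in this section (for example 68% identity to SARS-CoV-1 Nsp2, or cysteines that are invariant across sequences) are properties of aligned sequences. The following is a minimal sketch of how such numbers can be computed from an existing alignment. The toy sequences are hypothetical and are not Nsp2; the function names are illustrative; and the choice to skip gapped columns in the denominator is an assumption, since several definitions of percent identity are in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: gapped columns are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def invariant_columns(alignment, gap="-"):
    """Return 0-based column indices where every sequence has the same residue."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have equal length")
    invariant = []
    for i, column in enumerate(zip(*alignment)):
        residues = set(column)
        if len(residues) == 1 and gap not in residues:
            invariant.append(i)
    return invariant


# Hypothetical toy alignment, used only to exercise the functions.
toy_alignment = ["ACDE-F", "ACDQGF", "TCDEGF"]
toy_identity = percent_identity(toy_alignment[0], toy_alignment[1])
toy_invariant = invariant_columns(toy_alignment)
```

In practice the alignment itself would come from a dedicated multiple sequence alignment tool; this sketch only covers the bookkeeping after alignment.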
In addition to the proteins containing zinc ribbons and tudor motifs, a search of the PDB for structurally similar proteins returned only one additional structure, the structure of Nsp2 from Avian Infectious Bronchitis virus (PDB:3LD1). Although the sequence identity is below 10% for these proteins, the beta sheet C-terminal domain aligns well with our model. No other structures came up in our structural similarity search with either the FATCAT [31] or DALI [32] servers.
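The comparisons above contrast a poor global fit (RMSD near 20 Å) with good per-domain fits (RMSD below 2 Å). The sketch below illustrates why this can happen, using the standard Kabsch superposition on synthetic coordinates. It is not the authors' alignment procedure: the coordinates are random points, the two "domains" and their residue ranges are hypothetical, and matched atom pairs are assumed to be known in advance.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Synthetic "experimental" coordinates: 40 points split into two hypothetical domains.
rng = np.random.default_rng(0)
experimental = rng.normal(scale=10.0, size=(40, 3))
domains = {"domain_A": slice(0, 20), "domain_B": slice(20, 40)}

# Synthetic "prediction": domain A is unchanged, domain B is internally identical
# but rotated 90 degrees about z and shifted, i.e. the domain arrangement is wrong.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
predicted = experimental.copy()
predicted[domains["domain_B"]] = experimental[domains["domain_B"]] @ Rz.T + np.array([30.0, 0.0, 0.0])

global_rmsd = kabsch_rmsd(predicted, experimental)
per_domain_rmsd = {
    name: kabsch_rmsd(predicted[idx], experimental[idx]) for name, idx in domains.items()
}
```

Here each domain superimposes essentially perfectly on its own, while the single global superposition cannot reconcile the two domain placements, which mirrors the pattern described for the predicted Nsp2 models.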
Subsets of acquired mutations in Nsp2 group into surface patches.

Among the mutations that have arisen in SARS-CoV-2 Nsp2 during the COVID-19 pandemic, over 50 sites have been identified as being under positive selection (at the time of writing, based on the dN/dS > 1 metric [33,13]). Most of these mutations occur at low frequency. Two mutations, however, T85I and I120F, are present at frequencies of roughly 13% and 5%, respectively. The T85I mutation maps to a surface residue on our structure (Fig 3). The side chain of T85 is surface exposed, so replacing it with a hydrophobic isoleucine should not be favorable. However, if this region of Nsp2 is involved in protein-protein interactions, such a substitution might be a gain-of-function change, stabilizing a hydrophobic binding interface. The second mutated residue, I120, is not surface exposed and instead packs in a hydrophobic core that anchors a small helix. This small helix is attached to a highly charged loop on the surface of the protein, and its dynamics may be functionally important. A phenylalanine at this position may further stabilize this helix anchor point by participating in pi-stacking interactions with neighboring aromatic residues (Fig 3 inset).
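The argument about T85I rests on the change in side-chain hydrophobicity. As a rough illustration, the sketch below scores point mutations by the difference in Kyte-Doolittle hydropathy between the mutant and wild-type residue. The use of this particular scale is an assumption made for illustration and is not part of the analysis reported here; the helper names are hypothetical.

```python
# Kyte-Doolittle hydropathy index (higher values are more hydrophobic).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def parse_mutation(label):
    """Split a label such as 'T85I' into (wild type, position, mutant)."""
    wild_type, position, mutant = label[0], int(label[1:-1]), label[-1]
    if wild_type not in KYTE_DOOLITTLE or mutant not in KYTE_DOOLITTLE:
        raise ValueError(f"unknown residue in {label!r}")
    return wild_type, position, mutant


def hydropathy_shift(label):
    """Mutant minus wild-type hydropathy; positive means more hydrophobic."""
    wild_type, _, mutant = parse_mutation(label)
    return KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wild_type]


# The two higher-frequency mutations discussed in the text.
shifts = {label: hydropathy_shift(label) for label in ("T85I", "I120F")}
```

On this scale T85I is a large shift toward hydrophobicity, whereas I120F is a small shift in the opposite direction. A single-residue scale of this kind does not capture aromatic stacking or burial, so it says nothing about the proposed stabilizing role of the phenylalanine.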

The structure allowed us to map the spatial relationships of conserved residues in Nsp2 among SARS-CoV-2 strains, revealing unexpected regions of conservation and selection. To identify rapidly evolving regions of the protein, we mapped all the positively selected mutations to the protein surface (Fig 4). This analysis revealed charged surfaces which are devoid of mutations, potentially indicating surfaces important for conserved interactions (Fig 4). There are also two residue clusters where mutations found in strain variants are proximal to one another and alter the characteristics of the protein's surface in similar ways. Cluster 1, near the N-terminus, consists of three arginine residues (R27C, R52C, R4C) that mutate individually to cysteines, reducing the exposed positive surface charge in that region and introducing a sulfhydryl. Cluster 2 consists of six proximal residues which mutate individually to more hydrophobic residues (G262V, G265V, G285V, A411V, T371I) (Fig 4) in the variant strains. In …

In this report, we were able to combine cryo-EM with recent advances in de novo protein structure prediction to obtain a complete atomic model for the SARS-CoV-2 Nsp2 protein. Although there was a recent report of using an AlphaFold2-predicted structure of Orf8 to solve the phase problem in crystallographic studies, to our knowledge this is the first explicit use of AlphaFold2 predictions with restraints from an experimental cryo-EM density for model building [35]. This exercise suggests that domain structure predictions from deep neural networks are increasingly likely to be locally accurate and, when combined with experimental restraints, sufficient for global structure prediction and integrative structural modelling.
Electron cryo-microscopy and cryo-tomography will be important sources of such overall shape information, and readily obtainable, low-resolution measurements like negative stain electron microscopy, small-angle X-ray scattering, cross-linking mass spectrometry, or even biochemical experiments may provide sufficient constraints for accurate, global models to be determined in combination with predicted domain structures. It is possible that further improvements in the prediction algorithms will eliminate the need for experimental measurements entirely. However, atomic resolution structures of multi-component and multi-domain assemblies are still relatively uncommon, and this deficit of appropriate training data in the PDB may limit the accuracy of computational models for multi-domain assemblies and higher-order complexes. Put another way, the deficiency of data about protein-protein interfaces may mean that de novo predictions of complex assemblies will remain underdetermined for some time. Future work will explore the use of restraints from 3D cryo-EM maps, 2D images, tomograms, and other data sources like SAXS for the potential functions utilized by neural nets.

Our Nsp2 structure together with analysis of natural and designed sequence variation in the Nsp2 of SARS-CoV-2 suggests a number of biological roles for Nsp2 and also regions of interest on the protein. We identify a highly conserved zinc ribbon motif which structurally is highly similar to zinc ribbons in RNA binding proteins. One possibility, therefore, is that this motif is important for Nsp2 interactions with nucleic acids. Interestingly our mass spectrometry studies …

Our mass spectrometry experiments on the most prevalent Nsp2 mutant, T85I, did not identify any changes in its host interactions. This may be due to our experiments lacking the context of other viral proteins that would be present in a bona fide infection, or potentially due to the wrong cellular context. Alternatively, this may suggest that some mutations do not confer any fitness benefit and are simply present due to the C-to-U hypermutation observed in SARS-CoV-2, which is likely driven by host-mediated APOBEC editing [42]. Interestingly, a recent report demonstrates that the SARS-CoV-2 Nsp2 T85I mutant shows a minor replication defect in Vero green monkey cells, but has no effect in human cells. This is consistent with the T85I mutation not conferring a strong selective advantage [43]. Globally, the second most prevalent SARS-CoV-2 amino acid substitution that is driven by the C-to-U hypermutation is a T to I change. Therefore the T85I mutation in the 20C clade of SARS-CoV-2 may be neutral in fitness, but stable due to host-mediated RNA editing.

Overall, analysis of the resulting Nsp2 structure revealed a rapidly evolving protein surface, with potential consequences for host-virus interactions. Leveraging the new structure with natural Nsp2 variations and mass spectrometry, we were able to identify surfaces important for specific Nsp2 interactions. The pattern of disruption of interactions points to at least three specific areas of biology that Nsp2 is involved in: interactions with endosomes through cytoskeletal elements, interactions with modulators of translation, and also direct interactions with ribosomal RNA. The exact roles Nsp2 plays in these pathways will require further experimental characterization using the structure-based point mutants described here.
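The link between C-to-U editing and threonine-to-isoleucine substitutions follows from the standard genetic code. As a minimal sketch, the code below enumerates every threonine codon and checks which single C-to-U changes produce an isoleucine codon. It uses only the standard codon table; it makes no claim about which codon encodes T85 in the viral genome, and the function names are illustrative.

```python
from itertools import product

# Standard genetic code, RNA alphabet, with bases ordered U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_c_to_u(codon):
    """Yield each codon reachable from `codon` by changing exactly one C to U."""
    for i, base in enumerate(codon):
        if base == "C":
            yield codon[:i] + "U" + codon[i + 1:]


def substitutions_by_c_to_u(wild_type_aa, mutant_aa):
    """List (wild-type codon, mutant codon) pairs linking two amino acids."""
    pairs = []
    for codon, aa in CODON_TABLE.items():
        if aa != wild_type_aa:
            continue
        for mutant in single_c_to_u(codon):
            if CODON_TABLE[mutant] == mutant_aa:
                pairs.append((codon, mutant))
    return pairs


thr_to_ile = substitutions_by_c_to_u("T", "I")
```

Three of the four threonine codons (ACU, ACC, ACA) become isoleucine codons through a C-to-U change at the second position, while ACG becomes AUG (methionine). This is why a T to I substitution is readily produced by C-to-U editing regardless of any selective effect.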