A complex IRES at the 5’-UTR of a viral mRNA assembles a functional 48S complex via an uAUG intermediate

RNA viruses are pervasive entities in the biosphere with a significant impact on human health and on economically important livestock. As strict cellular parasites, RNA viruses abuse host resources, redirecting them towards viral replication needs. Taking control of the cellular apparatus for protein production is a requirement for virus progression, and diverse strategies of cellular mimicry and/or ribosome hijacking have evolved to ensure this control. Especially in complex eukaryotes, translation is a sophisticated process, with multiple mechanisms acting on ribosomes and mRNAs. The initiation stage of translation is especially tightly regulated, involving multiple steps and the engagement of numerous initiation factors, some of them highly complex. The use of structured RNA sequences, called Internal Ribosomal Entry Sites (IRES), in viral RNAs is a widespread strategy for the exploitation of eukaryotic initiation. Using a combination of electron cryo-microscopy (cryo-EM) and reconstituted translation initiation assays with native components, we characterized how a novel IRES at the 5’-UTR of a viral RNA assembles a functional translation initiation complex via an uAUG intermediate, redirecting the cellular machinery for protein production towards viral messengers. The IRES features a novel extended, multi-domain architecture, circling the 40S head and leveraging ribosomal sites not previously described to be exploited by any IRES. The structures and accompanying functional data illustrate the importance of 5’-UTR regions in translation regulation and underline the relevance of the untapped diversity of viral IRESs. Given the large number of new viruses that metagenomic studies have uncovered, the quantity and diversity of mechanisms for translation hijacking encrypted in viral sequences may be seriously underestimated. Exploring this diversity could reveal novel avenues in the fight against these molecular pathogens.


Introduction.
Metagenomic studies of environmental samples have uncovered a fascinating diversity of viruses with a pervasive presence in the biosphere (1, 2). New viral clades have been discovered, showing an expanded presence compared with previously assumed distributions (3). Diversity is especially overwhelming in RNA viruses infecting animal hosts (4). Extensive gene shuffling, horizontal gene transfer events and host switching combined with co-divergence suggest a rich and complex evolutionary scenario within the animal virome, which remains largely unexplored (1, 5).

As strict cellular parasites, viruses rely on capturing cellular ribosomes to gain access to the host machinery for protein production (6). In eukaryotes, especially in animals, this machinery is complex and sophisticated, with large, multi-component protein factors assisting in the operation of eukaryotic ribosomes (7). Although complex, translation in eukaryotes conserves the same four main phases as its prokaryotic counterpart, namely initiation, elongation, termination and recycling (8). Initiation is significantly expanded in eukaryotes, with two GTP-regulated steps required for [...] dynamic, able to move in the 5' to 3' direction along the mRNA in search of an AUG initiation codon in a favorable context, a process called "scanning" (12). Once the AUG codon is detected, a structural transition in the 48S from an open, scanning-competent conformation to a closed, scanning-arrested conformation occurs (14). This conformational change is accompanied by the release of eIF1, eIF2 and GDP, leaving the Met-tRNAiMet at the P site of the 40S base paired with the AUG codon (10). A second GTP-regulated step, catalyzed by initiation factor eIF5B, is then required for the recruitment of the large (60S) ribosomal subunit (15, 16).
A full (80S) ribosome primed with mRNA and Met-tRNAiMet at the P site then transitions towards the fast, less regulated elongation phase (17, 18).

The pathway described above is referred to as the canonical, 5'-end- and cap-dependent route of translation initiation (12). The bulk of eukaryotic mRNAs follow this route; however, deviations from the canonical route are common, normally associated with translation under stress conditions (19, 20). Usually, non-canonical initiation is associated with extended 5' UnTranslated Regions (5'-UTRs) on mRNAs (21, 22). In complex eukaryotes, 5'-UTRs can be very long and can harbor short Open Reading Frames (ORFs) designated as upstream ORFs or uORFs (22, 23). These uORFs are enigmatic, as ribosome-profiling experiments clearly show ribosome positioning on them; however, to date, the short peptides encoded in uORFs have not been unambiguously identified by mass spectrometry (24).

Well-studied examples of the functional relevance of uORFs at 5'-UTRs in stress-regulated genes can be found in the yeast stress response regulator GCN4 or the mammalian transcription factor ATF4 (25, 26). In these stress-regulated genes, the presence of several uAUG codons has been shown to be essential for differential translation regimes in homeostasis versus stress conditions (22). Other examples of translation regulation by uAUG codons are less understood, for example, [...]

The Dicistroviridae family of positive single-stranded RNA ((+)-ssRNA) viruses employs two types of IRESs to differentially express the regulatory versus the structural genes (34). The genome architecture of these viruses functionally segregates both kinds of genes in two ORFs (Fig. 1A) (35). The first ORF is preceded by an approximately 700-nucleotide-long 5'-UTR which harbors an IRES assigned to the type III family (36).
In vitro characterization of the 5'-UTR-IRES of the Cricket Paralysis Virus (CrPV), a prototypical Dicistrovirus, narrowed down the region of the 5'-UTR responsible for the IRES activity and established the strict requirement of eIF3 for this IRES to initiate translation. Interestingly, the AUG codon of the CrPV ORF1 is immediately preceded by a "start-stop uORF" (36). The combination of these two non-canonical initiation resources, namely a structured IRES and a "start-stop uORF" at the 5'-UTR, within the same viral mRNA, poses the question of how translation initiation is achieved by this RNA.

We sought to structurally characterize the 5'-UTR-IRES of the CrPV in its ribosome-bound configuration, to gain insights into the ribosome binding determinants of this peculiar IRES as well as to understand how the delivery of Met-tRNAiMet is accomplished. Two high-resolution cryo-EM reconstructions of 40S/5'-UTR-IRES/eIF3 complexes, combined with biochemical analysis, allowed us to precisely characterize how this IRES, using an extended structure with a modular, multi-domain architecture, binds and manipulates the 40S.

These results stress the importance of the unstudied diversity of viral IRESs and expand our understanding of the role 5'-UTR regions play in eukaryotic translation.

Results.
The 5'-UTR-IRES of the CrPV requires eIF3 for a stable interaction with the 40S.
Previous studies of the IRES located at the 5'-UTR of the CrPV (5'-UTR-IRES hereafter) precisely defined the region of the 5'-UTR responsible for the IRES activity (residues 357 to 709) as well as its dependency on initiation factor eIF3 for efficient translation initiation (36). Sequence alignments of 5'-UTR regions of different viruses from this clade, as well as with other viruses of the same family, failed to identify any similarity with described IRESs of the type III family, like the Hepatitis C Virus IRES (HCV-IRES) or the Classical Swine Fever Virus IRES (CSFV-IRES) (37, 38). In contrast with the well-characterized type IV family of IRESs found in the InterGenic Region (IGR-IRES) of these viruses, where strong sequence conservation allows comparisons and identification of possible structural motifs, the 5'-UTRs of Dicistroviruses seem to harbor divergent sequences, making structural modelling based on sequence conservation difficult (39).

In order to address this gap in knowledge, we produced a truncated version of the 5'-UTR region of the genomic RNA of the CrPV containing the IRES (residues 357 to 728, Fig. 1A) to obtain structural information on its 40S-bound conformation by electron cryo-microscopy (cryo-EM). We initially tested the in vitro dependency of the 5'-UTR-IRES on eIF3 to engage purified 40S ribosomal subunits in a stable interaction. We assayed the ability of the 5'-UTR-IRES to co-migrate with purified 40S in sucrose density gradients as a test for the presence of a stable complex suitable for structural studies (Fig. 1B). Unexpectedly, the 5'-UTR-IRES does not form a stable complex with the 40S in the absence of eIF3, in contrast to the HCV-IRES, which is able to form stable complexes with the 40S subunit alone and even with full (80S) ribosomes (Fig. 1B) (37).
In the presence of eIF3, however, the 5'-UTR-IRES co-migrates with purified 40S subunits, demonstrating the presence of a stable complex (Fig. 1B). This complex revealed clear particles in cryo-EM images, rendering detailed two-dimensional class averages where density for eIF3 could be identified, albeit at a lower threshold (Fig. 1C). The 40S/5'-UTR-IRES/eIF3 complex exhibited a delicate behavior under cryo-EM conditions, with a strong tendency to disassemble in thin ice. Extensive screening for suitable ice areas was essential to obtain particles of the fully assembled complex (Fig. S1A and B). The sample also exhibited a high degree of heterogeneity that could be resolved by image processing in Relion (40, 41) (Fig. 1D, E and Fig. S1C).

Two main classes of particles containing density for 5'-UTR-IRES, 40S and eIF3 were found in the dataset (Fig. 1D and E). Both classes contain density for the 40S, the IRES and the core subunits of eIF3 (a/c/e/k/l/f/m), and class-2 additionally presents density for eIF3 subunit d (Fig. 1E, eIF3d). Class-2 also exhibits a 40S head in a swiveled configuration. The eIF3d subunit follows the 40S head swiveling movement to establish interactions with eIF3a, a core subunit of eIF3 (see below).

Robust density ascribable to the 5'-UTR-IRES could be found in both classes (Fig. 1D and E, blue). The ribosome-bound conformation of the 5'-UTR-IRES shows an extended configuration, circling the 40S head (Fig. 2A). Three domains connected by flexible linkers could be defined: an elongated domain I (DI) at the back of the 40S head contacting ribosomal proteins uS3 and RACK1 (Fig. 2), a second domain (DII) formed by a dual hairpin at the back of the 40S body interacting with eIF3 (Fig. 3) and a third, large helical domain (DIII) placed at the periphery of the 40S E site, contacting ribosomal proteins uS7 and uS11 (Fig. 4).
Domain I of the 5'-UTR-IRES contacts ribosomal proteins RACK1 and uS3.
The 5' proximal segment of the 5'-UTR-IRES (residues 357 to 486) forms domain I, characterized by an elongated T-shaped structure anchored to the back of the 40S head (Fig. 2A and B). A long helical segment in this domain "wraps" around the apical part of ribosomal protein RACK1. Two bases of this helical segment of domain I, C442 and C444, are extruded from the body of the double helix to establish hydrophobic interactions with tyrosine residue 140 of RACK1 (Fig. 2C). These interactions bend the main helical segment of the 5'-UTR-IRES DI, directing the tip of this domain towards ribosomal protein uS3 (Fig. 2D) [...] (Fig. 3D). In the canonical 48S configuration, eIF3 contacts the 40S through helix 1 of eIF3a and helix 22 of eIF3c as well as eIF3d, which is isolated in its 40S interaction, away from the core subunits of eIF3 (Fig. 3D, left). The CSFV-IRES engages the 40S displacing eIF3 from its position in the canonical 48S. (Fig. 3D,

Non-canonical base pairing in the 5'-UTR-IRES DIII places the uAUG codon near the P site.
Threading through the 40S channel formed by ribosomal proteins uS7 and uS11, a flexible single-stranded linker connects DII with DIII (Fig. 4A). DIII forms a prominent helical mass in the surroundings of the E site of the small subunit at the inter-subunit face of the 40S. The helical segment is very well defined in our maps as it is stabilized by numerous contacts with ribosomal proteins uS7, uS11 and 18S ribosomal RNA (rRNA) bases (Fig. 4 and Fig. S3). However, the distal part of this domain forms two short stem loops that, given their flexibility, could only be modelled at low resolution.

Inspection of the cryo-EM density in both classes revealed a distortion in the canonical double helix of the main segment of this domain as it approaches the E site. The quality of the maps in this area allowed de novo modelling of these residues, revealing a set of non-canonical interactions between the RNA bases (Fig. 4A and B). In-plane triple base interactions involving sugar and Hoogsteen edges of the bases as well as purine-purine Hoogsteen base pairs could be found in this stretch of residues of the helical segment of DIII (Fig. 4B) (44). Overall, these non-canonical base pairs induce a distortion at the base of DIII, helping to position the single-stranded segment of the 5'-UTR-IRES, which harbors the uAUG codon at position 701, close to the P site (Fig. 4C, middle). [...]

Initial processing of the cryo-EM data revealed marked dynamics of the head of the 40S. Masked classification and refinement in Relion3 (47) revealed two major populations, distinguishable by different degrees of swiveling of the 40S head (Fig. 1 and Fig. S1C). The 40S head is attached to the body by a single RNA helix, making this component of the ribosome extremely flexible (33). Intrinsic and independent movements of the 40S head are instrumental in tRNA translocation and also in canonical initiation (48, 49).
The 5'-UTR-IRES seems to exploit this intrinsic dynamic to, in the first instance, bind to the 40S and, in the second instance, "lock" the IRES in a specific conformation committing the complex towards viral translation (Fig. 5). In class-1 (open conformation) the head of the 40S shows an almost canonical configuration with very little swiveling and no tilt. In this conformation, the latch of the 40S (an early defined contact between the head and the body of the 40S (50)) is closed. At the other side of the 40S head, access to the channel formed by ribosomal proteins uS7 and uS11 is exposed and eIF3d density is not well defined, probably due to a high degree of flexibility or low occupancy (Fig. 5A, left). In class-2 (closed conformation), the head of the 40S has undergone a medium degree of swiveling compared to the widest displacement reported (48). The movement of the 40S head is followed by the 5'-UTR-IRES, with DI experiencing the largest displacement compared to its position in class-1 (Fig. 5A, right and B). In the swiveled conformation, the latch is open, and the channel formed by ribosomal proteins uS7 and uS11 is plugged by eIF3d, which in this class presents robust density (Fig. 5C). These movements are restricted to the 40S head and the domains of the 5'-UTR-IRES, especially DI. No movement could be detected in the core subunits of eIF3, which maintain in the closed class a configuration with respect to the 40S body identical to that in the open class (Fig. 5B). The swiveled configuration of the 40S brings eIF3d close to eIF3a, one of the core subunits of eIF3 (Fig. 5C). Well-defined density could be observed in this area for the eIF3a/eIF3d interface (Fig. 5C, right).
Thus, in the closed conformation, eIF3 shows a hitherto unknown conformation, which we speculate could reflect a transient state populated at some point during [...] that overlaps with the position the TC populates at the E site in canonical initiation (Fig. 4C) (14, 43). Additionally, in our maps, we could only confidently identify density for the single-stranded segment of RNA of the IRES placed close to the P site until residue 695, whereas the canonical AUG of ORF1 is found at nucleotide 709. These facts prompted us to wonder how the delivery of Met-tRNAiMet to the AUG is accomplished. Making use of reconstituted initiation assays with native components and toe-printing analysis (52), we could dissect the different steps followed by the 5'-UTR-IRES in order to correctly place Met-tRNAiMet base paired with the AUG codon (Fig. 6A). TC by itself is able to load Met-tRNAiMet on the 40S/5'-UTR-IRES/eIF3 complex in isolation (Fig. 6A, lane 2). Interestingly, this loading event is not directed towards the canonical AUG but to the uAUG located at position 701, which is part of the "start-stop uORF" that precedes the bona fide AUG codon of ORF1 (Fig. 1A). A similar uAUG delivery of Met-tRNAiMet can be accomplished by eIF5B which, under stress conditions, has been described to substitute for eIF2 in Met-tRNAiMet delivery (53-55), with eukaryotic initiation then following a "bacterial-like" mode of initiation (Fig. 6A, lane 4). Transitioning to the correct AUG could only be detected in the presence of eIF1/eIF1A, and only when the TC was present, not eIF5B (Fig. 6A, lanes 3 and 5). Notably, the presence of eIF1/eIF1A seems to be detrimental for uAUG Met-tRNAiMet loading by eIF5B, as their presence significantly reduces the toe-print signal that can be observed for eIF5B in isolation.
However, no concomitant increase in toe-print signal for the canonical AUG could be [...] components, we were able to visualize the three-dimensional structure of the 5'-UTR-IRES in its ribosome-bound configuration and to describe the initiation route followed by this IRES to assemble a functional initiation complex competent for elongation.

The 5'-UTR-IRES features a novel multi-domain, extended architecture that encircles three quarters of the 40S head, exploiting binding sites not previously described for any IRES (Fig. 2, 3 and 4). Ribosomal proteins uS3 and RACK1 are used by the IRES to anchor its DI to the back of the 40S head (Fig. 2). The structure thus rationalizes previous data showing a preeminent role of RACK1 in CrPV and related viruses infecting Drosophila (63). The interaction of DI with RACK1 is also instrumental to position DII at the back of the 40S body, sandwiched between ribosomal protein uS17 and eIF3 (Fig. 3). Interestingly, and in contrast with the HCV-IRES, the conformation observed for eIF3 in the complex with the 5'-UTR-IRES is very similar to the conformation observed for eIF3 in the 48S complex, with the IRES "filling up" cavities present between the 40S and eIF3 in this canonical complex (43). The HCV-IRES and related IRESs like the CSFV-IRES displace eIF3 from its canonical location, using a very different mechanism for IRES docking to the 40S (38).

In order to place the AUG of ORF1 in the surroundings of the P site, the 5'-UTR-IRES accesses the P site through the E site, in a similar manner to the HCV-IRES (Fig. 4C) (45).
In this aspect, the 5'-UTR-IRES recapitulates binding strategies known for other IRESs, like the IGR-CrPV-IRES, which also makes use of ribosomal protein uS7 for its binding to the ribosome, or the aforementioned HCV-IRES, which places its domains II and IV in the surroundings of the P site, sliding the elongated DII from the back of the 40S to the P site through the E site (64).

The placement of the AUG of ORF1 in the surroundings of the P site seems to be achieved by a mechanism involving the intrinsic dynamics of the 40S head (33) (Fig. 5). The 5'-UTR-IRES exploits the characteristic swiveling movement of the 40S head to bind and progress towards a conformation that "locks" the IRES on the 40S and, at the same time, induces a compact conformation of eIF3 with subunit eIF3d in close contact with the core subunits of eIF3. We speculate this hitherto unknown conformation of eIF3 may also be relevant for canonical initiation, as the cap-binding protein eIF4E and the rest of the initiation factors of the eIF4 family interact with eIF3 in the same region (51).

We propose the following comprehensive model for how the 5'-UTR-IRES of the CrPV operates: immediately after the (+)-ssRNA genomic molecule of the CrPV is injected into the cytoplasm of the host cell, the IRES harbored at the 5'-UTR captures 40S subunits by its DI (Fig. 6B, bottom right). Recruitment of eIF3 is mediated by DII, allowing the sliding of the flexible linker connecting DII and DIII between the head and the platform of the 40S to place DIII in the surroundings of the E site (Fig. 6B, bottom left). A swivel movement of the 40S head closes the channel between the head and the platform of the 40S, effectively "locking" the 5'-UTR-IRES in the 40S and inducing a compact conformation of eIF3 with the eIF3d subunit within interacting distance of eIF3's core subunit a (Fig. 6B, left top).
With this configuration, eIF2 as part of the TC can deliver Met-tRNAiMet to the uAUG located at nucleotide 701, and further assistance by initiation factors eIF1/eIF1A allows the downstream AUG codon of ORF1 at nucleotide 709 to be located. Large subunit recruitment grants transitioning towards elongation, committing the ribosome to the production of viral proteins (Fig. 6B, right top).

[...] Bottom, representative reference-free 2D class averages used for further image processing. D, E, After 3D classification, two classes showing density for 40S (yellow), eIF3 (red) and 5'-UTR-IRES (blue) could be found in the data set. Class-1 (top, D) presents a non-swiveled configuration of the 40S head and density for eIF3d is absent. Class-2 (bottom, E) shows a swiveled configuration of the 40S head (arrows) with eIF3d (indicated) contacting eIF3's core subunits.

5'-UTR-IRES and HCV-IRES production.
For cryo-EM analysis, a transcription vector for the 5'-UTR-IRES (nucleotides 357-728) was constructed by inserting a T7 promoter sequence upstream of the 5'-UTR-IRES sequence, followed by a BamHI restriction site, using pUC19 as a scaffold vector. For toe-print assays, the 5'-UTR-IRES with the extended ORF part for primer annealing was cloned employing a similar strategy. T7 RNA polymerase in vitro transcription and purification on Spin-50 mini-columns (USA Scientific) were used to obtain highly purified 5'-UTR-IRES and HCV-IRES RNAs.

Cryo-EM sample preparation and data acquisition.
Aliquots of 3 μl of assembled ribosome complexes at a concentration range of 250-350 nM were incubated for 30 seconds on glow-discharged holey gold grids (68) (UltrAuFoil R1.2/1.3). Grids were blotted for 2.5 s and flash cooled in liquid ethane using an FEI Vitrobot. Grids were transferred to an FEI Titan Krios microscope operated at 300 kV and equipped with an energy filter (slit aperture 20 eV) and a Gatan K2 detector. Data were recorded in counting mode at a magnification of 130,000, corresponding to a calibrated pixel size of 1.08 Å. Defocus values ranged from 1 to 3.6 μm. Images were recorded in automatic mode using the Leginon (69) and APPION (70) software and frames were aligned using the Relion3 (47) implementation of the Motioncor2 algorithm (71).
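As a quick sanity check on the acquisition parameters above, the following minimal Python sketch computes two quantities that follow directly from them: the amount of complex applied per grid and the Nyquist limit set by the pixel size. The function and constant names are illustrative and not part of the original workflow; the only inputs are the values quoted in the text.

```python
# Values quoted in the text.
PIXEL_SIZE_A = 1.08              # calibrated pixel size, in angstroms
SAMPLE_VOLUME_UL = 3.0           # volume applied per grid, in microlitres
CONC_RANGE_NM = (250.0, 350.0)   # complex concentration range, in nanomolar


def nyquist_limit(pixel_size_a: float) -> float:
    """Best theoretically attainable resolution: twice the pixel size."""
    return 2.0 * pixel_size_a


def picomoles_applied(volume_ul: float, conc_nm: float) -> float:
    """Amount of complex in an aliquot.

    nM x uL = 1e-9 mol/L x 1e-6 L = 1e-15 mol (femtomoles);
    dividing by 1000 converts femtomoles to picomoles.
    """
    return volume_ul * conc_nm / 1000.0


if __name__ == "__main__":
    low, high = (picomoles_applied(SAMPLE_VOLUME_UL, c) for c in CONC_RANGE_NM)
    print(f"Complex per grid: {low:.2f}-{high:.2f} pmol")
    print(f"Nyquist limit: {nyquist_limit(PIXEL_SIZE_A):.2f} A")
```

With the quoted values this gives 0.75 to 1.05 pmol of complex per grid and a Nyquist limit of 2.16 Å.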

Image processing and structure determination.
Contrast transfer function parameters were estimated using GCTF (72) and particle picking was performed using GAUTOMATCH without the use of templates and with a diameter value of 260 pixels. All 2D and 3D classifications and refinements were performed using RELION. An initial 2D classification with a 4-times binned dataset identified all ribosome particles. A consensus reconstruction with all 40S particles was computed using the AutoRefine tool of RELION. Next, 3D classification without alignment (four classes, T parameter 4) identified a class with unambiguous density for eIF3. This class was independently refined, and further masked classification allowed the identification of two subclasses distinguishable by different degrees of 40S head swiveling and the presence or absence of eIF3d density. Final refinements with unbinned data for the selected classes yielded high-resolution maps with density features in agreement with the reported resolution. Local resolution was computed with RESMAP (73).
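The binning and picking parameters can be related to physical units with a short calculation. The sketch below is illustrative only: it assumes that the 260-pixel picking diameter refers to unbinned pixels at the calibrated 1.08 Å pixel size (the text does not state this), and the helper names are hypothetical.

```python
# Values quoted in the text; the interpretation of the picking diameter
# as unbinned pixels is an assumption made for this sketch.
PIXEL_SIZE_A = 1.08        # calibrated (unbinned) pixel size, in angstroms
BIN_FACTOR = 4             # binning used for the initial 2D classification
PICK_DIAMETER_PX = 260     # particle-picking diameter, in pixels


def binned_pixel_size(pixel_size_a: float, bin_factor: int) -> float:
    """Effective pixel size after binning by an integer factor."""
    return pixel_size_a * bin_factor


def diameter_in_angstrom(diameter_px: int, pixel_size_a: float) -> float:
    """Convert a diameter in pixels to angstroms for a given pixel size."""
    return diameter_px * pixel_size_a


if __name__ == "__main__":
    binned = binned_pixel_size(PIXEL_SIZE_A, BIN_FACTOR)
    print(f"Binned pixel size: {binned:.2f} A")
    print(f"Nyquist limit of binned data: {2 * binned:.2f} A")
    print(f"Picking diameter: {diameter_in_angstrom(PICK_DIAMETER_PX, PIXEL_SIZE_A):.1f} A")
```

Under these assumptions, the binned pixel size is 4.32 Å (Nyquist limit 8.64 Å) and the picking diameter corresponds to about 281 Å. The binned data are therefore adequate for sorting particles but not for high-resolution refinement, which is consistent with the final refinements being run on unbinned data.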

Model building and refinement.
Models for the mammalian 40S and eIF3 were docked into the maps using CHIMERA (74), and COOT (75) was used to manually adjust these initial models. The 5'-UTR-IRES was built manually using COOT. An initial round of refinement was performed in Phenix using real-space refinement (76) with secondary structure restraints, followed by a final step of reciprocal-space refinement with REFMAC (77). The fit of the model to the map density was quantified using FSCaverage and Cref, and model-to-map over-fitting tests were performed following standard protocols in the field (78, 79).