Mechanistic insights into mRNA 3’-end processing

The polyadenosine (poly(A)) tail found on the 3’-end of almost all eukaryotic mRNAs is important for mRNA stability and regulation of translation. mRNA 3’-end processing occurs co-transcriptionally and involves more than 20 proteins to specifically recognize the polyadenylation site, cleave the pre-mRNA, add a poly(A) tail, and trigger transcription termination. The polyadenylation site (PAS) defines the end of the 3’-untranslated region (3’-UTR) and, therefore, selection of the cleavage site is a critical event in regulating gene expression. Integrated structural biology approaches including biochemical reconstitution of multi-subunit complexes, cross-linking mass spectrometry, and structural analyses by X-ray crystallography and single-particle electron cryo-microscopy (cryoEM) have enabled recent progress in understanding the molecular mechanisms of the mRNA 3’-end processing machinery. Here, we describe new molecular insights into pre-mRNA recognition, cleavage and polyadenylation.


Introduction
Most eukaryotic pre-mRNAs are capped at their 5’-end, spliced at intronic sites, and polyadenylated at their 3’-end before they are exported from the nucleus as mature mRNAs. Each of these modifications is carried out by a set of conserved and highly regulated multi-protein complexes. The 3’-end processing machinery co-transcriptionally monitors nascent transcripts for specific sequences (Figure 1) and, upon recognition of the polyadenylation site (PAS), cleaves the pre-mRNA and adds a poly(A) tail to the newly generated 3’-end [1]. The 3’-end processing machinery also triggers transcription termination. To coordinate these functions, it contains three different enzymatic activities: endonuclease, poly(A) polymerase, and protein phosphatase. Since the cleavage event defines the 3’-end of the transcript, and consequently the 3’-untranslated region (3’-UTR) of the future mRNA, understanding how RNA is specifically recognized is of key importance.
The large number of protein factors involved and the dynamic nature of their interactions pose challenges in understanding the molecular mechanisms of mRNA 3’-end biogenesis. Until very recently, high-resolution structures were available for only a few of the proteins involved [2,3]. Ysh1/CPSF73 had been identified as the 3’-endonuclease and its crystal structure was determined [4]. Crystal structures were also available for the poly(A) polymerase Pap1/PAP [5,6]. Together, these structures provided insights into the enzymatic mechanisms but they did not explain how pre-mRNAs are specifically recognized and how the different enzymatic activities are coordinated.
Recent developments in structure determination using electron cryo-microscopy (cryoEM), native mass spectrometry and cross-linking mass spectrometry (XL-MS), combined with next-generation sequencing-based functional methods, have facilitated progress in understanding the molecular mechanisms of pre-mRNA processing. In this review, we outline molecular details of the core machinery that mediates pre-mRNA cleavage and polyadenylation, focusing on insights obtained within the last two years. Structures and mechanisms of other components of the mRNA 3’-end processing machinery have been covered extensively by other reviews [2,3].

Composition and functions of 3’-end processing machinery
Specific and efficient mRNA 3’-end processing is coordinated by the concerted action of a large number of conserved proteins (Table 1). Many components were first identified by cell fractionation studies, both from yeast and human extracts [1], and proteomic studies further defined the components of affinity-purified native complexes from Saccharomyces cerevisiae [7][8][9][10]. The yeast machinery comprises three factors: cleavage and polyadenylation factor (CPF), which contains the enzymatic activities, cleavage factor (CF) IA and CF IB [1,11]. More recently, affinity-purification of an endogenous, intact CPF complex from yeast enabled thorough analysis of its composition, stoichiometry, and subunit assembly [12]. Specifically, native mass spectrometry defined the protein-protein interaction network within the complex, revealing that the CPF subunits are assembled into three modules, each based on one of the enzymatic activities: endonuclease, polymerase or phosphatase (Table 1). The interaction map of yeast CPF subunits enabled specific cleavage and polyadenylation to be reconstituted from complexes of purified recombinant yeast proteins [12,13]. In vivo transcriptome-wide mapping of yeast pre-mRNA biogenesis factors previously showed that CPF binds AU-rich elements near the cleavage site [14]. Still, isolated CPF is not substantially active without the accessory factors CF IA and CF IB, which contribute to RNA recognition and activation of the nuclease [3,15]. Both CF IA and CF IB bind specific RNA sequences near the poly(A) site (Figure 1) [3,15].
Figure 1. Polyadenylation signals and RNA recognition sites of the 3’-end processing machinery in yeast and human pre-mRNAs. Protein complexes that are proposed to recognize specific cis-acting elements within the polyadenylation signal are listed below their consensus sequences. Arrows indicate cleavage positions.
The human mRNA 3’-end processing machinery comprises the cleavage and polyadenylation specificity factor (CPSF), cleavage stimulation factor (CstF), cleavage factors Im (CF Im) and IIm (CF IIm), and poly(A) polymerase (PAP) [3]. These factors include many orthologs of the yeast machinery (Table 1), but a fully active human complex has not yet been reconstituted from recombinant proteins. Whether the mammalian CPSF assembles in a modular fashion similar to CPF has not yet been confirmed, but it appears to be functionally equivalent to the yeast core CPF complex (CPF core) comprising the nuclease and polymerase modules [13,16]. Some of the mammalian components, including WDR33 and Fip1, are much longer than the yeast subunits. Unlike in yeast, human PAP is not a stable component of the complex. A proteomic study of the human pre-mRNA 3’-end processing complex identified 85 associated proteins [17], but the functions of many of these have not been thoroughly investigated.
Recognition of upstream UGUA-containing sequences (USEs) and downstream G-rich and GU-rich sequences (DSEs) by the CF Im and CstF complexes, respectively, contributes to selection of the cleavage site and ensures efficient pre-mRNA recognition and cleavage [3].
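The arrangement of cis-acting elements described above (an upstream UGUA-containing element, the AAUAAA hexamer, and a downstream G-rich or GU-rich element) can be illustrated with a minimal Python sketch. The function name, the 40-nucleotide search window and the definition of "GU-rich" as a run of at least six G or U residues are assumptions made here for illustration only; none of them is taken from the studies reviewed.

```python
import re

PAS = "AAUAAA"  # canonical polyadenylation signal hexamer
USE = "UGUA"    # upstream element motif recognized by CF Im


def find_cis_elements(rna, window=40, min_gu_run=6):
    """Report each AAUAAA hexamer together with nearby UGUA and GU-rich elements.

    `window` and `min_gu_run` are illustrative assumptions, not measured values.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input
    gu_rich = re.compile(f"[GU]{{{min_gu_run},}}")  # run of G/U residues
    hits = []
    # A lookahead is used so that overlapping hexamers are all reported.
    for match in re.finditer(f"(?={PAS})", rna):
        pas = match.start()
        upstream = rna[max(0, pas - window):pas]
        downstream = rna[pas + len(PAS):pas + len(PAS) + window]
        hits.append({
            "pas_index": pas,  # 0-based position of the first A of the hexamer
            "upstream_UGUA": upstream.count(USE),
            "downstream_GU_rich": gu_rich.search(downstream) is not None,
        })
    return hits
```

Calling `find_cis_elements` on a 3’-UTR sequence returns one record per AAUAAA hexamer. A hexamer flanked by both element types would, under these simplified assumptions, be a candidate for a strongly recognized site, although real site strength depends on factors this sketch ignores.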

Molecular architecture of the polymerase module
Guided by the protein-protein interaction network map, a 200-kDa recombinant four-subunit Cft1-Pfs2-Yth1-Fip1 complex of the polymerase module was recently analyzed using cryoEM. This resulted in a 3-D reconstruction comprising Cft1, Pfs2, and zinc finger (ZF) domains 1 and 2 of Yth1 [12]. Cft1 contains three seven-bladed beta-propeller domains (BP1, BP2, BP3) followed by a C-terminal helical domain, and is intimately associated with Pfs2, a WD40-protein composed of a beta-propeller domain and an N-terminal protrusion that inserts into the cavity formed by Cft1 BP1 and BP3 (Figure 2a). The Cft1-Pfs2 interface is extensive, burying >4200 Å² surface area, and is highly conserved. The same interaction mode was observed in X-ray crystallographic and cryoEM structures of the orthologous human CPSF160-WDR33 heterodimer [22,23,24]. The triple beta-propeller domain architecture of Cft1/CPSF160 is structurally homologous to that of DNA Damage Binding protein 1 (DDB1) [25] and splicing factor SF3b subunit Rse1 [26] (SF3b130 in human) despite low sequence conservation (15%). Interestingly, the interaction of DDB1 with its binding partner DDB2 has some similarities to the Cft1-Pfs2 and CPSF160-WDR33 interactions but the details of the subunit contacts are not conserved.
The Cft1-Pfs2 heterodimer constitutes a rigid core platform of the polymerase module which binds an N-terminal extended region and ZF1-2 of Yth1 (Figure 2a). ZF3-5 of Yth1 and the entirety of Fip1 are not observed in the cryoEM reconstruction, indicating that they are flexibly tethered to Cft1-Pfs2, at least in the absence of other CPF subunits and/or RNA. Fip1, an intrinsically disordered protein, forms a physical connection between the C-terminal region of Yth1 and Pap1 [12,27]. In an analogous manner, the human CPSF160-WDR33 heterodimer forms the structural scaffold of CPSF (Figure 2b), while CPSF30 bridges the interaction between CPSF160-WDR33 and Fip1, based on XL-MS analysis and co-precipitation experiments [22]. Consistent with its absence in yeast Yth1, the CPSF30 zinc knuckle (ZK) domain is not required for complex assembly and its function is currently unknown [22].

Recognition of the polyadenylation signal
Similar to the yeast polymerase module, human CPSF160, WDR33, CPSF30 and Fip1 form a stable heterotetrameric core complex that recognizes the AAUAAA polyadenylation signal motif and recruits PAP [16]. The molecular mechanism of PAS RNA recognition by the mammalian 3’-end processing machinery was recently revealed by two independently determined cryoEM structures of the human CPSF160-WDR33-CPSF30-Fip1 complex bound to AAUAAA-containing RNA. The overall architecture of the complex is highly similar to the yeast Cft1-Pfs2-Yth1-Fip1 structure. However, in the presence of the PAS RNA, the CPSF30 ZF3 domain becomes ordered (Figure 2b). PAS recognition is mediated by the CPSF30 ZF2 and ZF3 domains and WDR33 (Figure 2c and d) [23,24], as indicated by prior studies [16,18,27]. Although CPSF160 was previously implicated in PAS recognition [28], the RNA is not contacted by CPSF160 directly.
Recognition of the AAUAAA motif by CPSF30 involves bending of the RNA sugar-phosphate backbone into an S-shaped conformation, stabilized by extensive interactions with WDR33, in particular with its N-terminal extension that encircles the RNA. The kinked RNA conformation is further stabilized by an intramolecular Hoogsteen base pair formed between nucleotides U3 and A6 (Figure 2c). The adenosine dinucleotides in positions 1-2 and 4-5 of the AAUAAA motif are bound by the CPSF30 ZF2 and ZF3 domains, respectively (Figure 2d). Each nucleotide base is inserted into a pocket of the respective ZF domain and stacks with a conserved aromatic residue. Nucleotides A1, A4 and A5 are almost invariant in the PAS and each adenine base is recognized by two base-specific hydrogen-bonding interactions with the N1 and N6-amino groups. In contrast, the adenine base of the more variable nucleotide A2 interacts with CPSF30 via a single hydrogen-bonding contact. Contrary to the adenosine dinucleotides A1-A2 and A4-A5, the U3-A6 Hoogsteen base pair is not recognized by base-specific interactions, but is sandwiched between two conserved phenylalanine residues which stabilize the base pair with π-π stacking interactions. The shape of the hydrophobic pocket formed by WDR33 to accommodate the U3-A6 pair is not compatible with other purine-pyrimidine combinations (Figure 2c). Thus, the intricate and specific network of molecular interactions established between CPSF30, WDR33 and all six nucleotide positions of the PAS provides a rationale for the widespread conservation of the AAUAAA motif revealed by transcriptome-wide mapping of mammalian mRNA polyadenylation sites [19][20][21]. In agreement with this, single-base substitutions in the PAS AAUAAA motif can result in a substantial reduction in the RNA binding affinity of the CPSF complex [22] and in deficient mRNA processing in human diseases such as α-thalassemia and β-thalassemia [29,30].
The structural insights suggest that non-canonical PAS motifs with one or more base substitutions [21] may function as weak polyadenylation sites: lower affinity of the non-canonical PAS motif for the CPSF complex may result in decreased use of that site. In the context of alternative polyadenylation, the weak affinity of non-canonical PAS motifs is likely compensated for by additional upstream and downstream cis-acting elements that enhance CPSF binding under specific conditions or in response to specific signals.
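The relationship between base substitutions and the contacts they would be expected to perturb can be sketched in Python as follows. The position-to-contact table restates the assignments given above (A1-A2 bound by CPSF30 ZF2, A4-A5 by CPSF30 ZF3, and the U3-A6 Hoogsteen pair held in the WDR33 pocket). The function names and the use of substitution count as a proxy for site strength are assumptions made for illustration; the sketch does not predict binding affinity.

```python
CANONICAL_PAS = "AAUAAA"

# Contacts per hexamer position (1-based), as summarized in the text.
CONTACTS = {
    1: "CPSF30 ZF2",
    2: "CPSF30 ZF2",
    3: "WDR33 pocket (U3-A6 Hoogsteen pair)",
    4: "CPSF30 ZF3",
    5: "CPSF30 ZF3",
    6: "WDR33 pocket (U3-A6 Hoogsteen pair)",
}


def classify_hexamer(hexamer):
    """Compare a hexamer with AAUAAA and list the contacts a change would affect."""
    hexamer = hexamer.upper().replace("T", "U")  # accept DNA-style input
    if len(hexamer) != len(CANONICAL_PAS):
        raise ValueError("expected a six-nucleotide motif")
    changed = [i + 1 for i, (ref, obs) in enumerate(zip(CANONICAL_PAS, hexamer))
               if ref != obs]
    return {
        "hexamer": hexamer,
        "substituted_positions": changed,
        "affected_contacts": sorted({CONTACTS[p] for p in changed}),
    }


def scan_pas(rna, max_substitutions=1):
    """Return hexamers within `max_substitutions` of AAUAAA (hypothetical cutoff)."""
    rna = rna.upper().replace("T", "U")
    candidates = []
    for start in range(len(rna) - len(CANONICAL_PAS) + 1):
        info = classify_hexamer(rna[start:start + len(CANONICAL_PAS)])
        if len(info["substituted_positions"]) <= max_substitutions:
            info["index"] = start  # 0-based position in the input sequence
            candidates.append(info)
    return candidates
```

For example, `classify_hexamer("AUUAAA")` reports a single substitution at position 2, the more variable position that contributes only one hydrogen-bonding contact to CPSF30 ZF2.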
In yeast, the polyadenylation signals (also termed positioning elements) are less well defined and typically contain degenerate A-rich motifs that often lack the U3-A6 Hoogsteen base pair nucleotides. In agreement with this, the U3-A6 binding pocket created by the N-terminal region of WDR33 is not conserved in yeast Pfs2. Conversely, the structure and sequence conservation of yeast Yth1 ZF2 and ZF3 domains, which specifically recognize A bases in higher eukaryotes, suggests that recognition of adenosine dinucleotides by Yth1 is conserved. Structural differences in the RNA recognition modes of the yeast and human machineries could explain the observations that yeast CPF core binds to model RNA substrates with much lower affinity than human CPSF [13,22].

Mechanism of endonucleolytic cleavage
Ysh1/CPSF73, the endonuclease subunit of CPF/CPSF complexes, contains metallo-β-lactamase (MBL) and β-CASP domains. The crystal structure of human CPSF73 revealed an active site at the junction of the two domains containing two coordinated zinc ions [4] (Figure 3a). The geometry of the zinc ions, coordinating a hydroxide ion (the attacking nucleophile) and a sulfate ion (mimicking the scissile phosphate group in the RNA substrate) in the active site, suggests a possible catalytic mechanism (Figure 3a). However, the active site tunnel in this structure is very narrow and cannot accommodate the RNA substrate. In agreement with this, purified human CPSF73 has only weak endonuclease activity in vitro [4] and CPSF-dependent pre-mRNA cleavage has not been reconstituted in vitro to date. Together, these observations suggest that conformational activation of CPSF73 is required before pre-mRNA cleavage. This may be important to prevent spurious nuclease activity and pre-mRNA misprocessing.
Recent biochemical reconstitution of the yeast pre-mRNA cleavage and polyadenylation machinery showed that the Ysh1 endonuclease subunit is only active when assembled into the 500 kDa, 8-subunit 'CPF core' complex [13]. Specific and efficient RNA cleavage also requires the presence of both CF IA and CF IB. Interestingly, the recombinant CPF core complex cleaved RNA substrates in vitro within a window of 3 nucleotides, suggesting that it has positional accuracy but does not have strict nucleotide specificity.
To gain insight into mechanisms that may prime Ysh1 for activation within CPF core, crystal and cryoEM structures of yeast Ysh1 in complex with the ubiquitin-like (UBL) domain of another CPF subunit, Mpe1 (ortholog of human RBBP6), were determined [13] (Figure 3b). These structures revealed an interaction interface burying 900 Å² of surface area and involving hydrophobic, polar and electrostatic contacts. Two α-helices within the Mpe1-bound Ysh1 MBL domain are shifted when compared to the CPSF73 structure (Figure 3b). This results in a slight opening of the cavity leading to the active site, but additional conformational changes must occur to accommodate the RNA substrate. The relatively high atomic B-factors of the β-CASP domain suggest that it may be mobile, allowing further opening of the active site cleft upon activation. Nevertheless, the precise mechanism underpinning this process and its dependence on other CPF/CPSF subunits and accessory polyadenylation factors remain poorly understood.

Conclusions and future challenges
Recent integrative structural studies have revealed the molecular architecture of the eukaryotic mRNA 3’-end processing machinery. Although these studies provide key insights into the assembly, RNA recognition and enzymatic activities of these factors, a number of questions concerning their molecular functions remain unanswered (Figure 3c).
A critical aspect of the molecular mechanism of CPF/CPSF concerns the coupling of its RNA binding, endonuclease and poly(A) polymerase activities. A 3-D reconstruction of yeast CPF core obtained by negative-stain electron microscopy indicates that Cft1, Pfs2 and Yth1 of the polymerase module form a structural scaffold onto which Pap1 and the nuclease module are tethered [13]. Conformational dynamics within CPF may serve to accommodate the variable distance between the polyadenylation signals and cleavage sites observed in different pre-mRNAs and might enable remodeling of the complex upon binding CF IA, CF IB and pre-mRNA. It is likely that similar mechanisms exist within the human complex. To understand the molecular basis of nuclease activation, future studies will focus on the structures of CPF/CPSF bound to accessory factors and additional cis-acting RNA elements.
Further studies will also be required to understand how the 3’-end processing machinery regulates transcription. It is known that a phosphatase subunit of yeast CPF (Ssu72) dephosphorylates Ser5 of the C-terminal domain (CTD) of Pol II to facilitate the switch from transcription initiation to elongation [31], while Glc7 dephosphorylates Tyr1 to promote transcription termination [32]. Structural studies will define how the phosphatase module interacts with CPF and whether the phosphatases are mechanistically coupled to RNA recognition, cleavage and polyadenylation.
In mammalian cells, up to 70% of genes have more than one PAS, giving rise to alternative mRNA isoforms that may encode different protein isoforms, or contain distinct 3’-UTRs conferring different stabilities and translational efficiencies [33]. The CPSF subunits Fip1 and RBBP6, as well as the accessory factors CstF and CF Im, have been implicated in the regulation of PAS selection [34][35][36][37]. Furthermore, interactions between the 3’-end processing factors and the pre-mRNA splicing machinery have also been shown to contribute to alternative polyadenylation [33,38]. The mechanistic basis for alternative polyadenylation remains to be determined.
Finally, mRNA poly(A) tails generally reach a defined length of 60 As in yeast [39,40] and 250 As in human [41]. Nuclear poly(A) binding proteins (yeast Nab2 and Pab1, and human PABPN1) play roles in regulating the activity and processivity of poly(A) polymerase, thus specifying the poly(A) tail length, but the molecular details of this process are unknown [42]. The recent structural and biochemical studies described here provide a foundation for future investigations that will focus on these aspects of mRNA 3’-end biogenesis and their regulation.

Conflict of interest statement
Nothing declared.