Oligomerization of Biomacromolecules – Example of RNA Binding Sm/LSm Proteins



Introduction

A multitude of RNAs
The central dogma of molecular biology states that genetic information flows in one direction only, from DNA to proteins via an intermediate called messenger ribonucleic acid (mRNA) [1,2,3]. Originally, ribonucleic acid (RNA) was thought to have roles only in information transfer and structure maintenance. Today we know that RNA performs a remarkable range of functions in the living cell, including control of gene expression, chromosome-end maintenance, housekeeping activities, sorting of proteins in the cell and control of metazoan development [3]. Although most enzymatic activities are carried out by proteins, it was shown in the early 1980s that RNA molecules can catalyze chemical reactions; RNAs with catalytic activity are called ribozymes. The discovery of ribozymes led to the hypothesis that RNA could have been the original molecule of life on Earth about four billion years ago: a biopolymer able to self-replicate, to store information and to catalyze chemical reactions. RNA would thus have been self-sufficient as the original molecule of life [4]. The discovery of the unexpectedly wide variety of functions carried out by RNA was accompanied by the identification of a multitude of further types of small non-coding RNAs (small nuclear RNAs, small nucleolar RNAs, small interfering RNAs, microRNAs), highlighting the versatility of RNA as a biochemical tool for the cell [2].
Ribosomal RNAs (rRNAs) represent structural and catalytic elements of the ribosome. In the nucleolus of eukaryotic cells, more than 100 tandemly repeated units of rRNA genes are transcribed into long precursor transcripts [7,8]. Following transcription, the pre-rRNA is cleaved to form mature rRNAs, which assemble with approximately 80 proteins to form the large and small ribosomal subunits prior to their export to the cytoplasm. Small nucleolar RNAs (snoRNAs) participate in both the modification and the cleavage events that occur during ribosome biogenesis [5]. A second group of small RNAs are the transfer RNAs (tRNAs), which are essential in translation [8,10]. MicroRNAs are non-coding RNAs of 22-24 nucleotides in length. They down-regulate gene expression by binding to mRNAs, thereby preventing them from being translated into protein. Another type of non-coding RNA is the small interfering RNA (siRNA). These small RNAs control the lifetime of mRNAs by interacting with them and marking them for destruction [2,4]. Piwi-interacting RNAs (piRNAs) form a large class of small non-coding RNAs that act together with Piwi proteins. piRNAs have been linked to both epigenetic and post-transcriptional gene silencing of retrotransposons and other genetic elements in germline cells [11,12]. Telomerase is a complex of proteins and RNA that is responsible for maintaining the natural ends of chromosomes. Telomerase acts as a reverse transcriptase because it copies an RNA template into DNA [10].
Small nuclear RNAs are components of the spliceosome, the macromolecular machinery that has a role in the maturation of mRNA. They are termed U snRNAs (uridyl-rich small nuclear RNAs). U1, U2, U4, U5, U7, U11 and U12 are synthesized in the nucleus by RNA polymerase II. They are then transported to the cytoplasm, where they associate with the U snRNP proteins (proteins that associate with uridyl-rich small nuclear RNAs to generate U snRNPs, uridyl-rich small nuclear ribonucleoprotein particles), and are subsequently re-imported into the nucleus [1,13]. U6 and U8 snRNA also belong to the class of small nuclear RNAs, but their synthesis, biogenesis and function differ from those of the U snRNAs mentioned above. U3 snRNA shares the common denomination, but this small RNA is found in the nucleolus, has a role in pre-rRNA processing and carries a C/D box motif, which technically makes it a member of the C/D class of snoRNAs [14].

Structure of RNA and association with proteins
DNA and RNA have similar covalent structures; the only differences are the change from a 2'-deoxyribose sugar to a ribose sugar and from a methyl group in thymine to a hydrogen in uracil. Nevertheless, RNA has much wider biological activities and adopts a wider range of structures. DNA double helices preferentially assume the B-form structure in solution, whereas RNA double helices are found in the A-form. The RNA A-form double helix has a narrow and deep major groove, which prevents proteins from recognizing RNA in a manner analogous to the way they recognize DNA. An RNA molecule can locally adopt several types of secondary structure (bulges, hairpins, internal loops) [15,16]. Eukaryotic mRNAs are almost always associated with RNA-binding proteins. RNA-binding proteins generally have a modular structure and contain RNA-binding domains of 70-150 amino acids that mediate RNA recognition. Three major classes of eukaryotic RNA-binding protein domains are known: the RNA-recognition motif (RRM), the double-stranded RNA-binding domain (dsRBD) and the K-homology (KH) domain [17].

All mRNA processing steps are coupled
Eukaryotic gene expression is a complex, stepwise process that begins with transcription (synthesis of pre-mRNA) [18]. Mature mRNAs are produced in the cell nucleus from the primary transcripts of coding genes (pre-mRNAs) by a series of processing events that include capping, splicing and 3' end polyadenylation. Mature mRNAs are then transported to the cytoplasm. All modification steps are coupled and influence each other. RNA polymerase II is a key molecular coordinator of these processing events, and its phosphorylation has a regulatory role [19,20,21].

Removal of introns and the splicing reaction
In 1977, a number of research groups discovered that the genes of higher organisms are often made up of alternating coding sequences (exons) and non-coding sequences (introns). During transcription, all parts of the gene are copied to form a strand of pre-mRNA. The introns are then removed and the exons joined together, so that the now continuous exons can be translated to produce a protein. This splicing of the pre-mRNA is a multistage process carried out by the spliceosome, which is among the most complex macromolecular machineries in the cell [22].
Splicing of precursors to mRNAs occurs in two steps, each involving a single transesterification reaction [23]. Assembly and function of the spliceosome require approximately 300 polypeptides and five snRNAs, not counting gene-specific RNA-binding factors [23]. There are two distinct types of spliceosome in most cells. The major-class or U2-type spliceosome is universal in eukaryotes, whereas the minor-class or U12-type spliceosome is not present in some organisms. The evolutionary relationship between these two spliceosomes is uncertain.
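The net outcome of splicing described above, removal of introns and joining of the flanking exons, can be illustrated with a minimal sketch. It is not a model of the spliceosome or of the transesterification chemistry; the function name `splice`, the coordinate convention (half-open index pairs) and the toy sequences are assumptions made purely for illustration.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Return the exons of pre_mrna joined together.

    introns holds half-open (start, end) index pairs, one per intron
    (a hypothetical convention chosen for this sketch).
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        if not position <= start < end <= len(pre_mrna):
            raise ValueError(f"invalid or overlapping intron: ({start}, {end})")
        exons.append(pre_mrna[position:start])  # exon upstream of this intron
        position = end                          # skip the intron itself
    exons.append(pre_mrna[position:])           # last exon
    return "".join(exons)


# Hypothetical toy transcript: two exons flanking a single intron.
exon1, intron, exon2 = "AUGGCCAAG", "GUAAGUACUAACUUUUUUCAG", "GUUCUGUAA"
pre_mrna = exon1 + intron + exon2
intron_start = len(exon1)
mature_mrna = splice(pre_mrna, [(intron_start, intron_start + len(intron))])
print(mature_mrna == exon1 + exon2)  # True
```

In a real transcript the intron coordinates are not given in advance; they are defined by the sequence elements and snRNP interactions discussed in the following sections.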

Types of introns
The pre-mRNA contains conserved elements at its intron/exon boundaries that determine the proper sites for the splicing reaction (Figure 3). The 5' splice site contains a conserved consensus sequence, AG/GURAGU (R = purine, / denotes the exon/intron boundary). The branch site lies between 18 and 100 bases upstream of the 3' splice site and has the consensus CURAY in vertebrates (A is the branching nucleotide, Y is a pyrimidine). In higher eukaryotes, a polypyrimidine tract of variable length is often located between the branch site and the 3' splice site. The 3' splice site has the consensus YAG/R in mammals. This class of introns is spliced by the U2-type spliceosome. The U12-type introns have different consensus sequences and are spliced by the U12-type spliceosome [24]. The number of known U12-type introns is still very small. U12-type introns are present in many vertebrate, nematode, insect and plant species.
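The consensus elements quoted above (AG/GURAGU, CURAY and YAG/R) can be written as regular expressions. The following is a minimal sketch under the assumption that the input is an upper-case RNA string; the function names are hypothetical. Natural splice sites frequently deviate from the consensus, so exact pattern matching of this kind finds only candidate positions and is no substitute for a real splice-site predictor.

```python
import re

# Lookahead patterns, so that overlapping candidates are all reported.
FIVE_PRIME_SS = re.compile(r"(?=AG(GU[AG]AGU))")  # AG/GURAGU, R = A or G
BRANCH_SITE = re.compile(r"(?=CU[AG]A[CU])")      # CURAY, Y = C or U
THREE_PRIME_SS = re.compile(r"(?=[CU]AG([AG]))")  # YAG/R


def five_prime_boundaries(rna: str) -> list[int]:
    """Indices of the first intron nucleotide of each 5' splice site match."""
    return [m.start(1) for m in FIVE_PRIME_SS.finditer(rna)]


def branch_points(rna: str) -> list[int]:
    """Indices of the branching adenosine (fourth base of CURAY)."""
    return [m.start() + 3 for m in BRANCH_SITE.finditer(rna)]


def three_prime_boundaries(rna: str) -> list[int]:
    """Indices of the first exon nucleotide after each YAG/R match."""
    return [m.start(1) for m in THREE_PRIME_SS.finditer(rna)]


# Hypothetical toy fragments, one per element.
print(five_prime_boundaries("CCAGGUAAGUAUCC"))  # [4]
print(branch_points("GGCUAACGG"))               # [5]
print(three_prime_boundaries("UUUCAGGUC"))      # [6]
```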

Small nuclear ribonucleoproteins, snRNPs
Small ribonucleoproteins (RNPs) are tight complexes of one or more proteins with a short RNA molecule (usually 60-300 nucleotides). RNPs inhabit the nuclear and cytoplasmic compartments of the eukaryotic cell [25]. Those that reside in the nucleus, the small nuclear ribonucleoproteins (snRNPs), can themselves be divided into two families. There are snRNPs of the nucleoplasm, whose function lies in preparing messenger RNA for export into the cytoplasm. A different set of snRNPs, called snoRNPs, resides in the nucleolus [26]. The nucleolus is a structure in the nucleus composed of proteins and nucleic acids. Its function is to transcribe ribosomal RNA (rRNA) and combine it with proteins to form ribosomes. There are about 200 distinct kinds of snRNPs (they differ in their RNA or protein components), with abundances ranging from 10^4 copies per cell (for snoRNPs) to over 10^6 copies per cell (for the snRNPs of the major spliceosome). They generally play a role in gene expression. One exception is the telomerase snRNP, essential for genome maintenance, which is present in only a few copies per cell.
Spliceosomes are formed around the pre-mRNA substrate by the successive assembly of five small nuclear ribonucleoprotein particles (snRNPs): U1, U2, U4, U5 and U6. Each of these particles is composed of a small nuclear RNA (snRNA), seven Sm core proteins common to all snRNPs (except for the U6 snRNP, which contains a related set of seven proteins, the Sm-like proteins) and several snRNP-specific proteins. The snRNPs play a central role in the process of splicing. They are responsible for the recognition of splice sites and the definition of exon/intron boundaries. These interactions are partially mediated through base pairing and are dynamic, so that the spliceosome complex changes during the process of splicing.

U snRNP biogenesis
Subsequent to transcription by RNA polymerase II and capping, pre-U1 snRNA assembles with several factors, including cap-binding proteins (CBP), the phosphorylated adaptor for RNA export (PHAX), Crm1 and Ran-GTP, which together mediate export of U1 snRNA to the cytoplasm. After export, Sm proteins interact with the U snRNAs to form the snRNP Sm core. This step is facilitated by the SMN complex (survival of motor neurons complex). The SMN complex is composed of the SMN protein and further proteins called Gemins (Gemins 2-7). Nuclear re-import is mediated by snurportin-1 (SPN1), which binds to the m3G cap structure of the snRNA. After import, these factors dissociate. The U1-specific proteins are imported independently into the nucleus, where assembly into the mature U1 snRNP occurs [27]. This pathway is shared by the U1, U2, U4 and U5 snRNPs.

Spliceosome assembly
Assembly of a spliceosome for excision of an intron requires recognition of sequences at the 5' splice site as well as at the branch site and the nearby 3' splice site. U1 snRNA binds to the 5' end of the intron through sequence complementarity. There are reports showing that U1 snRNA recognizes the 5' splice site within a preassembled penta-snRNP complex [28]. The U2 snRNP complex associates with the branch region. Early snRNP/pre-mRNA complexes are preferentially committed to splicing as compared to free RNA and are thus called commitment complexes (CCs). The process of U2 snRNP association is ATP-dependent, and four proteins are critical for recognition [29]. Subsequent to the binding of the U2 snRNP complex, a tri-snRNP complex containing the U4/U6 snRNP and the U5 snRNP associates in an ATP-dependent manner to form complex A2-1. It is likely that the U1 snRNA/pre-mRNA duplex dissociates at this stage. The 5' splice site sequence is probably paired on the intron side to U6 snRNA and on the exon side to U5 snRNA. The transition between complex A2-1 and A1 requires destabilization of at least the U4/U6 di-snRNA. As only three snRNAs (U2, U6 and U5) are associated with the spliceosome at the moment of catalysis, and as U5 snRNA pairing with exon sequences is not essential, the catalytic site is created by U6 snRNA, by U2 snRNA or by both. The action of certain proteins is required for the transition to the second step of splicing. The catalytic site for the second step is created by U6 snRNA, U2 snRNA or associated proteins. Reannealing of the released U4 and U6 snRNPs and association with U5 forms the U4/U6.U5 tri-snRNP complex, which is then ready to reassemble on another commitment complex. This classical view of spliceosome assembly has been challenged by Stevens et al. [30]. This group isolated from yeast a penta-snRNP complex which, when supplied with soluble components, does splice pre-mRNA.

mRNA stabilization and degradation
Regulation of mRNA decay rates is an important control point in determining the abundance of cellular transcripts. Some mRNAs have half-lives that are 100 times shorter than the cellular generation time, whereas others have half-lives spanning several cell cycles [21]. The poly(A) tail is important for the stabilization of mRNA. It interacts with the poly(A)-binding protein (PABP), which makes direct contact with a specific region of the translation initiation factor eIF4G. eIF4G in turn interacts with the cap-binding protein eIF4E. In this way, a ternary complex of PABP, translation initiation factor and cap-binding protein is formed on the poly(A) tail; it circularizes the mRNA in vitro, promoting translation and stabilization of mRNAs [21]. Several sequence elements can regulate the mRNA turnover rate, either by promoting it (destabilizer elements) or by inhibiting it (stabilizer elements). Important examples are the A+U-rich elements (AREs) located in the 3' untranslated regions (UTRs) of mRNAs [31]. At least four different pathways of mRNA degradation have been reported in eukaryotic cells [32]. In most cases, degradation of the transcript begins with shortening of the poly(A) tail at the 3' end of the mRNA. Shortening of the poly(A) tail is followed by removal of the 5' cap structure (decapping), which exposes the transcript to digestion by a 5' to 3' exonuclease. The family of LSm proteins is involved in mRNA degradation at the decapping step. Alternatively, transcripts can be degraded in the 3' to 5' direction after deadenylation. This process is catalyzed by the exosome [33]. A further mRNA degradation pathway is nonsense-mediated decay (NMD), which provides the strongest evidence for a link between translation and turnover [34].
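To give a feeling for what such differences in half-life mean for transcript abundance, the sketch below computes the fraction of an mRNA pool that remains after a given time. It assumes simple first-order (exponential) decay with no new synthesis, which is a common simplification and not something measured in the work cited above; the function names and the numbers in the example are hypothetical.

```python
import math


def decay_constant(half_life: float) -> float:
    """First-order decay constant k = ln(2) / t_half (same time unit as half_life)."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return math.log(2) / half_life


def fraction_remaining(elapsed: float, half_life: float) -> float:
    """Fraction of transcripts left after `elapsed`, assuming N(t) = N0 * exp(-k t)."""
    return math.exp(-decay_constant(half_life) * elapsed)


# Hypothetical numbers: a 100 min generation time and two transcripts,
# one with a half-life 100 times shorter (1 min), one spanning two cycles.
generation_time = 100.0
for label, half_life in [("unstable", generation_time / 100), ("stable", 2 * generation_time)]:
    left = fraction_remaining(generation_time, half_life)
    print(f"{label}: {left:.3g} of the pool remains after one generation")
```

Under these assumptions the unstable transcript is essentially gone within a fraction of one generation, while roughly 70% of the stable transcript persists, which is why decay rate is such an effective control point.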

Sm proteins, assembly of U1, U2, U4, U5 snRNPs
The Sm proteins were first discovered as antigens targeted by so-called anti-Sm antibodies in a patient with a form of systemic lupus erythematosus (SLE), a debilitating autoimmune disease. They were named Sm proteins in honor of Stephanie Smith, a patient who suffered from SLE. Other proteins with very similar structures were subsequently discovered and named LSm proteins. The proteins common to the U1, U2, U4 and U5 snRNPs are named Sm proteins because of their recognition by anti-Sm autoantibodies (isolated from the serum of patients with autoimmune diseases) [35,36]. Eight proteins, B', B, D1-D3, E, F and G, have been characterized in human cells. All of the Sm core proteins are encoded by separate genes [37], with the exception of B and B'. B and B', which result from alternative splicing of gene 6628 located on chromosome 20, locus 20p13, differ only in 11 amino acids at the C-terminus [38]. In neural tissues, SmN replaces SmB and SmB' [39]. Two sequence motifs, named Sm1 and Sm2, are found in all known Sm proteins and are the hallmark of the family [40]. The N-terminal Sm1 motif is composed of 32 amino acids. The Sm2 motif, located closer to the C-terminus, is shorter, spanning only 14 amino acids [35]. Sm motif 1 and Sm motif 2 are separated by a linker of variable length. Alignment of the sequences of the human Sm proteins reveals a striking conservation of the two motifs. The majority of the Sm proteins have amino- or carboxy-terminal extensions. Solved structures of members of this protein family (pdb codes: 1d3b, 1b34, 1hk9, 1h64, 1i8f, 1i4k, 1kq1, 3bw1, 1th7) show that the fold is highly conserved. It is defined by an N-terminal helix followed by a five-stranded anti-parallel β sheet. Strands β1, β2 and β3 are part of the Sm1 motif, whereas the Sm2 motif forms strands β4 and β5. The five-stranded β sheet is strongly bent in the middle, and the conserved hydrophobic residues form a hydrophobic core [41].
The Sm proteins bind to the Sm site of U snRNAs [42]. The Sm site consensus sequence (PuAU4-6GPu) has a central uridine-rich tract and flanking purines. In vitro studies reveal that the single-stranded U-rich region and the 5' adenosine of the Sm site play critical roles in Sm protein assembly. The uridine bases and the 2' hydroxyl groups collectively provide binding determinants [43]. In the absence of U snRNA, the seven Sm proteins form three stable subcomplexes (D3B or D3B', D1D2, and EFG). These subcomplexes then form a heptameric ring around the snRNA Sm site; the resulting complex is termed the Sm core. snRNP core assembly is an ordered pathway that involves formation of a sub-core particle followed by formation of the full Sm core, which promotes cap hypermethylation and pre-snRNP import [44]. The Sm fold is necessary and sufficient for the formation of specific inter-subunit interactions. Biochemical results indicate that there is one copy of each Sm protein in the snRNP core domain and therefore support the heptameric ring model of the snRNP core domain [45]. None of the individual Sm proteins has a known RNA recognition motif, so another type of interaction with RNA must be involved. Crosslinking studies indicate that Sm motif 1 is responsible for interactions with RNA and Sm motif 2 for protein-protein interactions [43].
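The Sm site consensus PuAU4-6GPu (a purine, an adenosine, four to six uridines, a guanosine and a purine) translates directly into a regular expression. The sketch below scans an RNA string for candidate Sm sites; the function name and the test sequence are hypothetical, and the strict pattern is an assumption, since natural Sm sites can deviate from the consensus.

```python
import re

# PuAU4-6GPu: purine, A, 4-6 U, G, purine. The lookahead reports overlapping hits.
SM_SITE = re.compile(r"(?=([AG]AU{4,6}G[AG]))")


def find_sm_sites(rna: str) -> list[tuple[int, str]]:
    """Return (start index, matched sequence) for every candidate Sm site."""
    return [(m.start(1), m.group(1)) for m in SM_SITE.finditer(rna.upper())]


# Hypothetical snRNA fragment containing one consensus Sm site.
print(find_sm_sites("GGCAAUUUUUGAGC"))  # [(3, 'AAUUUUUGA')]
# Only three uridines: no match under the strict consensus.
print(find_sm_sites("GGCAAUUUGAGC"))    # []
```

A sequence match alone says nothing about whether the site is single-stranded, which the in vitro studies cited above identify as critical for Sm protein assembly.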
Basic residues of human and yeast SmB, SmD1 and SmD3 are reported to be responsible for import of the Sm core particle [45]. In vitro, the snRNP core domain can be assembled from purified components [46]. In vivo, assembly of the spliceosomal class of snRNPs is an active process mediated by several factors, including the product of the SMN (survival of motor neurons) gene. Mutations of the SMN gene are responsible for spinal muscular atrophy (SMA), an autosomal recessive disorder characterized by loss of motor neurons [47]. The SMN protein is ubiquitously expressed in all tissues of metazoan organisms, reflecting the fact that it provides a fundamental activity required by all cells. The SMN protein is predominantly cytosolic, but it is also found in the nucleus, namely in a few spherical nuclear domains that overlap with the so-called Cajal bodies (where snRNPs and snoRNPs are localized). These spherical domains have been called Gemini of Cajal bodies (Gems). Proteins associated with the SMN protein are called Gemins. The SMN complex interacts in vitro with Sm and LSm proteins that contain symmetrically methylated RG (arginine-glycine) repeats [48]. The symmetrically methylated RG repeats of SmD1, SmD3 and LSm4 are generated by the so-called methylosome [48,49].
The SMN complex binds to the human hypermethylase (TGS1), which suggests that SMN may have a role in formation of the snRNA m3G cap structure. It has been proposed that, after binding of SMN to the Sm core proteins, SMN promotes engagement of TGS1 with the m7G-capped snRNP particle. According to that model, SMN dissociates from the C-terminal part of the B/B' Sm proteins, followed by association of TGS1 with the Sm core. This step allows formation of the m3G cap. The association of snurportin 1 with the m3G cap can promote release of TGS1 and generate an import-competent snRNP [50]. According to these data, the SMN complex interacts with the protein components of U snRNPs, but there are also reports [51] of sequence-specific interactions between U1 snRNA and the SMN complex.
The nuclear localization signal of U snRNPs is composed of the 2,2,7-trimethylguanosine (m3G) cap of the U snRNA and the Sm core domain. Snurportin 1 binds to the m3G cap but not to the Sm core. Snurportin 1 has an N-terminal importin beta-binding domain and a carboxy-terminal m3G cap-binding domain [52]. The importin beta-binding domain allows the snRNP cargo to be imported in a Ran-independent fashion. After import of snRNPs into the nucleus, snurportin 1 dissociates from its cargo and is exported back into the cytoplasm by Crm1, a receptor for leucine-rich nuclear export signals [53]. The SMN complex not only mediates snRNP core assembly but is an integral component throughout snRNP core biogenesis in the cytoplasm. It cannot be excluded that SMN is actually the long-sought receptor for the Sm core nuclear localization signal [54].

LSm proteins
Sm and Sm-like (LSm) proteins are found in all three domains of life: Eukarya, Archaea and Bacteria. Because Archaea have been proposed to be related to the ancestor of the eukaryotic nuclear genome, their presence in Archaea suggests that an LSm protein gene was present in the last common ancestor. Archaea harbour one or two genes which encode Sm motif-containing proteins. The in vivo functions of archaeal Sm proteins remain unknown (in contrast to the eukaryotic and bacterial homologs), even though a high-resolution structure from an archaeal system is known (pdb code 1ljo) [55]. LSm proteins have been identified in plants as well [56]. Eukaryotic genomes have more than 20 Sm/LSm genes each, corresponding to the LSm and Sm proteins which are components of Sm and LSm complexes. Database searches in the yeast genome revealed 16 Sm motif-containing proteins. Some Sm-like proteins were found to interact weakly with some Sm proteins, most probably via non-specific Sm domain interactions [57], but some of the LSm proteins interact with Sm proteins as part of the U7 snRNP. In yeast there are nine LSm proteins, in humans more than eight. Each of the human LSm proteins has one orthologue in yeast. Yeast LSm2p-LSm7p share sequence identities of 41-62% with human LSm2-LSm7. LSm9p appears to be present only in yeast. Yeast LSm8p aligns best with human LSm8 (26% identity). In addition, LSm proteins are highly conserved throughout all eukaryotic kingdoms, as the homologues in insect, nematode and plant databases share between 50 and 75% identity with their human counterparts. Each of the human LSm proteins can clearly be best aligned with one of the canonical Sm proteins. However, the sequence identities are not high enough to allow the conclusion that LSm proteins undergo the same protein-protein interactions as Sm proteins [58]. Similar to the canonical Sm proteins, the LSm proteins are recognized by antibodies from patients suffering from systemic lupus erythematosus (SLE) [59].
Sm/LSm proteins always appear as homomeric (in prokaryotes) or heteromeric (in eukaryotes) ring-like multimers. These ring-shaped complexes, generally containing either six or seven subunits, are the functional LSm protein unit. All canonical Sm proteins are essential for vegetative growth of yeast, whereas depletion of LSm proteins in yeast has variable effects. In mouse embryos, LSm4-null zygotes survived to the blastocyst stage but died shortly afterwards [60].
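The identity percentages quoted above are computed from pairwise sequence alignments. The sketch below shows one simple way to obtain such a number from two sequences that have already been aligned; it counts identical residues over all columns in which neither sequence has a gap. This is only one of several conventions in use (others divide by the full alignment length or by the length of the shorter sequence), and the function name and the two toy sequences are hypothetical, not real LSm sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over alignment columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments: 9 identical residues in 10 ungapped columns.
print(percent_identity("MLPLSLLKTAQ", "MLPLALLKT-Q"))  # 90.0
```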

Role of LSm2-8 oligomers in U6 snRNP assembly
The LSm2-8 complex was isolated from HeLa cell nuclear extract in an RNA-free form. Electron micrographs revealed a doughnut-shaped hetero-oligomer, similar to the Sm core of snRNPs [58]. LSm proteins have a high affinity for single-stranded oligo-U, but they do not recognize the canonical Sm binding site. In yeast and humans, LSm2-8 forms a heteroheptameric ring around the 3' end of U6 snRNA, which consists of a U-rich tract. The Sm core RNP is extremely salt stable; the LSm-U6 snRNA complex, however, dissociates at salt concentrations higher than 0.5 M or in the presence of competitor RNA, suggesting that the LSm-U6 complex is less stable [58]. U6 snRNA has no conserved Sm site and does not associate with Sm proteins. Its biogenesis pathway differs in many respects from that of the U1, U2, U4 and U5 snRNPs: it is transcribed by RNA polymerase III and capped with a γ-monomethyl triphosphate. The 3' end of pre-U6 snRNA is elongated during maturation and subsequently trimmed, leaving a 2',3'-cyclic phosphate in most organisms. The enzymes involved in this process are specific for U6 snRNA, and U6 snRNA does not leave the nucleus [61]. Mature U6 snRNA shows nucleoplasmic localization [62]. Some experimental evidence suggests that U6 snRNA is present in the cytoplasmic compartment of mouse fibroblast cells [63]. This result would suggest that the LSm2-8 complex may act as a nuclear localization signal, but the cytoplasmic localization of the U6 snRNP is highly questionable. The actual function of the LSm2-8 complex associated with U6 snRNA appears to be connected to U6 snRNP assembly and function. Mutants with decreased levels of LSm2-8 show splicing defects that correlate with a reduced level of U6 snRNA. How the LSm2-8 complex affects the U6 snRNP remains unclear. One possibility is that LSm proteins facilitate conformational rearrangements during the splicing cycle, U4/U6 annealing and formation of the U4/U6.U5 tri-snRNP [64].

Role of LSm proteins in protecting mRNA 3' termini from degradation
LSm proteins have additional roles apart from splicing. Yeast strains which lack LSm1-7p fail to grow at higher temperatures and accumulate mRNAs shortened at the 3' end by 20-30 nucleotides. The simplest model proposes that the LSm1-7 complex binds to the mRNA and sterically inhibits endo- and exonucleases. Nuclear LSm2-8 binds to the 3' end of U6 snRNA, suggesting that LSm2-8 likewise protects the 3' end of U6 snRNA from degradation [65].

Role of LSm oligomer proteins in U8 snoRNP organization
The U8 snoRNP is required for processing of the 5.8S and 28S rRNAs, which together with the 5S rRNA build up the large ribosomal subunit. In Xenopus extract, LSm2, 3, 4, 6, 7 and 8 are bound as a heterohexamer to U8 snoRNA at the conserved third stem-loop sequences [66].

LSm protein oligomers in mRNA degradation
Yeast two-hybrid assays reveal multiple interactions between the eight LSm proteins, suggesting the existence of more than one LSm protein complex. Each human LSm protein is capable of interacting with multiple other LSm proteins and with splicing factors such as prp24, prp4 and SmD1 [69]. Coprecipitation experiments demonstrated that LSm1p (LSmXp is the nomenclature for yeast LSm proteins) together with LSm2p-LSm7p forms a new seven-subunit complex [70,71]. The LSm1-7 complex plays a role in mRNA degradation [72], whereas LSm2-8 has a role in the stabilization of the U6 snRNP. These two protein complexes thus have very different functions. LSm1p mutants accumulate full-length capped transcripts, but mutations in LSm1p do not stabilize mRNAs containing premature stop codons, suggesting that the LSm1-7 complex is not involved in NMD [71]. The function of the LSm1-7 complex is most likely to interact with the mRNA substrate and accelerate decapping. Decapping is mediated by a decapping enzyme consisting of Dcp1a, Dcp1b and the catalytic subunit Dcp2. The LSm1-7 proteins are localized in discrete cytoplasmic foci, which also contain the key decapping factors required for 5' to 3' mRNA degradation [73]. Coexpression of LSm proteins increases the number of foci. LSm1 and LSm8 are closely related to each other and to the SmB protein. The 33 C-terminal amino acids of LSm1 are necessary but not sufficient for proper cellular localization of hLSm1 [73]. Finally, it has been demonstrated [73] that the foci are actual degradation centers, where mRNA degradation occurs. This suggests that the cytoplasm of cells is more organized than previously thought. The bacterial Hfq protein (pdb 1hk9) is able to chaperone RNA-RNA interactions, much as LSm proteins chaperone RNA-protein interactions, and to protect the 3' end of a transcript from exonucleolytic decay while encouraging degradation through other pathways [74].

LSm proteins in the processing of pre-tRNAs
It has been reported that depletion of LSm proteins in yeast leads to strong accumulation of unspliced tRNA species. The absence of LSm proteins most probably alters the pattern of processing intermediates [75].

LSm2-7 complex associated with snR5
An LSm2-7 hexameric complex is found associated with snR5 in Saccharomyces cerevisiae. This RNA is a member of the class of snoRNAs that function in pseudouridylation of rRNA. The snR5-associated LSm complex may be a hexamer, but it cannot be excluded that one LSm protein is present in this complex in more than one copy, or that an as yet unidentified yeast protein associates with LSm2-7, thereby closing the heptameric ring [76]. LSm2-8 interacts with sequences at the very 3' end of U6 snRNA. The LSm1-7 complex interacts with the 3' UTR of mRNAs, but there could well be secondary structure elements between the LSm1-7 binding site and the mRNA 3' end. U8 snoRNA and snR5 bind LSm proteins via internal RNA sequences, suggesting that LSm rings can assemble onto the RNA. LSm proteins have a role in the biogenesis and function of at least a subset of nucleolar RNAs. One possibility is that LSm proteins assist base pairing of snoRNAs with their rRNA targets to direct pseudouridylation and ribose modifications.
The Sm/LSm protein family, whose various complexes are described above, is a particularly interesting example of the formation of higher-order oligomeric complexes: its members are engaged in a variety of RNA processing events and form complexes which sometimes differ by only one out of seven subunits. Another important aspect of the Sm/LSm protein family is that these proteins never occur in isolation; for proper functioning they require complex formation. Hence, the way to better understand Sm/LSm protein function is to study Sm/LSm complexes. It is difficult to determine the connection between the oligomeric state of a given protein and its function in vivo. Reconstitution in vitro of two human LSm complexes with seven subunits each, LSm1-7 and LSm2-8, has been described [77,78,79]. The LSm2-8 complex binds to the 3' end of U6 snRNA in the cell nucleus. The closely related cytoplasmic LSm1-7 complex binds to the 3' UTR of mRNAs destined for degradation. Remarkably, LSm1-7 differs from LSm2-8 only by the exchange of one single subunit, LSm1 for LSm8.
Sequence comparisons of the yeast LSm protein family indicate that each canonical Sm protein has a corresponding LSm protein, with the exception of SmB, which aligns almost equally well with LSm1 and LSm8. Based on these sequence comparisons, co-expression vectors encoding the LSm homologs of SmD1D2 (LSm2-3), of SmD3B (LSm4-8) and of SmEFG (LSm5-6-7) were constructed, and the proteins were expressed in bacteria [77]. LSm4 and LSm1 were each overexpressed singly for the reconstitution of LSm1-7 [77].
The two heteroheptameric complexes were reconstituted from subcomplexes: LSm2-8 from two heterodimers and one heterotrimer (LSm2-3, LSm4-8, LSm5-6-7), and LSm1-7 from one heterodimer, one heterotrimer and two singly expressed proteins (LSm2-3, LSm5-6-7, LSm1, LSm4). Reconstitution of the heteroheptamers was achieved by mixing equimolar amounts of the appropriate proteins at 37°C in the presence of 4 M urea, which disrupts higher-order structures, because these proteins have a tendency to oligomerize. After incubation, the mixture of pure recombinant proteins was dialyzed against native buffer. The mixture was then subjected to size exclusion chromatography followed by anion exchange chromatography. The last step in the purification of homogeneous heteroheptamers was a further size exclusion chromatography (peak profile shown in Figure 11); the respective fractions were analyzed on a polyacrylamide gel (shown in Figure 12). Negative-stain electron micrographs show that reconstituted LSm2-8 has a ring-like architecture with a diameter of about 8 nm. The overall dimensions are similar to those previously observed for the native LSm2-8 complex isolated from HeLa cell nuclear extract (8 nm) and for the core snRNP domain [58]. The central cavity observed for the recombinant LSm2-8 complex is larger than in the native LSm2-8 complex (3 vs. 2 nm, respectively). The LSm1-7 rings appear to be slightly smaller, measuring about 7 nm across with a pore diameter of less than 1.5 nm. Thus, the recombinant LSm1-7 and LSm2-8 complexes are similar to one another and to the native Sm/LSm complexes at this level. In all LSm co-crystal structures solved with RNA oligonucleotides, the RNA molecules mainly wrap around the rim of the pore.
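The subunit bookkeeping behind this reconstitution can be checked with a short sketch. It only verifies that the named building blocks add up to seven distinct subunits and shows which subunit distinguishes the two rings; it says nothing about the order of the subunits within a ring. The building blocks for LSm1-7 follow the statement above that LSm4 and LSm1 were expressed singly for that complex; the dictionary and function names are hypothetical.

```python
# Building blocks used for reconstitution, as named in the text.
SUBCOMPLEXES = {
    "LSm2-3": ("LSm2", "LSm3"),
    "LSm4-8": ("LSm4", "LSm8"),
    "LSm5-6-7": ("LSm5", "LSm6", "LSm7"),
    "LSm1": ("LSm1",),
    "LSm4": ("LSm4",),
}


def assemble(parts: list[str]) -> frozenset[str]:
    """Combine building blocks into a ring of seven distinct subunits."""
    ring: set[str] = set()
    for part in parts:
        subunits = set(SUBCOMPLEXES[part])
        if ring & subunits:
            raise ValueError(f"{part} repeats a subunit already in the ring")
        ring |= subunits
    if len(ring) != 7:
        raise ValueError(f"expected 7 subunits, got {len(ring)}")
    return frozenset(ring)


lsm2_8 = assemble(["LSm2-3", "LSm4-8", "LSm5-6-7"])
lsm1_7 = assemble(["LSm2-3", "LSm5-6-7", "LSm1", "LSm4"])

# The two heptamers share six subunits and differ by a single exchange.
print(sorted(lsm1_7 - lsm2_8), sorted(lsm2_8 - lsm1_7))  # ['LSm1'] ['LSm8']
```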
One method that can be used to identify and characterize RNA binding proteins is the electrophoretic mobility shift assay (EMSA). The method is based on the change in the electrophoretic mobility of a nucleic acid molecule upon binding to a protein or another molecule. A labeled RNA containing the binding sequence is first incubated with a sample containing the RNA binding proteins, and the mixture is then analyzed on a non-denaturing gel. The unbound RNA has a characteristic electrophoretic mobility, whereas RNA bound in a protein complex migrates more slowly and appears as a shifted band. The functionality of the reconstituted LSm2-8 and LSm1-7 complexes has been demonstrated in vitro using this assay [77]. That the complexes are also functional in vivo was shown by injecting fluorescently labeled complexes into the cytoplasm of living cells [77]. They localized to the expected cellular compartments: LSm2-8 accumulated in the nucleus, whereas the LSm1-7 complex remained in the cytoplasm.
The structure-function relationships within the Sm/LSm protein family reflect three major interconnected features, which illustrate why it is so important to solve the structures of Sm/LSm hetero-oligomeric complexes. First, Sm/LSm protein function is in general strictly dependent on complex formation. This holds for RNA binding, biogenesis of Sm/LSm-containing RNPs, interaction with non-Sm effector proteins, and RNA processing activity. The required interaction interfaces are apparently always three-dimensional structural sites generated from several Sm/LSm subunits. Second, exchange of only one or two subunits from one hetero-oligomeric (mostly heptameric) Sm/LSm complex to another changes its whole biology (see above). How such subtle structural changes can have such large functional effects can only be addressed by solving the crystal structures of the respective complexes. Lastly, the ability of individual Sm/LSm proteins to assemble with different homologous binding partners to form architecturally very similar, yet functionally diverse complexes argues for a very fine balance between flexibility and specificity in the respective Sm-Sm interactions. Clearly, in order to understand the "molecular recognition code" governing this balance, more structural information on such interactions is indispensable. Recently, the crystal structure of the Saccharomyces cerevisiae LSm2-8 complex bound to U6 snRNA was determined (PDB code 4M7D) [80].
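Returning to the mobility shift assay described above: one common way to read such a gel quantitatively is to compute, for each lane, the fraction of labeled RNA found in the shifted band. The sketch below assumes background-corrected band intensities are already available; the function name and the intensity values are hypothetical and are not taken from reference 77.

```python
def fraction_bound(free_intensity, shifted_intensity):
    """Fraction of labeled RNA in the shifted (protein-bound) band of one lane."""
    total = free_intensity + shifted_intensity
    if total <= 0:
        raise ValueError("lane contains no signal")
    return shifted_intensity / total


if __name__ == "__main__":
    # Hypothetical lanes: (protein concentration in nM, free band, shifted band).
    lanes = [
        (0.0, 1000.0, 0.0),
        (50.0, 750.0, 250.0),
        (200.0, 400.0, 600.0),
    ]
    for protein_nM, free, shifted in lanes:
        print(f"{protein_nM:6.1f} nM  fraction bound = {fraction_bound(free, shifted):.2f}")
```

Plotting the fraction bound against protein concentration gives a binding curve from which an apparent affinity could be estimated, provided the gel conditions do not disturb the complex.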

Figure 2. The two chemical steps of splicing (Reprinted from reference 23)

Figure 5. A simplified view of the splicing process (Reprinted from reference 5)

Figure 6. Primary and secondary structure of Sm proteins. Amino acid sequence alignment of the human Sm (D1, D2, D3, B/B′, E, F, and G) proteins with secondary structure elements. Wavy line, helix; arrows, β strands. The β strands within the Sm1 and Sm2 motifs are colored blue and yellow, respectively. The β strands and interconnecting loops are numbered consecutively from the N terminus. The conserved Sm1 and Sm2 motifs are indicated and the conserved residues within these motifs are highlighted in blue (hydrophobic), grey (hydrophobic, less well conserved), orange (neutral polar), red (basic) and green (acidic). (Reprinted from reference 41)

Figure 7. Proposed higher-order assembly of the human core snRNP proteins. The seven core Sm proteins (B/B′, D1, D2, D3, E, F, and G) are arranged within the seven-membered ring based on the crystal structures of the D1D2 (1b34) and D3B (1d3b) complexes and pairwise interactions deduced from biochemical and genetic experiments. (Reprinted from reference 41)

Figure 8. Schematic model of the role of SMN in snRNP core biogenesis in the cytoplasm. (Reprinted from reference 54)

Figure 9. Structural Alignment of Human Sm/LSm proteins

Figure 11. Second size exclusion chromatography step