The bacterial Hfq protein is a versatile modulator of RNA function and is particularly important for regulation mediated by small non-coding RNAs. Hfq is a bacterial Sm protein but bears more similarity to the eukaryotic Sm-like (Lsm) family of proteins than to the prototypical Sm proteins. Hfq and Lsm proteins share the ability to chaperone RNA-RNA and RNA-protein interactions and an interesting penchant for protecting the 3′ end of a transcript from exonucleolytic decay while encouraging degradation through other pathways. Our view of Lsm function in eukaryotes has historically been informed by studies of Hfq structure and function, but mutational analyses and structural studies of Lsm sub-complexes have given important insights as well. Here, we aim to compare and contrast the roles of these evolutionarily related complexes and to highlight areas for future investigation.
Sm proteins exist in all three domains of life and are defined by the ability to adopt the Sm fold, which is composed of an N-terminal α-helix and five anti-parallel β-strands (Fig. 1).1 While bacteria generally have just one Sm family member known as Hfq, eukaryotes have evolved over 20 of these proteins.2 The eukaryotic proteins can be divided into Sm and Sm-like (Lsm) sub-families and form a variety of multimeric complexes (Table 1). Seven canonical Sm proteins are the founding members of the family and were discovered as the antigens targeted by auto-antibodies in a patient who suffered from systemic lupus erythematosus.3 Seven Sm proteins (predominantly SmB/B’, SmD1, SmD2, SmD3, SmE, SmF and SmG in vertebrates) are assembled as a heptameric ring around a conserved core sequence at the center of the spliceosomal snRNAs.4 Both Hfq and the Lsm proteins can also adopt multimeric ring structures. Hfq forms a hexamer in solution,5 while Lsm proteins typically form heteromers or homomers of six or seven subunits.6 We note, though, that other oligomeric states of Lsm proteins have been observed, including octamers.7 Complexes containing both Sm and Lsm proteins have also been found, such as the heptamer comprising SmD3, SmB, Lsm10, Lsm11, SmF, SmE and SmG, which assembles on the U7 snRNP involved in vertebrate histone mRNA processing.8 Notably, some of the more recently described Lsm proteins do not appear to form multimers and have other protein domains in addition to the Sm fold. These include EDC3,9 Ataxin2,10 and RAP55/LSM14,11 all of which have been implicated in mRNA decay and/or translation. In addition, two other proteins with Sm folds, Gemin6 and Gemin7, form a heterodimer and are part of the SMN complex that chaperones association of the Sm proteins with the U snRNAs.12
Importantly, Hfq and the canonical eukaryotic Lsm complexes (Lsm1-7 and Lsm2-8) differ fundamentally from the seven spliceosomal Sm proteins in that they spontaneously form stable multimeric rings in the absence of RNA.13 In contrast, Sm complexes assemble only in the presence of RNA in a process that, at least in vertebrates, requires considerable chaperoning from other protein factors.14 In the interests of brevity, we will be focusing mainly on the multimeric complexes formed by Lsm and Hfq proteins in this review.
The cellular functions of eukaryotic Lsm proteins and their bacterial counterparts are intimately connected with RNA processing and degradation. In E. coli, the Hfq protein binds small non-coding RNAs (sRNAs) to facilitate sRNA/mRNA interactions that, in turn, modulate translation efficiency and mRNA decay.15 Hfq can also influence mRNA decay by associating with the 3′ end of the transcript and promoting polyadenylation while inhibiting 3′-5′ exonuclease activities.16,17 These activities are important for normal stress response and pathogenicity in E. coli and other bacterial pathogens.18
Eukaryotic Lsm complexes have similar functions to those ascribed to Hfq in bacteria in that they chaperone mRNAs and non-coding RNAs through various steps in metabolism.19 In eukaryotes ranging from S. cerevisiae to humans there are multiple complexes of Lsm proteins, but the two best-characterized are Lsm1-7 and Lsm2-8, which are restricted to the cytoplasm and nucleus, respectively.20-24 Lsm1-7 is a major regulator of mRNA decay through its interactions with the decapping machinery,19,22,23 whereas Lsm2-8 is involved primarily in stabilizing the U6 snRNA and chaperoning it through the splicing process.21,25-27 Other minor Lsm complexes include Lsm2-7, which associates with the snR5 snoRNA in yeast28 and also with pre-RNase P,20 while Lsm2-4/6-8 binds the U8 snoRNA.29
At first glance, it seems that both the structure (the Sm fold) and functions (in RNA decay and processing) of Lsm proteins are conserved from bacteria through to eukaryotes. However, as we delve deeper into the structure and RNA-binding activities of these factors, several differences will become evident. During evolution, the Lsm family has shown a high propensity to adapt to different substrates and mechanisms.
Although Hfq and Lsm proteins are both able to recognize internal RNA sequence elements, in both cases, their interactions with 3′ ends of RNA substrates are best-characterized and arguably most important. Both complexes share affinity for 3′ uridine or adenosine tracts.13,30,31 Such elements can be template encoded, like the U-rich tracts which are formed at the 3′ end of bacterial sRNAs as a result of Rho-independent termination.32 Alternatively, such sequences can be added post-transcriptionally by nucleotidyltransferases such as TUTases and poly(A) polymerases (PAPs).33 Regardless of their provenance, such single-stranded 3′ elements are important determinants of RNA stability as they are vulnerable to 3′ exonucleolytic attack. Hfq and Lsm proteins share the ability to protect these 3′ ends but they also use them as a platform to orchestrate recruitment of other RNA elements and proteins.
In bacteria, just as in higher organisms, there are multiple pathways to induce decay of mRNAs.16 One of the best studied mechanisms in both eukaryotes and prokaryotes involves a 3′ poly(A) tail. In bacterial cells, polyadenylation mediated by PAP enhances mRNA degradation as it provides a toehold for exonucleases such as polynucleotide phosphorylase (PNPase).17,34 PAP initially acts distributively, adding only a few adenosines; however, Hfq associates avidly with these short poly(A) tails and stimulates PAP such that it becomes more processive (Fig. 2A).35-37 The long poly(A) tails that result from this mechanism remain bound to Hfq and are protected from PNPase action.
In eukaryotic cells, the poly(A) tail is added co-transcriptionally and expedites many required steps of mRNA metabolism such as splicing, export and translation. However, it is also intimately involved in mRNA decay. Deadenylation, mediated by dedicated deadenylases such as CCR4/NOT or PARN, is the first rate-limiting step in decay of most mRNAs and generates mRNAs with short poly(A) tails of 10 or fewer residues, much like those generated by PAP during decay of bacterial mRNAs.38 These oligoadenylated mRNAs are specifically recognized by the Lsm1-7 complex (Fig. 2B).30 Although the function of Lsm in mRNA decay is not fully understood, there are two clear consequences of Lsm1-7 binding. First, the 3′ end of the mRNA is protected from exonucleases; yeast Lsm1 mutants accumulate mRNAs with 3′ ends that are trimmed by 20–30 nt.39 Second, Lsm1-7 recruits factors required for decapping at the 5′ end of the transcript; loss of Lsm1-7 function results in accumulation of deadenylated intermediates in yeast and mRNAs are stabilized.40 There are several parallels with Hfq function that can be highlighted here. The most obvious of these is that both Hfq and Lsm1-7 associate with the 3′ ends of oligoadenylated mRNAs that are destined for degradation. In addition, both complexes protect the 3′ end from exoribonucleases; Hfq blocks PNPase while Lsm prevents attack by the exosome (which interestingly contains six subunits with homology to PNPase41). There is also at least one surprising difference: Hfq promotes extension of the mRNA poly(A) tail by PAP and, in fact, associates more strongly with longer poly(A).37 In contrast, Lsm1-7 has a distinct preference for oligoadenylated mRNAs over those with longer poly(A) tails.30
Many bacterial sRNAs end in 3′ oligo(U) tracts of seven to nine residues which are acquired through Rho-independent termination. These regions provide a platform for recruitment of Hfq to stabilize sRNAs against 3′ exonucleolytic decay. Notably, the U-tracts appear to be precisely sized to favor Hfq binding as an sRNA with a 3′-U6 tract shows significantly reduced association and one with just four 3′ uridines fails to bind Hfq at all.42 Importantly, Hfq binding is also essential for annealing of the sRNA to its mRNA target, most commonly at sites within the 5′UTR. Annealing of the sRNA can modulate translation initiation by occluding or exposing the ribosome-binding site on the mRNA.43 In addition, an sRNA/mRNA/Hfq complex can be recognized by the endonuclease RNase E resulting in rapid decay of both the mRNA and sRNA (Fig. 2C).44,45 In this case, Hfq protects the sRNA 3′ end in order to facilitate its decay through an alternative pathway. This is surprisingly reminiscent of the role of Lsm1-7 in mRNA degradation; it associates with 3′ oligoadenylated mRNAs and protects them from 3′-5′ decay while simultaneously promoting decapping and decay through the 5′-3′ pathway (Fig. 2B).39
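As an illustrative aside, the length dependence of Hfq recruitment described above can be expressed as a short Python sketch. The function names and category labels are hypothetical and are not drawn from any published tool. The thresholds follow the observations cited in the text (seven to nine uridines favored, six reduced, four not bound); the treatment of a five-uridine tract and of tracts longer than nine residues is an assumption made only to keep the example complete.

```python
def three_prime_tract_length(rna: str, base: str = "U") -> int:
    """Count consecutive `base` residues at the 3' end of an RNA sequence."""
    seq = rna.upper().replace("T", "U")  # tolerate DNA-style input
    base = base.upper()
    count = 0
    for nt in reversed(seq):
        if nt != base:
            break
        count += 1
    return count


def hfq_u_tract_category(rna: str) -> str:
    """Rough, hypothetical classification of a 3' oligo(U) tract for Hfq binding.

    >= 7 U : "favored"   (text: tracts of seven to nine residues;
                          longer tracts are lumped in here by assumption)
    5-6 U  : "reduced"   (text: U6 shows reduced association; U5 is assumed)
    <= 4 U : "not bound" (text: four uridines fail to bind)
    """
    n = three_prime_tract_length(rna, "U")
    if n >= 7:
        return "favored"
    if n >= 5:
        return "reduced"
    return "not bound"


if __name__ == "__main__":
    for tail in (8, 6, 4):
        srna = "GGCAGC" + "U" * tail  # hypothetical sRNA 3' region
        print(tail, hfq_u_tract_category(srna))
```

Such a classifier is only a mnemonic for the reported trend; actual Hfq affinity also depends on sequence context and on the additional binding surfaces discussed later in this review.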
Short 3′ uridine tracts are important binding sites for eukaryotic Lsm proteins as well. For example, the nuclear Lsm2-8 complex binds a U5 tract at the 3′ end of U6 snRNA and influences stability of the transcript.13,46 As Lsm2-8 is essential for almost all functions of U6 in the splicing reaction (Fig. 3), the U-tract also represents a vital platform for Lsm recruitment. Interestingly, the length of the U-tract appears to be important in Lsm binding. Nascent U6 snRNA has just four uridines at its 3′ end, which allow association with the La protein.47,48 The 3′ U-tract is subsequently extended by a TUTase49 and then trimmed by an exonuclease known as MPN1.50,51 During processing, Lsm displaces La and protects U6 against excessive 3′ shortening.52 It is possible that cycles of oligouridylation and trimming serve to ensure U6 snRNA is maintained intact following each splicing event.
Recent studies have uncovered novel roles for Lsm1-7 and oligouridylation in cytoplasmic mRNA decay pathways. The vast majority of cytoplasmic S. cerevisiae mRNAs degrade primarily through the deadenylation-dependent pathway described above, and this pathway is also conserved in higher organisms.38 However, there is compelling evidence for the existence of additional decay mechanisms in other fungi and in plants. Moreover, these alternate pathways also involve Lsm1-7 interactions at the 3′ end. In S. pombe, a significant fraction of mRNAs decay through a deadenylation-independent pathway that involves the action of a terminal uridyl transferase known as Cid1.53 Cid1 is responsible for addition of one or two non-templated uridines at the 3′ end of the poly(A) tail of certain transcripts including act1, urg1 and others. Interestingly, this terminal modification induces decapping without prior deadenylation, almost certainly through the recruitment of Lsm1-7 and its associated decapping factors. A similar pathway has also been uncovered in the filamentous fungus Aspergillus nidulans, where mRNAs obtain a 3′ CUCU modification following shortening of the poly(A) tail to around 15 adenosine residues.54 This modification is performed by two nucleotidyltransferases known as CutA and CutB, both of which are related to the S. pombe Cid1 enzyme.54,55 Although Lsm proteins are essentially uncharacterized in Aspergillus, it seems likely that they will be involved in this decay mechanism. Similar events have also been detected in Arabidopsis where one or two pyrimidines, generally uridines, can be found at the 3′ end of oligoadenylated mRNAs.55 It remains unclear whether this decay pathway for polyadenylated mRNAs is conserved in mammals or other eukaryotes but, as described below, a very similar mechanism is employed for degradation of the non-adenylated histone mRNAs in mammalian cells.
In higher eukaryotes, the replication-dependent histones lack poly(A) tails and instead undergo unique 3′ end processing that results in a stem-loop structure at their 3′ end. This stem-loop associates with a specific set of factors, including the U7 snRNP and stem loop binding protein (SLBP) among others.56 These factors work together to ensure that histone production is tightly restricted to S phase and facilitate the rapid degradation of histone mRNAs as S phase comes to a close, or when DNA replication is inhibited. Importantly, 3′ oligouridylation is the initiating step in turnover of histone mRNA species57 and is mediated by the ZCCHC11 TUTase58 (which is again related to Cid1). This 3′ toehold allows Lsm1, most likely as part of the Lsm1-7 complex, to be recruited to the 3′ end of histone transcripts.57 Binding of Lsm1 induces histone mRNA turnover by decapping and 5′-3′ degradation,57-59 and also enhances 3′-5′ decay by recruiting an exonuclease known as Eri1 (Fig. 2D).60 We note that although histone mRNAs are polyadenylated in yeast, Lsm1 is still essential for their controlled turnover as Lsm1 mutants have defects in genomic stability caused by excess accumulation of histone mRNAs.61
At this point we should note that a large number of eukaryotic non-coding RNAs, such as miRNAs and piRNAs, also experience post-transcriptional 3′ uridylation and/or adenylation.62,63 It is too early to say what role, if any, Lsm complexes play in the metabolism of these extended small RNAs but it is conceivable that Lsm modulates stability or even annealing of these regulatory RNAs with their targets much as Hfq influences bacterial sRNA functions.
Several studies have investigated exactly how Hfq and Lsm complexes interact with RNA, particularly with A-rich and U-rich oligonucleotides. Most of our current knowledge derives from structural studies on Hfq from the bacteria S. aureus,64,65 E. coli,66-68 S. typhimurium,69 P. aeruginosa,70,71 B. subtilis72 and the cyanobacteria Anabaena and Synechocystis,73 but additional insights have been gleaned from mutational analyses of Lsm1,74 structural studies using Lsm sub-complexes75,76 and binding assays performed with recombinant Lsm complexes.77,78 One over-arching similarity between Lsm and Hfq complexes is that both recognize oligo(U) through a surface that lies close to the central pore on the proximal side of the toroid. For Hfq, conserved residues important for RNA-binding lie within Loop 3 and Loop 5 of each subunit and form a propeller-like arrangement that comprises six individual binding sites, each of which can interact with a single uridine (Figs. 1 and 4).64,69 The residues required for RNA-binding in Lsm1 were predicted based on the structure of Hfq and subsequent mutational analysis supported their requirement for both mRNA decay and 3′ end protection.74 There is no high-resolution structure available for either Lsm1-7 or Lsm2-8 complexes. In an Lsm3 octamer, the putative RNA-binding residues are arranged close to the central pore, as in Hfq (Fig. 4).76 Despite the very similar overall structures of Hfq and Lsm complexes, the primary sequence conservation is insufficient to allow prediction of the exact contacts between RNA and each Lsm subunit.
While both Hfq and Lsm proteins show clear affinity for the 3′ end of RNA, S. typhimurium Hfq has a unique interaction with the 3′ end that allows it to distinguish and protect nascent RNAs, which bear a 3′ hydroxyl group, while ignoring decay intermediates that terminate with a 3′ phosphate.69 This interaction relies on a hydrogen bond between the histidine in Loop 5 and the 3′OH. The histidine residue is conserved in other bacteria, but Loop 5 is very different in Lsm proteins (Fig. 4), suggesting that they may have distinct interactions with the 3′ end. Indeed, while Hfq strongly prefers a 3′OH over a 2′,3′ cyclic phosphate (3′>P),69 the Lsm2-8 complex demonstrates a clear preference for 3′>P over 3′OH.52 This correlates well with the fact that 3′>P is the predominant terminal moiety of the mature U6 snRNA that is complexed with Lsm2-8 in vertebrates.52
Hfq and Lsm diverge dramatically in their mechanisms for binding to oligo(A). Hfq associates with oligo(A) and other A-rich sequence elements using a binding surface on the distal face.65,67,68,79 This surface varies in charge distribution between Hfq complexes from different bacteria and there is no evidence to date to suggest that its function is conserved in eukaryotes. Lsm appears to utilize the same proximal binding site to recognize 3′ oligo(A) as it does to bind 3′ oligo(U). Notably, in contrast to Hfq, which prefers long poly(A), the Lsm1-7 complex has evolved significant specificity for short A tails and binds less well to longer tails.30,78 Residues within the proximal RNA-binding pocket of Lsm1 confer this specificity. This observation is consistent with the fact that the two complexes recognize oligo(A) through completely different regions.
Both Hfq and Lsm complexes certainly have additional means of interacting with RNA as the ability to simultaneously contact RNA in multiple places, or to simultaneously bind multiple RNAs, is implicit for their chaperone function. A third binding site with generally positive charge has been detected on the lateral surface of E. coli Hfq, but this charge distribution is not well-conserved, even in other bacteria.80 The C-terminal extension has also been implicated in RNA-binding; C-terminally deleted E. coli Hfq fails to associate with mRNAs81 and the C-terminal region of the P. aerophilum SmAP3 protein bears some structural resemblance to domains that associate with single-stranded nucleic acids.82 Indeed, the C-terminal regions of this family vary widely in both length and composition, so it remains very possible that they contribute to RNA-binding in eukaryotic cells, either through direct interaction with RNA, or by influencing access of the RNA to other binding sites. In support of this, yeast Lsm1 mutants lacking 28–55 amino acids of the C-terminal region have reduced affinity for RNA, although they still maintain the ability to distinguish short and long poly(A) tails.77
To this point we have focused predominantly on the interactions of Lsm and Hfq proteins with RNA, but it is equally important to discuss the central role of these complexes in chaperoning RNA-RNA and RNA-protein interactions.27,83,84 As mentioned above, Hfq has many RNA-binding surfaces that allow it to interface with multiple transcripts simultaneously. This property is crucial in the course of sRNA/mRNA annealing. Recent studies suggest that, in addition to binding the sRNA 3′ end through the proximal binding site, Hfq interacts with additional sRNA sequences through its lateral-binding surface.80 These other interactions likely involve the internal stem-loop and short U-rich region that lie between the 3′ oligo(U) tract and the region that base-pairs with the mRNA.85 Recruitment of an mRNA target to the distal-binding site then results in rearrangements that favor annealing of the mRNA to the sRNA. Annealing can lead to translation inhibition and decay of both the mRNA and sRNA through RNase E cleavage (Fig. 2C),44,45 but translation can also be enhanced in some cases.86
Although the binding surfaces on Lsm proteins are less well-characterized, the interactions of Lsm2-8 with U6 snRNA may well involve multiple surfaces as well as other RNAs. First, although Lsm2-8 is clearly initially engaged through binding to the 3′ U tract, it can also be cross-linked to other regions of the U6 snRNA, suggesting that it may bind other U6 sequences as well.87 Second, the splicing process itself involves several changes in U6 snRNA conformation (Fig. 3): U6 snRNP is a transient entity which is readily incorporated into the U4/U6 di-snRNP and then the U4/U6.U5 tri-snRNP, before entering the spliceosome where U4 is ejected. Within the spliceosome, U6 base-pairs with U2 as well as interacting with the pre-mRNA but the Lsm complex is dismissed prior to the first step of splicing. U6 is released following completion of the splicing event.88 Lsm is known to enhance U4/U6 interactions, but it seems likely that it may also be involved in the subsequent remodeling events that occur as the U6 snRNA unwinds to interact with U2. It would be interesting to examine whether novel Lsm-binding surfaces are implicated in the remodeling of U6 snRNAs.
As well as binding RNA, both Hfq and Lsm complexes interact directly with other proteins. Of note, both Hfq and Lsm recruit decay/processing enzymes to their substrates. Following binding to sRNAs and annealing of the sRNA to its mRNA target, Hfq recruits the endonuclease RNase E to cleave and initiate decay of both mRNA and sRNA.44,45 There appears to be a direct interaction between RNase E and Hfq in this case,89,90 although the region of Hfq that binds RNase E has not been defined. In addition, Hfq is reported to co-purify with both PAP and PNPase, which presumably assists Hfq-mediated regulation of polyadenylation-induced mRNA decay.35
The cytoplasmic Lsm1-7 complex associates with multiple proteins required for mRNA decapping. Most notably, in S. cerevisiae, Pat1p readily co-purifies with Lsm1-7.91 Pat proteins are essentially scaffold factors that enable recruitment of other factors needed for decapping.92 In particular, Pat1 binds the decapping enzyme and enhances its activity.93,94 Pat1 recruitment also inhibits translation.92 A major portion of Lsm function in mRNA decay is likely mediated through its interaction with Pat1 and it has been suggested that Pat1 is required for association of Lsm with mRNAs.93 The region(s) of Lsm1-7 that interact with Pat1 have not been defined to date.
The U7 snRNP required for vertebrate histone mRNA 3′ end processing contains two Lsm proteins, Lsm10 and Lsm11, as well as five Sm proteins.8 Interestingly, the ~170 amino-acid N-terminal extension of Lsm11 plays a vital role in recruiting other factors to the U7 snRNP and eventually to histone pre-mRNAs. The N-terminal region of Lsm11 interacts directly with a large protein called FLASH to form a platform that recruits mRNA 3′ end processing factors including the endonuclease CPSF73.95
Hfq also appears to rely on protein-protein interactions for its assembly. Specifically, the RelA protein encourages assembly of Hfq monomers into the hexameric form. In E. coli RelA mutants, active Hfq becomes limiting and many of its activities in sRNA-mediated regulation are lost.96 It is not clear whether there are similar factors that promote assembly of the canonical Lsm1-7 and Lsm2-8 heptamers in higher organisms. We do note though that the Sm complex employs a number of factors including the SMN complex and PRMT5 to favor its association into snRNPs.14 Indeed, many of the same factors are involved in assembly of the U7 snRNP, which contains a mixture of Sm and Lsm factors.8 Moreover, SMN itself has been reported to associate with Lsm4 and Lsm6 in vitro.97
Over the last few years, structural studies of Hfq bound to RNA have brought invaluable insights into the mechanisms by which this RNA chaperone functions. Although such experiments are much more challenging in eukaryotes due to the heteromeric nature of Lsm complexes, such studies are essential to allow better definition of the interactions between Lsm and its RNA substrates. There are numerous interesting questions on the horizon for Lsm proteins and the complexes that they generate in eukaryotic cells. Given the explosion in the numbers of non-coding RNA regulators, the roles of Lsm complexes in the biogenesis and function of these transcripts should be investigated. Furthermore, unlike Hfq, which forms homomeric complexes, not all Lsm proteins are created equal; each has a unique C-terminal region as well as significant divergence within the Sm domain. It is already clear that Lsm1 and Lsm8 are unique to the cytoplasm and nucleus, respectively, and have distinct roles, but the individual contributions of Lsm2-7 remain to be elucidated. Hints as to the properties of individual Lsm proteins are currently available through recent structural studies on subunits and intersubunit interactions.75,76 These data should stimulate the development of testable hypotheses for the functions of individual Lsm subunits. Finally, the observation that both Hfq and Lsm proteins have been usurped by evolutionarily diverse viruses to promote viral gene expression and replication98 underscores the versatility of these protein complexes in RNA biology. Studies of these interactions will undoubtedly give further insights into the cellular roles of Lsm proteins.
We are grateful to the anonymous reviewers for their diligent reading of the manuscript and constructive criticism. Research into mechanisms of mRNA decay in the Wilusz laboratories is supported by R01 awards from the National Institutes of Health (NIAID-GM072481 to J.W. and NIAMS-AR059247 to C.J.W.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We apologize to those whose work we were unable to cite due to lack of space.
No potential conflicts of interest were disclosed.
Previously published online: www.landesbioscience.com/journals/rnabiology/article/23695