Helicases of the superfamily (SF) 1 and 2 are involved in virtually all aspects of RNA and DNA metabolism. SF1 and SF2 helicases share a catalytic core with high structural similarity, but different enzymes even within each superfamily perform a wide spectrum of distinct functions on diverse substrates. To rationalize similarities and differences between these helicases, we outline a classification based on protein families that are characterized by typical sequence, structural and mechanistic features. This classification complements and extends existing SF1 and SF2 helicase categorizations and highlights major structural and functional themes for these proteins. We discuss recent data in the context of this unifying view of SF1 and SF2 helicases.
Helicases use ATP to bind or remodel nucleic acids, nucleic acid-protein complexes, or both [1–4]. All forms of cellular life and many viruses encode helicases, which constitute one of the largest classes of enzymes [5,6]. Virtually all biological processes involving DNA or RNA employ one or more helicases [4,7–9]. Defects in their function as well as deregulated expression of these proteins have been linked to numerous diseases including cancers, developmental defects, and neurodegenerative diseases [10,11].
Two distinct types of helicases exist, those forming toroidal, predominantly hexameric structures, and those that do not. Pioneering sequence analysis by Gorbalenya and Koonin and more recent comparative structural and functional analysis by Wigley and co-workers showed that helicases can be classified into several superfamilies (SFs) [3,12]. The toroidal enzymes comprise SFs 3 to 6, and the non-ring forming ones comprise SFs 1 and 2.
The catalytic cores of SF1 and SF2 helicases share almost identical folds and extensive structural similarity. Within each superfamily, the level of structural conservation is still higher [3,7]. Notwithstanding, different helicases of each superfamily perform a wide spectrum of functions on diverse substrates, ranging from chromosomal DNA and ribosome precursors to small, non-coding RNAs [2,13,14]. Classical helicase function, i.e., ATP-driven duplex unwinding, is not always the primary physiological function of SF1 and SF2 enzymes [2,7,15,16], and even for helicases that unwind duplexes, there are significant mechanistic differences. Unwinding can be based on translocation on the nucleic acid [2–4], but there is also translocation without unwinding [17–20], and unwinding without translocation [21–24].
To rationalize functional differences and structural similarity of SF1 and SF2 helicases, classifications have been proposed based on sequence, structural and mechanistic features [3,12]. However, none of these classifications extended to all members of the SF1 and 2, because gaps in structural and functional information precluded a comprehensive and systematic view that integrated sequence, structure, and mechanism. Some of these knowledge gaps have been narrowed by recent data and there is renewed interest in a systematic and inclusive SF1/SF2 classification. Here we outline such a classification, based on distinct protein families that are characterized by typical sequence, structural and mechanistic features. We briefly describe how this classification highlights major structural and functional themes of SF1 and 2 helicases, and we discuss recent data in the context of this unifying view of these enzymes.
The hallmark of SF1 and SF2 helicases is a conserved helicase core consisting of two similar protein domains that resemble the fold of the recombination protein RecA. This helicase core contains characteristic sequence motifs, based on which the original subdivision of helicases into superfamilies was accomplished. It has long been noted that both SF1 and SF2 encompass defined protein families with distinct sequence, structural, and mechanistic features, such as the DEAD-box and the Swi/Snf families in SF2, or the Pif1-like family in SF1 [20,25,26]. Yet, no comprehensive classification of these families has been reported for either superfamily.
To devise a family classification for the entirety of both superfamilies, we aligned the sequences of the helicase cores of all SF1 and SF2 proteins from human, Saccharomyces cerevisiae, E. coli, and select viral proteins (Suppl. Fig. S1). We then performed phylogenetic analysis of this alignment. Consistent clustering of proteins with similar sequence characteristics was found (Fig. 1, Suppl. Fig. S2). We identified several well-defined clusters of different sizes in both superfamilies. Some of these clusters encompass proteins that have long been viewed as a family, such as the DEAD-box and the RecQ-like proteins [26,27]. Extrapolating from this concept, we suggest calling clusters with three or more proteins from a single organism (e.g., DEAD-box proteins) a family, and clusters with two proteins from a single organism a group. No specific term appears necessary for proteins that are encoded by a single ORF in a given organism, even though these proteins may be highly conserved throughout evolution (e.g., Suv3). The distinction between groups and families seems useful because further substructure (subfamilies) can exist in a family with three or more proteins, but not in a group with two proteins. We suggest an exception for viral SF2 helicases related to NS3/NPH-II, which we propose to view as a group, even though these proteins are usually encoded by only a single ORF per virus. The rationale for this exception is that these proteins cluster despite originating from very different viruses.
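The family/group naming rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used for the analysis: the function name `classify_cluster`, the `force_group` flag (standing in for the manually assigned NS3/NPH-II exception), and the placeholder protein names are hypothetical, and the threshold follows the "three or more proteins" reading of the definition.

```python
from collections import Counter


def classify_cluster(members, force_group=False):
    """Label one phylogenetic cluster as 'family', 'group', or 'unclassified'.

    members: list of (protein_name, organism) tuples belonging to the cluster.
    force_group: hypothetical manual override, mirroring the proposed
        exception for viral NS3/NPH-II-related proteins.
    """
    if not members:
        raise ValueError("a cluster needs at least one member")
    if force_group:
        return "group"
    # Count how many cluster members each organism contributes.
    per_organism = Counter(organism for _, organism in members)
    largest = max(per_organism.values())
    if largest >= 3:   # assumed reading: three or more from one organism
        return "family"
    if largest == 2:   # exactly two from one organism
        return "group"
    return "unclassified"  # single ORF per organism, no specific term


# Placeholder clusters; the names are illustrative, not real assignments.
example_family = [("protA1", "yeast"), ("protA2", "yeast"), ("protA3", "yeast")]
example_group = [("protB1", "human"), ("protB2", "human"), ("protB3", "yeast")]
example_single = [("protC1", "human"), ("protC2", "yeast")]

for cluster in (example_family, example_group, example_single):
    print(classify_cluster(cluster))
```

The rule depends only on the largest per-organism count, so a cluster that is conserved across many organisms but contributes a single ORF to each of them remains without a specific term.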
According to this family/group definition, we identified nine families and one group in SF2 and three families in SF1 (Fig. 1). Several proteins (e.g., Suv3) do not fall into a family or group, despite a high level of conservation in eukaryotes or bacteria (Tabs. ST1, ST2). The families were named according to terms already in use (e.g., DEAD-box), or according to a prominent member (e.g., Ski2-like). Our analysis did not include a comprehensive cross-section of viral SF1 and SF2 proteins [9,28] (see Suppl. Fig. S1 for explanation). It is very likely that further viral proteins constitute additional groups, analogous to the NS3/NPH-II group. Moreover, future surveys of SF1 and SF2 proteins from more organisms may reveal additional groups or families. In this sense, the presented family structure should be viewed as minimal and basic, although it is comprehensive for human, S. cerevisiae, and E. coli (Fig. 1).
The identified families of both SF1 and SF2 correspond remarkably well to the classification based on translocation and unwinding polarity proposed by Wigley and coworkers (Tab. 1). The family-based system now extends this functional classification by including all SF2 proteins and by adding a sequence dimension. The presented family structure is further consistent with previous phylogenetic analyses of subsets of SF2 proteins [25,29–31], and with the original sequence analysis and classification by Gorbalenya and Koonin.
Proteins within each of the identified families or groups share high sequence conservation and family/group-typical sequence characteristics (Suppl. Figs. S1, S3), as well as distinct structural features, as discussed below. In addition, proteins within a family or group display family/group-typical mechanistic features, including promiscuous NTP usage vs. specificity for adenosine triphosphates, the ability to unwind duplexes, and unwinding polarity (Tab. 1). The notion of family-typical mechanistic features is important, because it might guide specific experimental approaches for the characterization of SF1 or 2 helicases for which only sequence information is available.
Among the most notable SF1 and SF2 helicase signatures are the characteristic sequence motifs, nine of which had been previously identified [7,12]. Closer inspection of the sequences reveals at least 12 characteristic sequence motifs shared by both superfamilies (Fig. 2A). However, not all motifs are present in each family (Suppl. Fig. S3). As noted previously, sequence conservation in the characteristic motifs is high within each family (Suppl. Fig. S1). The level of conservation in most motifs decreases throughout the respective superfamilies, and only limited sequence conservation remains across both superfamilies (Fig. 2B).
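The trend of high conservation within a family and lower conservation across families can be made concrete with a simple per-column score on an alignment. The sketch below is a hedged illustration only: it uses the fraction of sequences sharing the most common residue in each column, which is one of several possible conservation measures and is not claimed to be the one underlying Fig. 2B. The function name `column_conservation` and the short toy sequences are hypothetical and are not taken from the supplementary alignment.

```python
from collections import Counter


def column_conservation(alignment):
    """Return, for each alignment column, the fraction of sequences
    that carry the most common residue. Gap characters ('-') are never
    counted as the consensus but remain in the denominator."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)  # column consists only of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def mean(values):
    return sum(values) / len(values)


# Toy motif-sized alignments (hypothetical sequences for illustration).
family_a = ["GSGKT", "GSGKT", "GTGKT"]
family_b = ["GAGKS", "GAGKS", "GCGKS"]

within_a = mean(column_conservation(family_a))
across = mean(column_conservation(family_a + family_b))
print(f"within family A: {within_a:.2f}, across both families: {across:.2f}")
```

In this toy case the invariant columns stay fully conserved when the two sets are pooled, while the family-specific columns lower the pooled average, which is the qualitative pattern described in the text.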
The highest level of sequence conservation across both superfamilies is seen in the residues that coordinate binding and hydrolysis of the triphosphate (motifs I, II, VI, Fig. 2). These residues are located in the cleft between the two conserved RecA-like helicase domains (Fig. 2C,D). The spatial arrangement of functionalities presented by these residues is highly conserved in other P-loop NTPases, probably reflecting significant evolutionary constraints in the active site for phosphoester hydrolysis.
The Q-motif, which coordinates the adenine base, is somewhat less conserved across both superfamilies. This motif is absent in the DEAH/RHA and viral DExH proteins (SF2), and these enzymes are not specific for adenosine triphosphates (Suppl. Figs. S1, S2, Tab. 1). Motif IIIa is seen in SF1 proteins (frequently annotated as motif IV) and also appears to be present in some members of the Swi/Snf family (Suppl. Figs. S1, S4). This motif seems to supply a stacking platform (conserved tyrosine) for the adenine base, a functionality that in other SF2 proteins is located on the opposite side of the bound adenine base, preceding the Q-motif (Suppl. Fig. S4).
The motifs primarily implicated in the coordination between NTP and nucleic acid binding sites (motifs III, Va) are highly conserved within each superfamily, but not across both (Fig. 2B, Suppl. Fig. S3). Mutations in the conserved residues in these motifs usually impair the coupling of NTP hydrolysis to nucleic acid binding and unwinding (e.g., refs. [33–35]), perhaps by affecting the ability to properly align both helicase domains in response to NTP binding, or by failing to coordinate the assembly of the NTP active site in response to nucleic acid binding [35,36]. Sequence differences in motifs III and Va between both superfamilies suggest that communication between NTP and nucleic acid binding sites may vary between SF1 and SF2. Such potential disparities are perhaps most obvious in motif III, which is followed by a nucleic acid binding site in many SF1, but not in SF2 helicases. While it is not well understood how exactly NTP and nucleic acid binding sites communicate in either superfamily, overarching themes begin to emerge, such as arrangements resembling the “glutamate switch” in AAA+ proteins.
For some helicases, including PcrA, Rep, UvrD (UvrD/Rep family, [37,39,40]), RecD2 (Pif1-like family), Dengue virus NS3 and HCV NS3 (NS3/NPH-II group, [42,43]), and Mss116p (DEAD-box family), structural data at various ATP-hydrolysis states have been integrated into models of conformational changes during unwinding and ATP-dependent nucleic acid binding. These models highlight common themes such as domain closure upon ATP binding, but also significant differences, such as the involvement of non-conserved, accessory domains in the unwinding mechanisms. For the UvrD/Rep and Pif1-like SF1 families, motifs Ia and III have been proposed to play particularly important roles in defining translocation polarity.
The motifs contacting nucleic acid (motifs Ia, Ib, Ic, IV, IVa, V, Vb) are located on the face opposite the ATP binding site on both helicase domains (Fig. 2A, B). The arrangement of nucleic acid binding motifs in the secondary structure is very similar for both RecA-like helicase domains (Fig. 2C). Besides binding to the nucleic acid, these motifs are likely to also participate in the communication between nucleic acid and NTP binding sites, as recently shown for DEAD-box proteins. The nucleic acid binding motifs are well conserved within the protein families (Suppl. Fig. S3), but much less across both superfamilies (Fig. 2B). Only two residues, two Ts (occasionally substituted by S) in motifs Ic and V, are widely conserved in both superfamilies (Fig. 2B). The two conserved T residues have been assigned critical roles in unwinding mechanisms of helicases that separate duplexes by translocation [46,47]. However, the two Ts are absent in Rad3/XPD proteins (Suppl. Fig. S3), which unwind duplexes with 5′ to 3′ polarity (Tab. 1).
Contacts to the nucleic acid by residues of the conserved sequence motifs are primarily made to the phosphate-sugar backbone. Additional base contacts, which are critical for unwinding by several helicases, are usually established by residues that are either located in the helicase core outside of the conserved motifs, or in different domains altogether [41,43]. Several contacts to the nucleic acid involve only the peptide backbone, explaining the occasionally low sequence conservation of the nucleic acid binding motifs (e.g., motif IVa, Fig. 2A).
Interestingly, some families encompass both RNA and DNA helicases, other families comprise solely DNA helicases (Tab. 1), and only the DEAD-box family appears to contain exclusively RNA helicases. Notwithstanding, it has been shown that even some DEAD-box proteins can bind DNA, although the binding cannot be modulated by ATP in the way RNA binding is modulated. Several helicases, including viral proteins of the NS3/NPH-II group and Upf1-like proteins, have been shown to work on both DNA and RNA [48–50]. The absence of a clear correlation between the helicase families and specificity for RNA or DNA suggests that discrimination between RNA and DNA may not have been a predominant evolutionary force for the differentiation of the families. Mechanistic features of proteins from the respective families may be utilized in both RNA and DNA-related processes, and DNA and RNA specificity may have developed after the families were established. Which sequences or structural features dictate specificity for DNA or RNA remains to be elucidated.
In addition to the family-typical sequence domains, some helicase families harbor characteristic structural features. For example, structures of Ski2-like, DEAH/RHA and NS3/NPH-II proteins revealed a prominent β-hairpin between motifs Va and VI that is not present in the other SF2 families [51–53] (Suppl. Fig. S5). Structural and mechanistic studies of the Ski2-like Hel308 helicase indicate that this hairpin functions as a “pin” to separate duplex strands at the junction. It is assumed that this “pin” plays similar roles in viral DExH and DEAH/RHA proteins, and it is intriguing to note that the presence of this β-hairpin correlates with polar 3′ to 5′ unwinding activity (Tab. 1). Moreover, this β-hairpin might be functionally similar to the strand-separating “pin” in SF1 helicases, and perhaps to a potential “pin” in RecQ proteins. These pins, however, are localized at different positions in the RecA-fold, or outside the helicase core (Suppl. Fig. S5).
A further, generally family-typical structural feature is the presence of occasionally large inserts within the helicase core domains. Inserts are seen in all SF1 families [37,41,56,57] and in the Rad3/XPD family [58–60]. The Swi/Snf and the RIG-I-like families contain inserts between the two helicase domains [61–63] (Fig. 3). The position of these inserts is usually conserved within a family. The inserts adopt independent folds, which in most cases have only small or virtually no effects on the RecA-fold of the helicase core domains. Where tested, the inserted domains were found to impact the function of these proteins, both in the cell and in vitro (e.g., refs. [54,64]). A few individual proteins in other helicase families also feature inserts, such as the DEAD-box protein DDX1 and the RecG-like PriA, but these inserts are not typical for the families.
In most SF1 and SF2 helicases the helicase core is surrounded by C- and N-terminal domains, which often exceed the helicase core in size. In many cases, these domains adopt defined folds with specific functions such as nucleases, RNA or DNA binding domains (e.g., Zn-fingers, OB-folds, dsRBDs [53,67–69]), or domains engaged in protein-protein interactions (e.g., CARD-domains) (Fig. 3). In many proteins, multiple defined C- and N-terminal domains are seen (e.g., Dicer, Fig. 3). It is clear that these regions influence or even define the function of a helicase, not only through additional enzymatic activities, such as nuclease function, but also, for example, by promoting oligomerization. In addition, C- and N-terminal domains are thought to be critical for the physiological specificity of helicases. Terminal domains have been demonstrated to direct recruitment to certain complexes, to promote interactions with other proteins, or to facilitate recognition of specific nucleic acid regions [70,73–76]. Consistent with critical roles in establishing physiological specificity for individual enzymes, C- and N-terminal accessory domains are usually not conserved within a family. That is, a family-typical helicase core is surrounded by variable accessory domains. Notwithstanding, recent studies have revealed some degree of structural conservation of the C-terminal domains in the Ski2-like and the DEAH/RHA families [52,53] (Fig. 3).
Structures for members of all SF1 and SF2 families and groups have been reported, and a broad view of the domain architecture of the different families is now possible (Fig. 4). It is important to note that architectures within a given family might differ, depending on the various C- and N-terminal domains. It is not currently possible to systematically assess such intra-family variability, because too few structures are available for helicases within each given family that encompass the helicase core and significant or all parts of C- and N-terminal domains. Nonetheless, it is intriguing to compare the available information. Most helicases that unwind duplexes with defined polarity, either 3′ to 5′, or 5′ to 3′ (Rad3/XPD, Ski2-like, DEAH/RHA, NS3/NPH-II, all SF1 families) have functionally important accessory domains located on top of the nucleic acid binding site on the helicase core (Fig. 4). This arrangement encloses the bound nucleic acid strand to some extent, possibly facilitating directional translocation upon which polar unwinding is based for many helicases [3,4]. Polar unwinding has also been shown for some RecQ-like helicases and for RIG-I, but structures of proteins from these families do not feature domains on top of the nucleic acid binding site (Fig. 4E,B). However, currently available structures do not represent full-length proteins, and it remains to be seen whether these families ultimately have domains located on top of the helicase domain, or if these proteins accomplish polar unwinding through a distinct architecture, or by other means, such as oligomerization.
Helicases that do not unwind duplexes (Swi/Snf, T1R) do not have domains on top of the nucleic acid binding site (Fig. 4C,D). This architecture is consistent with the notion that these enzymes need to bind duplex DNA [18,20,80]. The RIG-I-like helicase Hef also does not feature a domain on top of its RNA binding site in the helicase core, although RIG-I-like proteins unwind duplexes in a polar fashion [78,81] (Tab. 1). However, RIG-I and related proteins also act on duplex RNA [19,70,82,83]. Available structures for DEAD-box proteins also do not show domains on top of the RNA binding site on the helicase core [44,84,85] (Fig. 4A). This arrangement is consistent with the observation that the distinct unwinding mechanism of these enzymes involves direct binding to the duplex region [21,23,24,86,87]. However, conclusive inferences for both the RIG-I-like and the DEAD-box families cannot be made until full-length structures for proteins with large C- or N-terminal domains are available.
Significant progress has been made in recent years towards understanding structure-function relations of SF1 and SF2 helicases. It has become clear that despite a highly conserved core fold, these enzymes perform diverse functions on nucleic acids both in vitro and in the cell. The presented family-based classification may be a helpful step towards rationalizing common themes as well as differences between individual helicases.
Notwithstanding the impressive progress, much remains to be learned. For helicases where sophisticated structural models explain complex reactions, such as unwinding, focus for further inquiry may be on elucidating the communication between NTP and nucleic acid binding sites. For other helicases, it is critical to obtain structural data of full-length proteins, bound to nucleic acids and preferably to ATP analogs representing different states of the ATPase cycle. Structural information needs to be complemented by mechanistic studies addressing how the different states of the ATPase cycle are coupled to conformational work on nucleic acid substrates.
In addition to investigating connections between structural and mechanistic aspects of helicase functions, it is most critical to elucidate how helicase mechanisms are manifested in physiological contexts. Although not discussed in this review, enormous progress has been made over the last years in structural and functional analysis of helicases bound to other proteins and in authentic functional complexes. Examining helicase function in complexes amenable to biochemical and structural analysis is likely to come into even greater focus over the next years.
For eukaryotic RNA helicases, deeper understanding of physiological substrates and contexts remained elusive for many years, but significant hurdles have now been overcome. Studies of specific systems such as the DEAH/RHA helicase Prp22 and the DEAD-box protein eIF4A-III, as well as the application of novel, genome-wide survey techniques, have started to reveal the interaction of RNA helicases with their physiological substrates on a physical level. Integration of biological studies with structural and mechanistic analyses of RNA helicases is bound to illuminate many aspects of post-transcriptional regulation of gene expression.
We apologize to all colleagues whose work could not be discussed or cited here, due to space limitations. Among the cited articles, too many contributions are of particular significance, and we therefore refrained from highlighting selected references. We thank Dr. Patrick Linder (Geneva) for the many fruitful discussions that were most instrumental in shaping our view on SF1 and SF2 helicases. Research in the author’s laboratory is supported by the Burroughs Wellcome Fund and the NIH (GM067700). UPG is supported by a DFG postdoctoral fellowship (GU-1146).