The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact firstname.lastname@example.org
The Snf2 family of helicase-related proteins includes the catalytic subunits of ATP-dependent chromatin remodelling complexes found in all eukaryotes. These act to regulate the structure and dynamic properties of chromatin and so influence a broad range of nuclear processes. We have exploited progress in genome sequencing to assemble a comprehensive catalogue of over 1300 Snf2 family members. Multiple sequence alignment of the helicase-related regions enables 24 distinct subfamilies to be identified, a considerable expansion over earlier surveys. Where information is known, there is a good correlation between biological or biochemical function and these assignments, suggesting Snf2 family motor domains are tuned for specific tasks. Scanning of complete genomes reveals all eukaryotes contain members of multiple subfamilies, whereas they are less common and not ubiquitous in eubacteria or archaea. The large sample of Snf2 proteins enables additional distinguishing conserved sequence blocks within the helicase-like motor to be identified. The establishment of a phylogeny for Snf2 proteins provides an opportunity to make informed assignments of function, and the identification of conserved motifs provides a framework for understanding the mechanisms by which these proteins function.
Some 15 years ago Gorbalenya and Koonin (1,2) identified a large group of proteins sharing a series of short ordered motifs. The majority of members with known function were nucleic acid strand separating helicases so the sequences became known as helicase motifs and were labelled sequentially I, Ia, II, III, IV, V and VI. A number of additional conserved blocks with broad distributions within these helicase-like proteins have subsequently been identified, such as the TxGx (3) and Q motifs (4).
Proteins containing the helicase motifs are subdivided into several superfamilies on the basis of similarity. Structural characterizations have revealed that helicase-like superfamilies 1 and 2 (SF1 and SF2) are related with a common core of two recA-like domains (5). The helicase-like enzymes link ATP hydrolysis to a directed change in the relative orientation of these domains (6). Structural and mutagenesis studies have shown that each of the conserved motifs in the active site cleft between the recA-like domains plays a role in the transformation of chemical energy from ATP hydrolysis to mechanical motion. This enzymatic process has been suggested to represent one application of a more general mechanism used in many proteins containing a recA-like domain (7).
Proteins with a helicase-like region of similar primary sequence to Saccharomyces cerevisiae Snf2p comprise the Snf2 family within SF2 (Figure 1A). Indeed, Snf2p was specifically aligned within SF2 by Gorbalenya and Koonin (1). Many of the first identified Snf2 family members were ATPases within chromatin remodelling complexes and it was recognized that the presence of a core polypeptide related to Snf2p is a defining property of ATP-dependent chromatin remodelling (8). It is now apparent that the Snf2 family comprises a large group of ATP-hydrolysing proteins that are ubiquitous in eukaryotes, but also present in eubacteria and archaea.
At least a subset of Snf2 family proteins act as ATP-dependent DNA translocases (9–12). Some of these proteins have also been found to be capable of generating unconstrained superhelical torsion in DNA (11,13–17), proposed to occur as a result of the translocation of DNA into constrained loops. This is substantiated by recent analysis of the action of RSC on single DNA molecules (18). In addition to distorting DNA, the ATP-dependent action of these proteins can disrupt chromatin as measured using a range of different assays (8), although other DNA–protein interactions can also be affected. For example, Rad54 promotes Rad51-dependent strand pairing (13), and Mot1 displaces the TATA-binding protein (TBP) from DNA (19). Thus although many Snf2 family proteins are likely to act to alter chromatin structure, this is not the case for all members of the family.
Early biochemical studies and sequence alignments suggested that members of the Snf2 family could be further subdivided into a number of subfamilies (20). These subfamilies have traditionally taken the name of the archetypal member, such as S.cerevisiae Snf2p (Snf2 subfamily), Drosophila melanogaster Iswi (Iswi subfamily), Mus musculus Chd1 (Chd subfamily) and S.cerevisiae Rad54p (Rad54 subfamily). Snf2p, therefore, lends its name to both the collective Snf2 family and a specific Snf2 subfamily (Figure 1A).
The only comprehensive analysis of the Snf2 family sequences to date was performed by Eisen et al. in 1995 (20). Although it has subsequently been revisited within the context of various biochemical studies (21–24), no broad survey has been conducted for over a decade.
To gain new insights into the Snf2 family, we have catalogued Snf2 family members by scanning for proteins containing spans with similarity in sequence over the helicase-like region, classifying them into subfamilies, analysing the distribution of these subfamilies in complete genomes, and mapping the common sequence characteristics onto the newly available three-dimensional structures. We have identified 24 distinct subfamilies, 11 with near ubiquitous representation in eukaryotic genomes. Many of these subfamilies correlate with known biological function, but there remain a significant number for which little information is currently available. The abundance of Snf2 family members in eukaryotes in comparison to archaea and eubacteria points to their diversification early in eukaryote radiation. This diversity and the currently known functional linkages suggest the Snf2 family helicase-like region is specifically adapted to perform distinct functions within different subfamilies. Underlying this, analysis of the conserved blocks of residues reveals a common core of structural features likely to be fundamental to the mechanism of the Snf2 family motors.
Swissprot/Uniprot (25) release 42 and Uniref100 (26) release 5 were downloaded from the European Bioinformatics Institute (ftp://ftp.ebi.ac.uk/pub/databases/swissprot/ and ftp://ftp.ebi.ac.uk/pub/databases/uniprot/, respectively). The sources and version details for the predicted protein complements of the 54 eukaryotic, 24 archaeal and 269 bacterial organisms surveyed are available at the webserver http://www.snf2.net/. Analyses were performed on a cluster of dual Pentium III microcomputers running a customized Debian Linux operating system.
Sequence data were manipulated with the EMBOSS suite version 2.8 (27). Multiple sequence alignments were created with Muscle version 3.0 (28) and MAFFT version 5.667 with parameters retree = 2 and maxiterate = 1000 (29) and visualized with Jalview version 2 (30). Phylogenetic and pairwise trees were constructed with neighbor, protdist and drawtree from the PHYLIP suite version 3.572 (31) and additionally visualized with ATV version 2.03 (32) and Hypertree version 1.0.0 (33). Hidden Markov model (HMM) construction, calibration and searching were performed with the HMMer suite versions 2.1.1 and 2.2g (34), and pairwise comparison of HMMs was carried out with PRC version 1.5.3 in global-global mode (35). Sequence LOGOs were generated with WebLogo version 2.8.2 (36) and profile logos generated with logomat-p using the draw_logo method, version 0.71 (37). Protein structures were visualized with PyMol version 0.99 (38). Data were managed using MySQL (http://www.mysql.com) and PostgreSQL (http://www.postgresql.org) relational databases. Calculations were carried out using default parameters except where indicated. All other analyses used custom Perl or Python scripts written by the authors. All supplementary data for this report and an interactive database of the results are publicly available at the web server http://www.snf2.net/.
A full technical description of the library construction and validation and details of the web server will be available elsewhere (D.M.A. Martin and A. Flaus, manuscript in preparation). The procedure used is summarized in outline below.
Twenty-eight biochemically characterized S.cerevisiae Snf2p-like chromatin remodelling proteins or close homologues were selected as seed sequences. The core helicase-like region spanning from 50 amino acids N-terminal to helicase motif I to 50 amino acids C-terminal of helicase motif VI was excised from each protein sequence and multiple alignments were created with Muscle using default parameters. An initial seed HMM was constructed following manual assessment of the alignment.
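The excision step described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' script: the function name `excise_core_region` is hypothetical, and it assumes that the positions of helicase motifs I and VI in each sequence are already known (the text does not state how they were located). Coordinates are taken to be 0-based with an exclusive end.

```python
def excise_core_region(sequence, motif_i_start, motif_vi_end, flank=50):
    """Return the helicase-like span with `flank` residues added on each side.

    sequence      -- protein sequence as a string of one-letter codes
    motif_i_start -- 0-based index of the first residue of motif I (assumed known)
    motif_vi_end  -- 0-based index one past the last residue of motif VI (assumed known)
    flank         -- residues to keep on each side (50 in the procedure above)
    """
    if not 0 <= motif_i_start < motif_vi_end <= len(sequence):
        raise ValueError("motif coordinates must lie within the sequence")
    # Clamp to the sequence ends when fewer than `flank` residues are available.
    start = max(0, motif_i_start - flank)
    end = min(len(sequence), motif_vi_end + flank)
    return sequence[start:end]


# Toy sequence: 60 N-terminal residues, a 100-residue "core", 70 C-terminal residues.
toy = "A" * 60 + "M" * 100 + "C" * 70
core = excise_core_region(toy, motif_i_start=60, motif_vi_end=160)
print(len(core))
```

Clamping at the termini means that proteins with short N- or C-terminal extensions still yield a usable span instead of raising an error.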
Swissprot 42 was searched using this profile, expanding the set of Snf2 family sequences to 620 candidates with matches up to E-values of 2 although some matches near this cut-off were helicase-like sequences which were not members of the Snf2 family. Further iterations of HMM construction and searching of Swissprot 42 and model organism databases, followed by curation of sequences to remove fragmented sequences and other artefacts not belonging to the Snf2 family, yielded a set of 948 manually curated sequences which were then aligned by MAFFT.
The resultant profile was employed in searching Uniref100 and identified 5046 sequences with a match of E-value 10 or better of which 3932 had E-values below 1 and 1879 had positive bit-scores. This cut-off may appear generous but was intentional to enable maximum possible inclusion of Snf2 family members. It is predicated on the considerable variation in sequence between helicase motifs III and IV giving rise to poor alignments to the general model in this region and consequently lower bit scores. The cut-offs selected were determined by manual inspection of the hit lists and alignments to include established Snf2 family-like proteins but exclude more distant relatives. The highest E-value for a sequence classified into a subfamily (see below) was 2.2 × 10⁻¹⁶.
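The tiered cut-offs described above can be illustrated with a short Python sketch. The `Hit` record, the function `tier_counts` and the example values are hypothetical and are not taken from the actual search output; only the thresholds (E-value of 10 or better, E-value below 1, positive bit-score) come from the text.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str
    evalue: float
    bitscore: float


def tier_counts(hits, inclusive_evalue=10.0, strict_evalue=1.0):
    """Count hits in the three tiers used to report the profile search."""
    # Tier 1: everything at or better than the generous inclusion cut-off.
    reported = [h for h in hits if h.evalue <= inclusive_evalue]
    # Tier 2: the subset of reported hits with E-value below the stricter cut-off.
    below_strict = [h for h in reported if h.evalue < strict_evalue]
    # Tier 3: the subset of reported hits with a positive bit-score.
    positive = [h for h in reported if h.bitscore > 0]
    return {
        "reported": len(reported),
        "below_strict": len(below_strict),
        "positive_bitscore": len(positive),
    }


# Made-up hits for illustration only.
example_hits = [
    Hit("seq_a", 1e-50, 300.0),
    Hit("seq_b", 0.5, -3.0),
    Hit("seq_c", 5.0, -20.0),
    Hit("seq_d", 50.0, -40.0),
]
print(tier_counts(example_hits))
```

In the procedure described here, the final decision on inclusion rested on manual inspection of hit lists and alignments, so a filter of this kind would only narrow the candidates.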
A multiple sequence alignment with Muscle, followed by distance matrix calculation and neighbour-joining tree reconstruction allowed the curation of 2305 sequences into subgroupings in the Snf2 family based on the sequence of the helicase-like region. 1306 sequences were classified in 24 individual subfamilies within the Snf2 family (Table 1) after the exclusion of 436 which were fragmentary (did not span completely from helicase motifs I to VI) or contained unique large inserts or deletions. A further 220 were assigned to the prokaryotic rapA group and the remainder to more distantly related clusters or as highly truncated outliers which could not be reasonably aligned. An overview of a neighbour-joining tree constructed from a multiple alignment excluding the variable minor and major insertion regions (see text) and visualized using Hypertree demonstrates the clearly distinguishable division of subfamilies (Figure 1B). Each subfamily was individually examined for redundancy and further internal structure (data not shown).
Each subfamily sequence set was realigned by MAFFT, manually curated with Jalview and an HMM profile constructed. These profiles were aligned with PRC in an all-against-all comparison. Although these profile comparisons do not give a true phylogenetic tree, the scores obtained from the pairwise profile alignments can be used to construct a representational tree (Figure 1C), indicating the relationship between the HMM profiles to be consistent with the sequence-based tree (Figure 1B). It was also observed that the subfamilies could be aggregated into some broad groupings that correlate with functional properties, where known (Figure 1).
All 24 subfamily profiles were combined in one HMM library and the hmmpfam application employed in searching individual genomic datasets to provide phylogenomic information about the taxonomic distribution of Snf2-like proteins. The subfamily hit with maximal bitscore >100 was used to assign membership in a semi-automatic procedure. With very few exceptions, classification was extremely clear with strong discrimination between the top hit and the second best hit (data not shown).
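The assignment rule above (take the subfamily profile with the maximal bit-score, provided it exceeds 100) can be sketched in Python as below. The function name `assign_subfamily`, the input format (a mapping from subfamily name to bit-score for one protein) and the example scores are assumptions made for illustration; the sketch does not parse hmmpfam output. The margin over the second-best profile is returned so that weakly discriminated cases can be flagged for manual review.

```python
def assign_subfamily(scores, min_bitscore=100.0):
    """Assign a protein to the best-scoring subfamily profile.

    scores -- dict mapping subfamily name to bit-score for one protein
    Returns (subfamily, margin), where margin is the difference between the
    best and second-best bit-scores; returns (None, 0.0) if no profile
    scores above `min_bitscore`.
    """
    if not scores:
        return None, 0.0
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_name, best_score = ranked[0]
    if best_score <= min_bitscore:
        return None, 0.0
    # With a single profile hit, the margin is simply the best score itself.
    second_score = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_name, best_score - second_score


# Made-up bit-scores for one hypothetical protein.
print(assign_subfamily({"Snf2": 650.0, "Iswi": 310.5, "Chd1": 280.0}))
```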
Starting from a seed set of helicase-like region sequences from 28 demonstrated Snf2p-related proteins or close homologues, we have carried out a broad survey of Snf2 family proteins. This was achieved by iterative cycles of manual curation of multiple alignments and neighbour-joining trees to identify Snf2 proteins by similarity, construction of an HMM profile from the multiple alignments of identified proteins, and scanning of global and model organism protein databases using the HMM profile to uncover further sequences for curation. Our current global Snf2 family profile scan revealed 3932 sequences with E-value under 1 (1879 with bitscore > 0) in Uniref100 release 5 [2.4 million entries (26)]. Of these, 1306 sequences were identified as belonging to the Snf2 family and to span the full helicase region from motifs I to VI without introducing large unique insertions or deletions. A further 220 sequences fall within the rapA group, while other hits appear to belong to more distantly related groups (see below) or were too highly truncated to be aligned. Neighbour-joining trees from multiple alignments of the set of 1306 sequences revealed a well-defined branching structure (Figure 1B) and enabled their assignment to 24 distinct subfamilies (Table 1).
Subfamily-specific HMM profiles were constructed from these assignments and used to characterize the Snf2 family complement for 54 complete eukaryote genomes. The counts of predicted proteins and unique encoding genes for 21 selected genomes are listed in Table 2, part A and B, respectively (see Supplementary Table S1A for full analysis of eukaryotic genomes, and Supplementary Table S2 for gene IDs by subfamily for seven common model organisms). In addition, 24 complete archaeal and 269 bacterial genomes were scanned (Supplementary Tables S1B and S1C).
The clear distinction and significant number of subfamilies based on the helicase-like region (Figure 1B and Table 1) reflects both a remarkable breadth and specificity in the Snf2 family. An additional level of similarity distinguishes apparent groupings of subfamilies (Figure 1), which echo current understanding of their functional diversity (Table 3). Most of the best studied Snf2 family proteins fall into a grouping of ‘Snf2-like’ subfamilies including proteins such as S.cerevisiae Snf2p, D.melanogaster Iswi, mouse Chd1 and human Mi-2, which are core subunits of the well-known ATP-dependent chromatin remodelling complexes. A separate ‘Swr1-like’ grouping encompasses the Swr1, Ino80, EP400 and Etl1 subfamilies. The ‘Rad54-like’ grouping contains the Rad54 subfamily, relatives such as ATRX and Arip4, and also includes the recently recognized DRD1 and JBP2 proteins. A further, unexpected, ‘Rad5/16-like’ grouping links several poorly studied subfamilies, three of which contain RING finger insertions within the helicase-like region (see below). The ‘SSO1653-like’ grouping of Mot1, ERCC6 and SSO1653 is notable because all three subfamilies are thought to have non-chromatin substrates. Finally, we have labelled SMARCAL1 proteins as ‘distant’ because they lack several otherwise conserved sequence hallmarks of the Snf2 family (see below). Although some groupings are clear, further investigations will be required to verify those where the boundaries are less distinct.
Since the subfamily assignments are based only on the common helicase-like region, this suggests that the ‘motor’ at the core of even large multiprotein remodeller complexes is tuned to the mechanistic requirements of its function. Such properties are not unprecedented for motor protein subfamilies. The ubiquitous kinesin and myosin proteins are divided into at least 14 and 17 subfamilies, respectively (39,40), and those subfamilies are recognized to reflect tuning of the motors for enzymatic properties linked to particular functional roles. As this also appears to be true for Snf2 family proteins we can anticipate that mechanistic features of the motors will be shared within subfamilies and groupings. This may be useful in helping to predict function of poorly characterized proteins. For example, owing to the recent observation that Swr1 functions in histone exchange (41–43), it is tempting to speculate that the Snf2 motors within other subfamilies in the Swr1-like grouping may be adapted for related purposes.
Owing to the remarkable diversity revealed by this classification and the occurrence of many subfamilies which have not been intensively investigated, we briefly summarize current functional and biochemical understanding and characteristic features of each subfamily in Table 3.
The survey of Snf2 family proteins enables detailed analysis of sequence conservation in the helicase-like region (Figure 2). This reveals a number of unique features distinguishing them from other helicase superfamily SF2 members. First, the conserved helicase motifs show a highly conserved character across the Snf2 family, and some motifs are extended by juxtaposed residues such as conserved blocks E and G (Figure 2 and Supplementary Figure S4). Second, the helicase-like region in the Snf2 family is significantly longer than for many other helicases, primarily due to an increased spacing between motifs III and IV of >160 residues compared to 38 and 78 for typical SF2 helicases NS3 and RecG, respectively (44). Third, a number of unique conserved blocks are found in Snf2 family proteins (Figure 2 and Supplementary Table S5). Several of these blocks have been noted previously (20,45–48), with conserved block B having been confused in a number of early manuscripts with motif IV. Conserved blocks B, C and K are of particular interest because they are located within the characteristic extended inter-motif III–IV region (Figure 3G).
The SMARCAL1 subfamily contains classical helicase motifs which are highly similar to the other subfamilies. It also has an extended motif III–IV spacing, but it nevertheless lacks conserved blocks within the motif III–IV region (Supplementary Figure S4). The rapA group has similar properties but is more diverse in overall sequence and retains less similarity in the classical motifs. It is unclear whether the SMARCAL1 subfamily and particularly the rapA group will maintain the structural features of the Snf2 family and they are therefore at the limit of the definition of the Snf2 family. We have also noticed further protein groupings with extended spacing between motifs III and IV and detectable similarity to the classical helicase-like motifs of the Snf2 family sequences (Supplementary Figure S4). These include poxvirus NPH-I related proteins involved in transcription termination (49) and the FANCM/MPH1/Hef group of helicases encompassing yeast Mph1p, archaeal Hef and human FANCM proteins involved in DNA repair (50–52). However, those proteins show low similarity to the Snf2 family between motifs III and IV and appear to lack the characteristic conserved blocks C, J and K of the Snf2 family. Interestingly, comparison of the recently determined Pyrococcus furiosus archaeal Hef helicase structure reveals that the MPH1/Hef group has a related structural organization to zebrafish Rad54, but contains only a single compact alpha-helical domain encoded between motifs III and IV (Supplementary Figure S6). It has been noted that this extra alpha-helical domain has some similarities with the thumb domain of Taq DNA polymerase which grips the DNA minor groove (53). It is therefore likely that the SMARCAL1 subfamily, rapA group, NPH-I and MPH1/Hef proteins reflect a continuum of diversity while sharing core features with the other Snf2 subfamilies.
None of the 293 scanned archaeal or bacterial genomes contains a protein classified in any of the eukaryotic subfamilies (Supplementary Tables S1B and S1C). All identified archaeal and bacterial proteins belong to the SSO1653 subfamily and rapA group. Conversely, the SSO1653 subfamily and rapA group are likely to be specific to microbial organisms because the only two members of these families identified in eukaryotes (Supplementary Table S1A) appear to be false positives (data not shown). Over two-thirds of complete microbial genomes contain members of the SSO1653 subfamily and/or rapA group. This broad yet incomplete distribution suggests they perform non-essential functions that are sufficiently advantageous to maintain their prevalence.
Although rapA group proteins are distinguished by the lack of several features characteristic of eukaryotic Snf2 family members (see above), the SSO1653 subfamily carries all the Snf2 family sequence and structural hallmarks (Supplementary Figure S4). SSO1653 subfamily members are present in both bacteria and archaea, but they are not ubiquitous in archaeal genomes despite the presence of transcription, replication and repair mechanisms with significant similarity to those of eukaryotes (54,55). There is also no obvious linkage between the presence of histone-like proteins and SSO1653 subfamily members in archaeal genomes (Supplementary Table S1B). Furthermore, the SSO1653 subfamily falls in a grouping (Figure 1C) with the eukaryotic ERCC6 and Mot1 subfamilies whose biochemical role appears not to involve chromatin directly. In contrast to the limited archaeal and bacterial distribution of Snf2 family proteins, all eukaryote genomes contain multiple Snf2 family proteins. The early branching Giardia lamblia and the minimal Encephalitozoon cuniculi genomes both encode six different Snf2 family genes falling into subfamilies represented across eukarya (Supplementary Table S1A), several of which have clear linkage to chromatin transactions. It is therefore possible that the microbial SSO1653 subfamily represents an ancestral Snf2-like form from which the eukaryotic subfamilies radiated. Such expansion of the Snf2 family early in eukaryote evolution (20) could have been coincident with the development of high-density nucleosomal packaging (56).
The linkage between the primary sequence-based definitions of the subfamilies and distinct biological function is strongly supported by the presence of one or more subfamily members in each eukaryotic genome across large evolutionary ranges (Table 2 and Supplementary Table S1A). For example, a common set of subfamilies are found in almost all fungi, plant and animal genomes comprising Snf2, Iswi, Chd1, Swr1, Etl1, Mot1, Ino80, Rad5/16, ERCC6, Rad54 and SHPRH. Increased genomic complexity is also paralleled by increasing numbers of subfamilies and members: E.cuniculi with a genome encoding some 2000 gene products has 6 Snf2 family members from 6 subfamilies, whereas the S.cerevisiae genome encoding some 6000 genes has 17 Snf2 family members from 13 subfamilies, and the human genome encoding some 25000 genes has 32 Snf2 family genes from 20 subfamilies (Table 2, part B).
The functional linkage across large evolutionary ranges suggests that each subfamily may have distinctive properties of its ATPase motor tuned to its function. This is supported by recent biochemical results demonstrating that helicase-like regions can be swapped within but not between subfamilies (57). However, a counterpoint is that functional redundancy can occur between subfamilies. For example, synthetic deletion of all three of the S.cerevisiae ISW1, ISW2 and CHD1 genes together is required to generate a strong phenotype (58,59). Redundancy also provides an explanation why some genomes lack certain members: the small genome of Schizosaccharomyces pombe lacks an Iswi subfamily member but maintains two Chd1 subfamily members. In addition to the 11 subfamilies represented broadly across eukaryotes, a number of others are restricted to specific taxonomic ranges. For example, CHD7 members are found almost exclusively in animals, and ATRX members are found only in animals and plants.
A number of specific features contribute to the distinction between subfamilies. First, the spacing between motifs III and IV is extended significantly beyond the minimal ~160 residues for a number of subfamilies (Table 4). For the Rad5/16, Ris1 and SHPRH subfamilies, the additional sequences all include RING fingers, whereas for the Swr1 and EP400 subfamilies they comprise highly proline and serine/threonine-rich spans. Ino80 and ATRX subfamilies also contain large, novel and distinct spans. Remarkably, all these large extra insertions occur at the same location in the primary sequence, between conserved blocks C and K which we term the ‘major insertion site’ (Figure 3G and Supplementary Figure S7A). Even for the subfamilies without large insertions there is variation in the length of sequence in the major insertion site (Table 4). For example, the Zebrafish Rad54 structure contains some 25 more residues forming two additional small alpha helices compared to the Sulfolobus solfataricus SSO1653 structure. When Snf2 family members from different subfamilies are aligned, the variability of the major insertion region strongly disturbs the alignment such that a contiguous pattern becomes difficult to define. This has led to some of the Snf2 family proteins being described as having ‘split’ helicase-like ATPase regions. The discontinuity is also the cause of protein motif databases such as SMART and Pfam defining Snf2 family members as matching a bipartite combination of SNF2_N and Helicase_C profiles (Figure 3G). The C-terminal end of the SNF2_N profile corresponds to conserved block C.
Second, subfamilies have characteristic small insertions at other sites (Table 4). Two such sites, also in the motif III–IV region, are located between conserved blocks H and B and between J and C (Figure 2). These are likely to influence the length of the long alpha helical protrusions 1 and 2, respectively (see below, Figure 3C), and there is a difference of some 40 residues between the shortest and longest subfamily lengths for each (Table 4). A ‘minor insertion site’ located between motifs I and Ia on the back of recA-like domain 1 is also occupied by recognizable domains in a few subfamilies from the Rad5/16-like grouping such as SHPRH (Supplementary Figure S3B). A number of other small insertions map to loops between various secondary structural elements (data not shown).
Third, although adhering to a general Snf2 family-specific pattern, individual subfamilies show characteristic patterns in the helicase motifs and in other conserved blocks (Supplementary Figure S4). For example, the well-known helicase motif II with typical DEAH pattern favours DEGH in the Snf2, Mot1 and Rad54 subfamilies, DEAQ in the Swr1, EP400, Ino80 and SSO1653 subfamilies or DESH in the SMARCAL1 subfamily. Likewise, for the typical conserved block E—motif I combined pattern ILADEMGLGKT all ATRX subfamily members have histidine instead of aspartate (i.e. ILAHEMGLGKT) and most Mot1 subfamily members have cysteine replacing alanine (i.e. ILCDEMGLGKT). It is also possible to identify other residues correlating with groups of subfamilies. For example, members of the Snf2, Iswi, Chd1, Mi-2, CHD7, ALC1, Rad54, ATRX and Arip4 subfamilies have an arginine immediately following the motif II DEAH. In the zebrafish Rad54 structure this residue R294 interacts with the sulphate which is suggested to mimic the ATP gamma phosphate.
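The motif II preferences listed above can be collected into a small lookup, sketched here in Python. The dictionary name `MOTIF_II_PREFERENCES` and the function `subfamilies_favouring` are hypothetical; the table only restates the variants named in the text and is not a classifier, since subfamily membership was assigned from the whole helicase-like region and not from a single motif.

```python
# Motif II variants and the subfamilies stated in the text to favour them.
# The typical Snf2 family pattern, DEAH, is deliberately not tied to a subfamily.
MOTIF_II_PREFERENCES = {
    "DEGH": ("Snf2", "Mot1", "Rad54"),
    "DEAQ": ("Swr1", "EP400", "Ino80", "SSO1653"),
    "DESH": ("SMARCAL1",),
}


def subfamilies_favouring(motif_ii):
    """Return the subfamilies reported to favour a given motif II tetrapeptide.

    An empty tuple means the variant is either the typical DEAH pattern or
    one that is not mentioned in the text.
    """
    return MOTIF_II_PREFERENCES.get(motif_ii.upper(), ())


print(subfamilies_favouring("DEAQ"))
print(subfamilies_favouring("DEAH"))
```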
Two structural determinations of the helicase-like regions of Snf2 family members have been presented recently: zebrafish Rad54 (pdb code 1Z3I) (47) and S.solfataricus SSO1653 (pdb codes 1Z6A, 1Z63, 1Z5Z) (46). As expected for members of the Snf2 family, the fold of each core recA-like domain in the Rad54 and SSO1653 structures is substantially similar and related to those of other known SF1 and SF2 helicases. In the zebrafish Rad54 structure the two recA-like domains are oriented equivalently to those of other known helicase structures (Figure 3A and B), whereas in the S.solfataricus SSO1653 structures recA-like domain 2 is flipped by 180° to an arrangement never previously observed for a helicase (Supplementary Figure S7B). This unusual orientation in SSO1653 is observed for both the DNA free and DNA-bound forms (46).
The most striking feature of the Snf2 family structures is the presence of several additional structural elements grafted onto the core helicase structure. These comprise antiparallel alpha helical protrusions from both recA-like domains 1 and 2 (Figure 3C), a structured linker between the recA-like domains (Figure 3D), the major insertion region at the back side of the domain 2 alpha helical protrusion (Figure 3E) and a triangular brace packed against the domain 2 alpha helical protrusion (Figure 3F). The two alpha-helical protrusions and linker are all encoded within the enlarged span between motifs III and IV. The triangular brace is encoded immediately downstream of motif VI.
Remarkably, the primary sequence features of the Snf2 family correspond directly to the additional structural elements (Figure 3G). First, the bases of the protrusions from recA-like domains 1 and 2 are both fixed by conserved blocks. For protrusion 1, this involves conserved block H composed of a repeating pattern of aromatic residues, with additional involvement of aromatics from conserved block A. For protrusion 2 this involves the arrangement of conserved blocks C, J and K. Second, the protrusions themselves are relatively conserved in sequence and length within subfamilies but not across the whole Snf2 family. Although there is no obvious correlation between the lengths of the protrusions 1 and 2, the distribution of protrusion lengths adheres to multiples of the alpha helical repeat (Supplementary Figure S8), suggesting that protrusions retain structure while varying in extension. Third, the Q motif structure found in many SF2 proteins utilizes a different arrangement of residues from that of DEAD box helicases such as eIF4A, where an aromatic residue orients the adenine base ring for contacts with a downstream glutamine (4) (Figure 3B). In the Snf2 family, the aromatic residue is contributed by conserved block F downstream of the glutamine. The Q motif affects ATP hydrolysis in DEAD box helicases and mutation of the core glutamine in yeast Snf2 subfamily member Sth1p causes slow growth (4). Fourth, the linker connecting protrusions 1 and 2 contains highly conserved dual arginines in conserved block B. Their central location between the ATP-associating and DNA-associating structural elements suggests that they may play an important role in the mechanism of Snf2 family enzymes. Consistent with this, mutation of the second arginine of the pair in Snf2p leads to effectively complete loss of function of the protein in vivo (48).
Finally, the brace is composed of a principal alpha helix anchored by conserved block M into the junction at the base of protrusion 2 composed of conserved blocks C, J and K.
The major insertion region is immediately behind protrusion 2, almost diametrically opposite the ATP-binding site in the zebrafish Rad54 structure (Figure 3E). The nearest residues of the major insertion region in Rad54 are some 15 Å from DNA phosphates for docked DNA (Supplementary Figure S7A). However, an appropriately oriented alpha helix of some 20 residues would be sufficient to reach into the major groove, so large insertions at the major insertion site could potentially interact with DNA or other DNA-binding proteins bound in the groove. In the flipped conformation of domain 2 observed in the SSO1653 structure, the major insertion region is juxtaposed immediately adjacent to the DNA such that two non-conserved arginines from the major insertion region make direct DNA phosphate contacts.
As the distinctive structural features are defined by unique and highly conserved blocks, they are likely to confer properties on the ATPase motor that adapt the action of the core recA-like domains for a unique mechanism. We anticipate that while some features of the Snf2 family mechanism will be common to SF2 translocases, other aspects will be distinctive. Knowledge of the conserved residues and their structural location provides important information for understanding these distinctions.
We have demonstrated that the common helicase-like region is sufficient to enable classification of Snf2 family members. However, almost all Snf2 family polypeptides contain significant additional sequences likely to harbour accessory domains. For some subfamilies there is good correlation with the presence of particular accessory domain combinations (Supplementary Table S9). For example, almost all Snf2 subfamily members contain a bromodomain, Iswi members contain a SANT domain, and Chd1, Mi-2 and CHD7 members contain a chromodomain. However, many domain profiles in resources used for domain analysis have unidentified function or are unreliable in the context of Snf2 proteins. For example, Pfam lacks a SANT-specific profile and detects <10% of SANT domains with a more generic ‘Myb_DNA-binding’ profile. We are currently undertaking further analysis to improve the relevant profiles and analyse the linkage of Snf2 family accessory domains in detail.
Finally, many Snf2 family proteins are part of larger multi-protein complexes. Accessory motifs within these complexes are also likely to adapt the function of Snf2 motors for different purposes.
Supplementary Data are available at NAR Online.
We gratefully acknowledge Chris Stockdale, Charlie Bond and Helder Ferreira for comments on this manuscript and Diego Miranda-Saavedra and Jim Procter for helpful discussions regarding the analyses. We are also indebted to Andrew Waterhouse for providing essential enhancements to Jalview, and to Jonathan Monk for excellent systems support. We also thank Prof. U. Baumann for coordinates of eIF4A with ADP, Dr Kathleen Sandman and Prof. John Reeve for providing a table of histone-like proteins in archaeal genomes, and Dr Malcolm White for communicating unpublished observations about the SSO1653 protein. AF and TOH are funded by a Wellcome Trust Senior Research Fellowship in Basic Biomedical Science to TOH. Funding to pay the Open Access publication charges for this article was provided by a Wellcome Trust VIP award to the School of Life Sciences in Dundee.
Conflict of interest statement. None declared.