Data Brief. 2017 April; 11: 510–516.
Published online 2017 March 3. doi:  10.1016/j.dib.2017.02.047
PMCID: PMC5349461

Comparative analysis data of SF1 and SF2 helicases from three domains of life


SF1 and SF2 helicases are important molecular motors that use the energy of ATP to unwind nucleic acids or nucleic acid-protein complexes. They are ubiquitous enzymes found in almost all organisms sequenced to date. This article provides a comparative analysis of SF1 and SF2 helicase families from representatives of the three domains of life: archaea, human and bacteria. Seven families are conserved in these three representatives: Upf1-like, UvrD-like, Rad3-like, DEAD-box, RecQ-like, Snf2 and Ski2-like. The data highlight conservation of the helicase core motifs for each of these families. The phylogenetic analyses presented for certain protein families are essential for further studies tracing the evolutionary history of helicase families. The data supplied in this article support the publication “Genome-wide identification of SF1 and SF2 helicases from archaea” (Chamieh et al., 2016) [1].

Keywords: Helicase, Archaea, SF1, SF2, Phylogenetics

Specifications Table (table image not reproduced)

Value of the data

  • The presented data on highly conserved amino acids in each of the seven conserved families across the three domains of life are important for designing mutagenesis studies and thereby determining which conserved residues are required for helicase function.
  • Protein sequence comparison between SF1 and SF2 helicase families will help establish key experiments for genetic and biochemical analysis of helicase action.
  • Phylogenetic tree data for the Upf1-like, Ski2-like and Rad3-like families shed light on the phylogenetic relationships between these helicases in archaea, human and E. coli. The data offer valuable information on the complex evolutionary history within a helicase family and are a starting point for more detailed evolutionary studies on helicase subfamilies.

1. Data

Four figure files are presented. Fig. 1 shows a comparative analysis of helicase core motifs in conserved families from archaea, bacteria and human. Fig. 2, Fig. 3 and Fig. 4 are phylogenetic trees: Fig. 2 and Fig. 4 were obtained by Maximum Likelihood analysis of the Upf1-like and Rad3-like families, respectively, and Fig. 3 by Bayesian analysis of the Ski2-like family.
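To make concrete what a motif logo such as Fig. 1 conveys, the sketch below computes per-column information content (in bits) for a protein alignment, which is the quantity that sequence logos display as stack height. It is a minimal illustration and not the WebLogo implementation: it ignores gaps, applies no small-sample correction, and the function name `column_information` and the toy sequences are hypothetical, not taken from the dataset.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def column_information(alignment):
    """Return the information content (bits) of each alignment column.

    Gaps and non-standard symbols are skipped. No small-sample
    correction is applied (an assumption made for brevity).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_bits = math.log2(len(AMINO_ACIDS))  # upper bound for proteins
    scores = []
    for i in range(length):
        counts = Counter(
            seq[i].upper() for seq in alignment if seq[i].upper() in AMINO_ACIDS
        )
        total = sum(counts.values())
        if total == 0:  # column contains only gaps or unknown symbols
            scores.append(0.0)
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(max_bits - entropy)
    return scores


# Toy alignment (hypothetical sequences, for illustration only)
toy = ["GKTA", "GKSA", "GKT-", "GRTL"]
scores = column_information(toy)
for position, bits in enumerate(scores, start=1):
    print(f"column {position}: {bits:.2f} bits")
```

Columns in which every sequence carries the same residue reach the maximum of log2(20) bits, so fully conserved motif positions stand out as the tallest stacks.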

Fig. 1.
Conserved motifs of the helicase core domain for SF1 and SF2 families across the three domains. All protein sequences were retrieved from existing protein databases. Multiple protein sequence alignment was performed using T-COFFEE EXPRESSO program for ...
Fig. 2.
Molecular phylogenetic analysis of the Upf1-like family by the Maximum Likelihood method. The evolutionary history was inferred by using the Maximum Likelihood method based on the Whelan and Goldman + Freq. model (WAG+F). The percentage of trees in which the ...
Fig. 3.
Molecular phylogenetic analysis of the Ski2-like family by the Bayesian method. The evolutionary history was inferred by using the Bayesian method based on the MTMam model. The analysis involved 178 amino acid sequences. Evolutionary analyses were conducted in ...
Fig. 4.
Molecular phylogenetic analysis of the Rad3-like family by the Maximum Likelihood method. The evolutionary history was inferred by using the Maximum Likelihood method based on the WAG+F model. The percentage of trees in which the associated taxa clustered together ...

2. Experimental design, materials and methods

All protein sequences were retrieved from existing protein databases, are referred to by their UniProt accession numbers, and were classified into families as described in Chamieh et al. [1], [2]. Multiple protein sequence alignment was performed using the T-COFFEE EXPRESSO program for small sequence sets (<150 sequences) [3] or PROMALS3D for large sequence sets (>150 sequences) [4]. Fig. 1 was generated with the WebLogo software [5] from the multiple sequence alignment files of protein sequences within the same family. Sequences were inspected for correct alignment within the helicase core domain. Multiple sequence alignments were trimmed using trimAl v1.3 with the automated method [6]. The best-fit evolutionary model was identified using ProtTest [7]. Phylogenetic analysis was performed using Maximum Likelihood analysis in the MEGA7 software [8] or MrBayes within the TOPALi platform [9], [10].
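The size-based choice of aligner and the trimming step described above can be sketched as follows. This is an illustrative outline, not the authors' scripts: the helper names `choose_aligner` and `trimal_command` and the file names are hypothetical, the handling of a set with exactly 150 sequences is an assumption (the text does not specify it), and mapping "automated" to trimAl's `-automated1` option is likewise an assumption. The command is only built and printed, not executed.

```python
import shlex

SIZE_THRESHOLD = 150  # sequence-count cutoff stated in the methods


def choose_aligner(n_sequences):
    """Pick the alignment program according to the size of the sequence set."""
    if n_sequences < SIZE_THRESHOLD:
        return "T-Coffee Expresso"
    # The text does not say what happens at exactly 150 sequences;
    # this sketch assumes such sets are sent to PROMALS3D.
    return "PROMALS3D"


def trimal_command(aligned_fasta, trimmed_fasta):
    """Build a trimAl command line for automated trimming.

    Assumption: "automated" in the text corresponds to trimAl's
    -automated1 heuristic; the source does not name the exact flag.
    """
    return ["trimal", "-in", aligned_fasta, "-out", trimmed_fasta, "-automated1"]


# Example: the Ski2-like analysis involved 178 sequences (Fig. 3 caption).
print(choose_aligner(178))
# Hypothetical file names, shown only to illustrate the command shape.
print(shlex.join(trimal_command("ski2_aligned.fasta", "ski2_trimmed.fasta")))
```

Keeping the cutoff in a single constant makes it easy to see which aligner a given family would be routed to before any alignment is run.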


Transparency document

Transparency data associated with this article can be found in the online version at doi:10.1016/j.dib.2017.02.047.




References

1. Chamieh H., Ibrahim H., Kozah J. Genome-wide identification of SF1 and SF2 helicases from archaea. Gene. 2016;576(1 Pt 2):214–228.
2. Apweiler R. UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 2004;32(Database issue):D115–D119.
3. Taly J.-F., Magis C., Bussotti G., Chang J.-M., Di Tommaso P., Erb I. Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures. Nat. Protoc. 2011;6(11):1669–1682.
4. Pei J., Grishin N.V. PROMALS3D: multiple protein sequence alignment enhanced with evolutionary and three-dimensional structural information. Methods Mol. Biol. 2014;1079:263–271.
5. Crooks G.E., Hon G., Chandonia J.-M., Brenner S.E. WebLogo: a sequence logo generator. Genome Res. 2004;14(6):1188–1190.
6. Capella-Gutierrez S., Silla-Martinez J.M., Gabaldon T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25(15):1972–1973.
7. Darriba D., Taboada G.L., Doallo R., Posada D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics. 2011;27(8):1164–1165.
8. Kumar S., Stecher G., Tamura K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol. Biol. Evol. 2016;33(7):1870–1874.
9. Ronquist F., Teslenko M., van der Mark P., Ayres D.L., Darling A., Höhna S. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 2012;61(3):539–542.
10. Milne I., Lindner D., Bayer M., Husmeier D., McGuire G., Marshall D.F. TOPALi v2: a rich graphical interface for evolutionary analyses of multiple alignments on HPC clusters and multi-core desktops. Bioinformatics. 2009;25(1):126–127.

Articles from Data in Brief are provided here courtesy of Elsevier