Since its inception in 1995, the Histone Database has been a valuable resource for researchers studying chromatin structure and function, as well as for those studying transcriptional regulation, where histone fold-containing proteins have a central role. Currently, the Histone Database contains entries representing a total of 975 organisms. Sequences of the histone proteins and of nonhistones containing the histone fold are available in FASTA format. Additionally, a search engine is available for querying the database; it can retrieve entries by protein family, organism, keyword, or sequence pattern. The database also includes the three-dimensional structures of histone and histone fold-containing proteins available in PDB; each structure has links to PDB and the Molecular Modelling Database, along with the protein name and source organism.
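As an illustration of how a keyword and sequence-pattern query over FASTA-formatted entries might work, the following minimal Python sketch parses FASTA text and filters the records. It is not the search engine used by the database; the function names (`parse_fasta`, `search`), the toy records, and the pattern are all hypothetical and chosen only for illustration.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def search(records: Iterable[Record],
           keyword: Optional[str] = None,
           pattern: Optional[str] = None) -> List[Record]:
    """Keep records whose header contains `keyword` (case-insensitive)
    and whose sequence matches the regular expression `pattern`."""
    compiled = re.compile(pattern) if pattern else None
    hits = []
    for header, seq in records:
        if keyword and keyword.lower() not in header.lower():
            continue
        if compiled and not compiled.search(seq):
            continue
        hits.append((header, seq))
    return hits


# Hypothetical toy records; names and sequences are illustrative only.
EXAMPLE = """>toy_H2A example organism A
MSGRGKAGLQFPVGR
>toy_H2B example organism B
MPEPAKSAPKKGSKK
>toy_other example organism A
MAAAGGGKKLLVVPP
"""

hits = search(parse_fasta(EXAMPLE), keyword="organism A", pattern=r"G[RK]G")
for header, seq in hits:
    print(header, seq)
```

A regular expression is used here only because it is the simplest way to express a sequence pattern in Python; a production search engine would more likely index the entries than scan them linearly.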
A number of histone fold-containing proteins have been identified among TATA-box binding protein-associated factors (TAFs) and transcription factors; however, the Ras activator Son of Sevenless remains the only cytoplasmic protein containing the histone fold motif. The structure of the histone fold domains from Son of Sevenless was recently determined [12]. The N-terminal structure of Son of Sevenless contains two histone folds that can be superimposed onto the H2A/H2B heterodimer with a root-mean-square deviation in Cα positions of only 1.2 Å. Interestingly, only the second histone fold was detected in a previous sequence analysis using PSI-BLAST searches [4]. However, position-specific score matrices (PSSMs) can be constructed from structural alignments generated by VAST [17], using other structural neighbors such as histone H2B and the transcription factor NF-Y. The structure-based alignment for the first domain of Son of Sevenless with histone H2B reveals a difference in the loop length between α-helices 2 and 3 (Fig. 2). When the gap in the alignment is included in the PSSM model, the first histone fold domain in Son of Sevenless is successfully identified. The function of the histone fold domains in Son of Sevenless is still unclear, but they are likely to be involved in the formation of higher-order oligomeric and/or heterotypic interactions with other histone fold-containing proteins.
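For readers unfamiliar with the measure, the root-mean-square deviation in Cα positions quoted above can be computed as in the following minimal Python sketch. It assumes that the two structures have already been superimposed by a structural alignment program and that equivalent Cα atoms have been paired; the function name `ca_rmsd` and the toy coordinates are hypothetical and are not taken from any PDB entry.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in angstroms


def ca_rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between paired C-alpha coordinates of two superimposed structures."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


# Toy coordinates (illustrative only): every paired atom is displaced by 1 A.
fold_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
fold_b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

rmsd = ca_rmsd(fold_a, fold_b)
print(f"RMSD = {rmsd:.2f} A")
```

The sketch deliberately omits the superposition step itself, which requires finding the optimal rotation and translation before the deviation is measured.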
Fig. 2 Structure-based alignment of the first histone fold domain in human Son of Sevenless with yeast histone H2B. Yeast histone H2B (pdb|1ID3, chain D) aligned with the first histone fold domain present in the human ras activator Son of Sevenless (pdb|1Q9C, ...)
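To make concrete how a gap in a structure-based alignment can be carried into a position-specific score matrix, the following Python sketch builds a simple log-odds PSSM from aligned sequences and scores a pre-aligned query against it. This is a simplified illustration, not the PSI-BLAST or VAST procedure used in this study: treating the gap as a 21st symbol, the uniform background frequency, the pseudocount of 1, and the toy alignment are all assumptions made for the example.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ALPHABET = AMINO_ACIDS + "-"  # assumption: the gap is modelled as a 21st symbol


def build_pssm(alignment: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Return one {symbol: log2-odds score} dictionary per alignment column."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    background = 1.0 / len(ALPHABET)  # assumption: uniform background frequency
    pssm = []
    for col in range(length):
        counts = {sym: pseudocount for sym in ALPHABET}
        for row in alignment:
            counts[row[col]] += 1.0  # KeyError on symbols outside ALPHABET
        total = sum(counts.values())
        pssm.append(
            {sym: math.log2((counts[sym] / total) / background) for sym in ALPHABET}
        )
    return pssm


def score(pssm: List[Dict[str, float]], aligned_query: str) -> float:
    """Sum the column scores for a query already aligned to the model."""
    if len(aligned_query) != len(pssm):
        raise ValueError("query must have one symbol per PSSM column")
    return sum(column[sym] for column, sym in zip(pssm, aligned_query))


# Toy alignment (illustrative only): two of three rows have a gap in column 4,
# mimicking a loop-length difference between aligned domains.
TOY_ALIGNMENT = ["ACD-EF", "ACDGEF", "ACD-EF"]

pssm = build_pssm(TOY_ALIGNMENT)
with_gap = score(pssm, "ACD-EF")     # query that shares the shorter loop
without_gap = score(pssm, "ACDWEF")  # query with an unrelated residue there
print(with_gap > without_gap)
```

Because the gapped column contributes a favorable score for the gap symbol, a query with the shorter loop is not penalized at that position, which is the intuition behind including the gap in the model.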
Another newly identified histone fold-containing protein is the huntingtin-interacting protein M (HYPM; GenBank AAC26851); this protein is highly expressed in testis [18] and was originally found in a yeast two-hybrid screen using huntingtin as bait [19]. A multiple sequence alignment of HYPM with human, frog, and chicken histone H2A, constructed using PSI-BLAST, is shown in Fig. 3. Interestingly, it has been shown that huntingtin interacts with Sp1 and TAFII130, causing changes in transcriptional regulation [20]. If huntingtin, HYPM, Sp1, and TAFII130 are part of the same complex, our findings suggest that HYPM could serve as a bridge between the complex and other unidentified histone fold-containing proteins.
Fig. 3 Multiple sequence alignment of three histone H2A members and the human huntingtin-interacting protein M (HYPM). Human, frog, and chicken histone H2A sequences aligned with the human HYPM protein. Secondary structural elements from the crystal structures ...
As sequence data continue to accumulate from the targeted sequencing of model genomes, it is interesting to speculate whether additional proteins that putatively contain the histone fold motif will be identified. Although it is difficult (if not impossible) to predict how many histone fold-containing proteins will be identified in the future, the continued refinement of methods such as those used in this study should improve our ability to identify these proteins with a high degree of confidence. In addition, an important computational challenge for the future will be not only to identify putative histone fold-containing proteins, but also to develop computational methods that allow these proteins’ binding partners to be identified. Finally, we anticipate that future updates to this database will include a wider “evolutionary spread” of genomes as targeted sequencing efforts continue at an ever-increasing pace.