Histone modifications play important roles in chromatin remodeling, gene transcriptional regulation, stem cell maintenance and differentiation. Alterations in histone modifications may be linked to human diseases, especially cancer. Data on histone modifications, including methylation, acetylation and ubiquitylation, probed by ChIP-seq, ChIP-chip and qChIP have become widely available. Mining and integrating these data can aid novel biological discoveries. However, no comprehensive data repository has been dedicated exclusively to human histone modifications. We therefore developed a relatively comprehensive database for human histone modifications. The Human Histone Modification Database (HHMD, http://bioinfo.hrbmu.edu.cn/hhmd) focuses on the storage and integration of histone modification datasets obtained from laboratory experiments. The latest release of HHMD incorporates 43 location-specific histone modifications in human. To facilitate data extraction, flexible search options are built into HHMD: it can be searched by histone modification, gene ID, functional categories, chromosome location and cancer name. HHMD also includes a user-friendly visualization tool named HisModView, which displays genome-wide histone modification maps and facilitates the acquisition and visualization of histone modifications. The database also contains manually curated information on histone modification dysregulation in nine human cancers.
Eukaryotic DNA is packaged into chromatin composed of repeating nucleosomes, in which DNA is wrapped around core histones (H2A, H2B, H3 and H4). In mammalian cells, the N-terminal tails of histones are subject to many chemical modifications such as methylation, acetylation, ubiquitylation, phosphorylation and ADP-ribosylation. Histone modifications provide accessible targets for effectors such as histone methyltransferase and acetyltransferase (1) and have different impacts on chromatin structure and gene transcription, depending on the types and locations of the modifications (2,3). Currently, among the various modification types, methyl-, acetyl- and ubiquityl-groups have been studied mainly by ChIP-based technologies. At specific lysine and arginine residues in the N-terminal tails of histones, up to three methyl groups (lysine) or two methyl groups (arginine) can be added. Histone methylation types at different locations have been ascribed to either activating or repressive functions (4). For example, H3K4me3 is positively correlated with gene expression, while H3K9me3 is implicated in heterochromatin formation and gene silencing (5). The normal pattern of histone modifications is vital for chromatin stability and transcriptional regulation (5). Aberrant changes in histone modifications may be correlated with cancer (6). For example, the RARβ2 promoter is silenced by H3K27me3 enrichment specifically in prostate cancer, independently of DNA hypermethylation (7). ChIP-based experiments including ChIP-seq, ChIP-chip and qChIP are efficient at probing histone modifications, and have produced a large amount of histone modification data (8). A repository of such data is useful because it enables in-depth data mining.
A few resources exist for histones or histone modifications, such as ChromatinDB (9), the Histone Database (10–12), SysPTM (13) and HistoneHits (14). ChromatinDB is a genome-wide resource of histone modifications in Saccharomyces cerevisiae. The Histone Database focuses on histone sequences and structures in many species. Although it provides post-translational modification (PTM) information and modified-residue information for histone sequences and structures, the Histone Database lacks large-scale profiles of histone modifications for functional evaluation. The major difference between SysPTM and HHMD is that SysPTM is a curated PTM platform for online query and analysis, while HHMD is a repository of large-scale histone modification data that also has built-in functions for analysis and visualization. HistoneHits is a database for systematic collections of histone mutants in yeast (14). To the best of our knowledge, no specialized database has focused on histone modifications in mammals, which hinders further systematic and in-depth data mining. There is therefore a need for a database dedicated to the storage and analysis of experimental histone modification data. A database of this kind would benefit histone modification studies such as the identification of differential histone modification regions (D-HMRs) for a given set of histone modifications.
Cancer is considered a complex disease that involves both genetic and epigenetic alterations. Several comprehensive projects are dedicated to cancer studies, including The Cancer Genome Atlas (TCGA) (15), alongside a large number of studies examining cancer from an epigenomic perspective (6,16,17). Although DNA methylation undergoes significant changes, other epigenetic changes such as histone modifications also reflect the tumorigenesis process (6,18). Global aberrant histone modification patterns in tumorigenesis offer new potential for molecular screening in cancer prevention, diagnosis and treatment (19,20). Tools for histone modification data integration and analysis are still needed for cancer biomarker identification.
We developed the human histone modification database (HHMD), which is available at http://bioinfo.hrbmu.edu.cn/hhmd or http://www.hhmd.org. HHMD focuses on the storage and integration of histone modification information from experimental data. The latest release of HHMD provides genomic context (hg18) for histone modification alignment, which can be used to make comparisons between genomic and epigenomic data.
HHMD incorporates a set of tools for querying histone modifications. Five search options are provided for advanced searches, namely histone modification, gene ID, functional categories, chromosome location and cancer name. Furthermore, HHMD provides a visualization tool, HisModView, which allows histone modifications to be investigated in a genomic context by superimposing histone modification data on DNA methylation, GC contents and gene information. HHMD may be a useful resource for researchers who are interested in epigenetic regulation and computational epigenetics in human and other species. The in-house data and built-in tools in HHMD make it possible to identify D-HMRs between cancer and control samples, which in turn benefits cancer biomarker identification.
HHMD contains four types of data: (i) high-throughput histone modifications, (ii) MeDIP methylation, (iii) curated information of aberrant histone modifications, genes and cancers and (iv) GC contents, RefSeq gene (21) and other genomic annotations.
A total of 43 histone modification types (Table 1), classified by histone type, have been included in the current release of HHMD. Specifically, there are 228 manually confirmed high-throughput histone modification datasets. Each dataset is assigned a unique HM ID (e.g. HM-26). All histone modifications were collected from biological experiments, including high-throughput datasets and curated information from the literature. All the high-throughput datasets were generated using ChIP-based technologies. Among these datasets, 81 were from ChIP-chip, 87 from ChIP-seq and 55 from qChIP. To speed up HHMD, summarized ChIP-seq data files were derived from tag-based bed files using a window size of 200 bp. Histone H3, the most frequently profiled histone, is associated with 155 datasets. There are 50 datasets for H4, 11 for H2A and six for H2B. Regarding modification types, 122 datasets are methylation-related, 87 are acetylation-related and 1 is ubiquitylation-related. High-throughput datasets were collected from various institutes and websites (Table 2). HHMD also provides an interface form for the submission of new histone modification information.
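The window-based summarization of ChIP-seq tags described above can be illustrated with a minimal sketch. It assumes a simplified BED-like input of (chromosome, start, end) tuples and assigns each tag to the 200 bp window that contains its midpoint. The function name `summarize_tags`, the midpoint rule and the example tags are illustrative assumptions, not the procedure actually used to build HHMD.

```python
from collections import Counter

WINDOW_SIZE = 200  # bp; the window size stated for the HHMD summary files


def summarize_tags(tags, window_size=WINDOW_SIZE):
    """Count tags per fixed-size genomic window.

    tags: iterable of (chrom, start, end) in 0-based, half-open BED coordinates.
    Returns a Counter keyed by (chrom, window_start).
    Assumption: each tag is assigned to the window holding its midpoint.
    """
    counts = Counter()
    for chrom, start, end in tags:
        midpoint = (start + end) // 2
        window_start = (midpoint // window_size) * window_size
        counts[(chrom, window_start)] += 1
    return counts


# Hypothetical tags, for illustration only (not taken from any HHMD dataset)
example_tags = [
    ("chr1", 100, 125),
    ("chr1", 150, 175),
    ("chr1", 390, 415),
    ("chr2", 0, 25),
]
window_counts = summarize_tags(example_tags)
```

Storing one count per window instead of one record per tag is what makes such summary files quicker to query and draw, at the cost of resolution below the window size.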
To investigate the relationships between histone modifications and DNA methylation, HHMD integrated a panel of large-scale methylation data from MeDIP (22). The methylation data comprises information from 16 tissues including GM06990 cell line data (23).
The functional and genomic annotations of the genes in HHMD were obtained from various databases [i.e. NCBI (24), UCSC (25), GO (Gene Ontology) (26), UniProt (14), Affymetrix probe ID, KEGG (27), RefSeq Protein (28), OMIM (29), GI, UniGene (30), PIR (31) and Ensembl (32)].
To elucidate the interplay of histone modifications and cancer types, two strategies were introduced. First, we collected a panel of aberrant histone modifications in various human cancer types from the literature by manual curation. The current version compiles 833 curated relationships involving 17 human histone modifications and 555 genes that are related to nine human cancers. Among these relationships, 588 were from ChIP-chip and 237 were from ChIP-PCR. The nine cancers are gastric, ovarian, colon, leukemia, prostate, lymphoid, breast, lung and pancreatic cancer. HHMD has embedded search functions that can be used to query human transcript IDs (21) and official gene symbols. Second, HHMD includes several high-throughput aberrant histone modification datasets from cell lines such as K562, GM06990 and HeLa. An interface was also built for the submission of cancer-related histone modification information.
HHMD is a highly cross-linked database, which facilitates data acquisition and visualization. The overview of HHMD and two result pages are shown in Figure 1. Figure 1A presents three starting points of HisModView and five search options. To visually understand the data in HHMD, HisModView and searching tools are fully cross-linked. Search results can be downloaded from result pages of HisModView (Figure 1B), or reviewed offline (Figure 1C). In this way, users can analyze data more efficiently. For example, users can start by searching cancer name and visualize the genes in HisModView and then download the specific histone modification data in defined genomic range for further review.
HHMD supports flexible queries for various histone modifications and related genomic and functional annotations by providing five search tools. Taking the histone modification search as an example, users can specify query options such as cytogenetic position, histone modification, technology and tissue. For new users, the histone modification search is the suggested option. Users interested in specific datasets should query by HM ID. For example, users can enter the page ‘search by histone modification’, input ‘HM-26’ in the textbox labeled ‘Histone modification search by ID:’, and then proceed. The sample result page is shown in Figure 1C, where a download icon and a HisModView icon are available. The functional categories search is a module dedicated to studying the relationships between histone modification patterns and functional classifications. Users who are interested in the histone modification distribution for genes of similar functions may find this module helpful. For example, users interested in KEGG pathway hsa00030 can select ‘H3K4me3’ or another option from the pull-down menu ‘Histone modification:’, then type ‘hsa00030’ in the KEGG ID search field and leave the other fields at their defaults. In this case, a report of six genes annotated with hsa00030 and 26 histone modification summaries will be returned.
As an efficient visualization tool, HisModView allows users to browse histone modifications and genomic annotations in the context of the human genome (hg18). A snapshot of HisModView is shown in Figure 1B. Users can start HisModView from ‘Start by Cytogenetic map’, ‘Start by RefSeq Gene ID’ or ‘Start by Chromosome Location’. The HisModView result page has four sections, namely histone modification, MeDIP methylation, GC contents and RefSeq gene annotations. In the histone modification section, labels and histograms of histone modifications are displayed for each genomic track. For each label, a description of the tissue and histone modification type is displayed when the mouse is moved over the label. To study the relationships between histone modifications and DNA methylation, we have integrated a panel of methylation data from 16 tissues generated by MeDIP into the HisModView tool. In the methylation section, the data are represented as matrices in which each element represents one ROI [region of interest, as defined by Rakyan et al. (23)] and color coding indicates the methylation level (from yellow to blue, corresponding to 0 to 100% methylation). Links to MethyCancer (33), another epigenetic database for Homo sapiens, are available. The description of the ROI, the tissue type and the methylation level are also available in popup menus. Clicking any of the gene structures in the RefSeq genes section brings up a detailed annotation report. Gene structures such as introns and exons can be examined for a specific gene, and the viewing range can be adjusted by centering on a specific gene.
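The yellow-to-blue color coding of methylation levels can be sketched as a simple mapping from a percentage to an RGB color. The example below assumes a linear interpolation between pure yellow and pure blue; the function name `methylation_to_rgb` and the exact palette are assumptions for illustration, since the actual color scale used by HisModView is not specified here.

```python
def methylation_to_rgb(level):
    """Map a methylation level in percent (0-100) to an (R, G, B) tuple.

    Assumption: linear interpolation from yellow (255, 255, 0) at 0%
    to blue (0, 0, 255) at 100%; the real HisModView palette may differ.
    """
    if not 0 <= level <= 100:
        raise ValueError("methylation level must be between 0 and 100")
    fraction = level / 100
    yellow, blue = (255, 255, 0), (0, 0, 255)
    # Interpolate each color channel independently and round to an integer.
    return tuple(round(y + (b - y) * fraction) for y, b in zip(yellow, blue))


unmethylated_color = methylation_to_rgb(0)
fully_methylated_color = methylation_to_rgb(100)
```

Applying such a function to every ROI in a tissue-by-ROI matrix would yield the kind of color-coded grid described for the methylation section.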
HHMD was developed using J2EE. It was built using JSP, Struts and the Java connection pool Proxool. HHMD runs on an Apache Tomcat web server and a MySQL server. The data analysis scripts, written in Java, are available on the HHMD website.
Recent studies of histone modification co-localization suggested that co-localized histone modifications may mark functionally important regions. Co-localized histone modifications can provide specific cubic targets for biological effectors (34,35). To date, the most studied co-localized pair of markers is H3K4me3 (activating) and H3K27me3 (repressive), which has been ascribed to the developmental control of ES cells (36,37). Yet, the identification of more functional co-localized histone modification pairs is still of great interest. We used H3K4me3 and H3K9me3 profiles (38,39) in HHMD to further study the relationships between histone modification co-localization and gene function. H3K4me3 is a histone modification type associated with relaxed chromatin structure and active transcription, while H3K9me3 is a marker associated with heterochromatin formation, gene imprinting and repressive transcription (3,4). A significant imbalance of co-localized H3K4me3 and H3K9me3 is also suggested to influence developmental processes. We summarized the distribution of co-localized H3K4me3 and H3K9me3 in resting CD4+ T cells to explore histone modification co-localization patterns in a genomic context. As shown in Figure 2A–C, regions where H3K4me3 and H3K9me3 co-localize (200 bp window size) reveal an intermediate genomic pattern to which both H3K4me3 and H3K9me3 seem to contribute. This observation is in good agreement with the finding that methylation of H3 Lys4 and Lys9 play opposing roles in chromatin regulation (35,40). Genes occupied by co-localized H3K4me3 and H3K9me3 are exemplified in the Supplementary files, from which we find that the protocadherin alpha gene cluster is marked by such co-localization.
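One simple way to operationalize co-localization at 200 bp resolution is sketched below. It assumes that enriched windows have already been called for each modification and defines a co-localized region as a window enriched for both marks. This definition, the function name `colocalized_windows` and the example windows are assumptions for illustration, not the analysis procedure behind Figure 2.

```python
def colocalized_windows(mark_a, mark_b):
    """Return windows enriched for both marks, sorted by chromosome and position.

    mark_a, mark_b: iterables of (chrom, window_start) tuples identifying
    200 bp windows called enriched for each histone modification.
    Assumption: co-localization means the same window appears in both sets.
    """
    return sorted(set(mark_a) & set(mark_b))


# Hypothetical enriched windows, for illustration only
h3k4me3_windows = [("chr5", 1000), ("chr5", 1200), ("chr5", 1400)]
h3k9me3_windows = [("chr5", 1200), ("chr5", 1400), ("chr5", 2000)]

shared = colocalized_windows(h3k4me3_windows, h3k9me3_windows)
# Fraction of H3K4me3 windows that also carry H3K9me3
shared_fraction = len(shared) / len(set(h3k4me3_windows))
```

The shared windows could then be intersected with gene annotations to list the genes marked by both modifications.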
HHMD is designed to integrate genomic and epigenomic annotations from publicly available databases. Genomic annotations such as RefSeq genes and GC contents, together with various epigenomic annotations including histone modifications and DNA methylation data, were compiled in HHMD. Users can access the data by searching and by investigating histone modifications using HisModView and offline analysis tools.
HisModView is a visualization tool for data profiling and comparison, which makes the identification of variable histone modification regions in multiple tissues feasible. Virtual analysis tools such as D-HMRs identification and other analysis functions are to be integrated in a later release. A standalone version of HHMD that supports complex calculation will be released.
As a resource to study the potential function of histone modification markers, HHMD could be extended with utilities for identification of cancer-related histone modification markers for candidate genes. We will continue to investigate the relationships between histone modifications and other diseases in addition to cancer. Since histone modifications in other species are also accumulating, we will extend the research scope to build specific databases for histone modifications in other species as well.
Supplementary Data are available at NAR Online.
National Natural Science Foundation of China (Grant No. 30871394); the National High Tech Development Project of China, the 863 Program (Grant No. 2007AA02Z329); the National Basic Research Program of China, the 973 Program (Grant No. 2008CB517302); the Natural Science Foundation of Heilongjiang Province (Grant No. D2007-35); the Innovation and Technology special Fund for researchers of Harbin (Grant No. RC2007LX003004). Funding for open access charge: National Natural Science Foundation of China (Grant No. 30871394).
Conflict of interest statement. None declared.
The authors would like to thank Dr Yaoping Lei and Dr Diansong Zhou for revising the manuscript.