DNA methylation is the modification of DNA by the addition of a methyl group to a cytosine; the resulting 5-methylcytosine is also referred to as the fifth base (1). This reaction uses S-adenosyl-methionine as the methyl donor and is catalysed by a group of enzymes, the DNA methyltransferases (DNMTs). In humans and other mammals, this epigenetic modification is almost exclusively imposed on cytosines that precede a guanine in the primary DNA sequence (often called a CpG dinucleotide). The frequency of these CpGs in the genome is much lower than would be expected, because a methylated cytosine is prone to deamination, which converts it to thymine. However, in some regions, dense clusters of CpGs can be identified: these regions are referred to as CpG islands (2).
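The depletion of CpGs can be made concrete with the observed/expected CpG ratio, a measure commonly used when describing CpG islands. The sketch below is only an illustration and is not part of PubMeth; the function name `cpg_obs_exp` is an assumption introduced here. It compares the number of CpG dinucleotides found in a sequence with the number expected from its C and G content alone.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G).

    N is the sequence length. A ratio well below 1 indicates CpG depletion;
    CpG islands show comparatively high ratios.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c_count == 0 or g_count == 0:
        return 0.0  # no CpG is possible without both bases
    return cpg_count * n / (c_count * g_count)


# Toy sequences, chosen only to show the arithmetic.
print(cpg_obs_exp("ACGTACGT"))  # CpG-rich toy sequence
print(cpg_obs_exp("CATGCATG"))  # same base composition, no CpG
```

Both toy sequences have the same base composition, so the difference in the ratio reflects only how often a C is directly followed by a G.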
DNA methylation is an epigenetic change: it does not alter the primary DNA sequence and might contribute to overall genetic stability and maintenance of chromosomal integrity. It also facilitates the organization of the genome into transcriptionally active and inactive regions (3). CpG islands in the promoter regions of genes are generally unmethylated in normal tissues. Upon DNA hypermethylation, transcription of the affected genes may be blocked, resulting in gene silencing. In neoplasia, abnormal patterns of DNA methylation have been recognized. Hypermethylation is now considered one of the important mechanisms for silencing tumor suppressor genes, i.e. genes responsible for control of normal cell differentiation and/or inhibition of cell growth. In the last few years, new hypermethylated biomarkers have been used in cancer research and diagnostics (4).
MethDB (5), one of the few databases that focus on DNA methylation, is general and sample oriented. It is not optimized for cancer-related queries, which require a summarized overview: querying multiple genes or cancer types is not supported, and data are always handled as separate samples. Another database, MethPrimerDB (6), focuses on detection methodologies (e.g. MSP primer design). Both databases depend on submissions by administrators or users. This guarantees the required quality, but as a consequence the databases are not always complete and up to date. Neither database is designed to rank and summarize cancer-related information [genes and cancer (sub)types involved], although this is crucial in applied methylation research in the cancer field.
Here we present PubMeth, a database that combines a text-mining approach with a manual reviewing and annotation step. The text-mining step is fast and can search multiple aliases, textual variants of these aliases and multiple keyword lists at once. The manual step drastically improves specificity and annotation quality. The interface ranks, summarizes and displays the data, making the information in the database easily accessible.
The reviewing step in turn depends heavily on the text-mining step, which sorts abstracts, highlights terms and provides links to different sources. As a result, reviewing is fast and accurate enough to process all abstracts published electronically in PubMed to date. In addition, this approach makes an update strategy easier to implement.
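The kind of alias search and term highlighting described above could look roughly like the following sketch. It is not the PubMeth implementation, which is not detailed in this section. The alias and keyword lists, the bracket highlighting and the function names (`variant_pattern`, `find_genes`, `highlight`) are all assumptions made for illustration.

```python
import re

# Illustrative lists (assumed for this sketch, not taken from PubMeth).
GENE_ALIASES = {
    "CDKN2A": ["CDKN2A", "p16", "p16INK4a", "INK4A"],
    "MGMT": ["MGMT"],
}
METHYLATION_KEYWORDS = [
    "methylation", "methylated", "hypermethylation", "hypermethylated",
]


def variant_pattern(term):
    """Regex for a term that tolerates an optional hyphen or space wherever
    letters and digits meet (e.g. p16INK4a, p16-INK4a, p16 INK4a)."""
    parts = re.findall(r"[A-Za-z]+|\d+", term)
    body = r"[-\s]?".join(re.escape(part) for part in parts)
    # Lookarounds keep the term from matching inside a longer word.
    return r"(?<![A-Za-z0-9])" + body + r"(?![A-Za-z0-9])"


def compile_terms(terms):
    """One case-insensitive regex for many terms, longest term first."""
    ordered = sorted(terms, key=len, reverse=True)
    return re.compile("|".join(variant_pattern(t) for t in ordered), re.IGNORECASE)


def find_genes(text):
    """Official symbols of all genes with at least one alias in the text."""
    return sorted(
        gene for gene, aliases in GENE_ALIASES.items()
        if compile_terms(aliases).search(text)
    )


def highlight(text):
    """Wrap every gene alias and methylation keyword in square brackets."""
    terms = [a for aliases in GENE_ALIASES.values() for a in aliases]
    terms += METHYLATION_KEYWORDS
    return compile_terms(terms).sub(lambda m: "[" + m.group(0) + "]", text)


abstract = "Promoter hypermethylation of p16-INK4a and MGMT in colorectal cancer."
print(find_genes(abstract))
print(highlight(abstract))
```

A tolerant pattern of this kind will also produce false positives (short aliases such as "p16" are ambiguous), which is one reason a manual reviewing step after the automated search remains valuable.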
DNA methylation in cancer has evolved into a mainstream research topic. Methylation profiles are successfully used in early detection and personalized treatment. At the same time, more and more data are becoming available, especially through large-scale screening techniques. All this information taken together determines our knowledge of the ‘cancer methylome’. Ultimately, the epigenome of all cancer tissues, including those of different stage and grade, could be mapped out. Epigenetic states differ widely among tissues, and changes are far more varied and much more frequent per tumor than DNA mutations. ‘Each differentiated cell has a different epigenome’, said Jones (7). From this perspective, it is very useful to extract from the literature which genes have already been reported in which cancer types. This information might be used to select positive controls, to check the same genes in other (related) cancer types, to screen for markers that could be used for early diagnosis or in the context of personalized medicine, and to deepen the knowledge of the mechanisms of methylation.
PubMeth aims to contain and summarize as much of the available literature data as possible and presents it in an easy-to-use graphical interface. It speeds up the search for relevant literature: many aliases and keywords are searched at the same time, and the results are reliable because they are manually reviewed, as one would do when performing a manual literature search.
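As a rough illustration of what ranking and summarizing reviewed literature data can mean, the sketch below counts the distinct references that report each gene in each cancer type and ranks the genes for one cancer type. The record layout, the function names and the placeholder records are assumptions made for this example. They do not reflect the PubMeth schema or any real findings.

```python
from collections import defaultdict

# Placeholder records (gene, cancer type, reference id), invented for illustration.
RECORDS = [
    ("GENE_A", "colorectal", "ref-1"),
    ("GENE_A", "colorectal", "ref-2"),
    ("GENE_A", "lung", "ref-3"),
    ("GENE_B", "lung", "ref-3"),
    ("GENE_B", "lung", "ref-3"),  # duplicate entry, counted only once
]


def summarize(records):
    """Number of distinct references per (gene, cancer type) pair."""
    refs = defaultdict(set)
    for gene, cancer_type, ref_id in records:
        refs[(gene, cancer_type)].add(ref_id)
    return {pair: len(ids) for pair, ids in refs.items()}


def rank_genes(records, cancer_type):
    """Genes reported for one cancer type, most-referenced first
    (ties broken alphabetically by gene name)."""
    counts = summarize(records)
    hits = [(gene, n) for (gene, ct), n in counts.items() if ct == cancer_type]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


print(rank_genes(RECORDS, "colorectal"))
print(rank_genes(RECORDS, "lung"))
```

Counting distinct references rather than raw records keeps a gene from being ranked higher merely because the same paper was entered twice.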