For the coming years, we intend to implement the following key updates and extensions. First, an important model organism currently missing from SwissRegulon is the fruit fly Drosophila melanogaster. Our curation of fly regulatory motifs and our genome-wide predictions are already at an advanced stage, and we expect to be able to offer genome-wide TFBS annotations for D. melanogaster in the near future. We are also updating our regulatory site predictions for E. coli, including a newly curated set of WMs, and expect to be able to provide these fairly soon.
A key limitation of SwissRegulon’s TFBS annotations is that, in multicellular eukaryotes, the predictions are limited to promoter regions. Although these regions likely contain a significant fraction of relevant regulatory sites, it is well known that many important regulatory sites are contained in distal cis-regulatory modules (or enhancers) (26). Recent developments in high-throughput mapping and analysis of chromatin state along the genome have uncovered that distal regulatory regions can be recognized by their DNase I sensitivity (27), methylation status (28) and particular combinations of histone modifications (29), allowing a more systematic mapping of distal cis-regulatory modules. Based on such information, we are currently curating a number of sets of distal regulatory regions and expect to be able to provide TFBS predictions for these sets in the near future.
SwissRegulon currently provides an overview page for each regulatory motif that includes, in particular, a sorted list of all promoters/genes targeted by the motif. We intend to develop similar pages for each individual promoter/gene. These pages will thus give an easy overview of all ‘regulatory inputs’ that are predicted for a given promoter or gene of interest.
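Conceptually, a per-promoter page is the inverse of the per-motif target list. The following Python sketch illustrates that inversion under stated assumptions: the function name `invert_motif_targets`, the motif and promoter identifiers, and the use of a single numeric score per prediction are hypothetical and chosen for illustration only; they do not describe SwissRegulon's actual data model or implementation.

```python
from collections import defaultdict


def invert_motif_targets(motif_targets):
    """Turn a motif -> [(promoter, score), ...] mapping into a
    promoter -> [(motif, score), ...] mapping.

    Hypothetical helper: names and the score field are illustrative.
    """
    promoter_inputs = defaultdict(list)
    for motif, targets in motif_targets.items():
        for promoter_id, score in targets:
            promoter_inputs[promoter_id].append((motif, score))
    # Sort each promoter's inputs by descending score, breaking ties by motif name.
    return {
        promoter_id: sorted(inputs, key=lambda ms: (-ms[1], ms[0]))
        for promoter_id, inputs in promoter_inputs.items()
    }


# Illustrative (made-up) per-motif target lists.
example_targets = {
    "MOTIF_A": [("prom1", 0.9), ("prom2", 0.4)],
    "MOTIF_B": [("prom1", 0.7)],
}
regulatory_inputs = invert_motif_targets(example_targets)
```

In this sketch, `regulatory_inputs["prom1"]` lists every motif predicted to target the hypothetical promoter `prom1`, strongest prediction first, which is the kind of view a per-promoter page would present.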
Another crucial factor limiting the completeness of genome-wide TFBS predictions is that the sequence specificity of many TFs is unknown. However, with dramatically decreasing sequencing costs and more easily accessible protocols for ChIP-seq analysis, the number of available ChIP-seq data sets is increasing rapidly. We have developed an automatic pipeline for processing ChIP-seq data, identifying high-quality binding peaks, and using motif inference programs such as PhyloGibbs to infer regulatory motifs from such data sets. In the near future, we intend to use this automated pipeline to significantly expand the number of TFs for which regulatory motifs are available.
Finally, our new search function has proven itself a useful tool for quick access to the information, but it currently covers only the annotations for human and mouse. We intend to extend it in the near future to include all eukaryotic and prokaryotic species that are in the database.