Accurate recognition of regulatory elements in promoters is an essential
prerequisite for understanding the mechanisms of gene regulation at the
level of transcription. Composite regulatory elements represent a particular
type of such transcriptional regulatory elements consisting of pairs of
individual DNA motifs. In contrast to the present approach, most available
recognition techniques are based purely on statistical evaluation of the
occurrence of single motifs. Such methods are limited in application, since
the accuracy of recognition is greatly dependent on the size and quality of
the sequence dataset. Methods that exploit available knowledge and have
broad applicability are evidently needed.
We developed a novel method to identify composite regulatory elements in
promoters using a library of known examples. In-depth investigation of
regularities encoded in known composite elements allowed us to introduce a
new characteristic measure and to improve the specificity compared with
other methods. Tests on an established benchmark and real genomic data show
that our method outperforms other available methods based either on known
examples or statistical evaluations. In addition to better recognition, the
method has two practical advantages: first, the ability to detect a large
number of different types of composite elements, and second, direct
biological interpretation of the identified results. The program is
and includes an option to extend the provided library by user supplied
The novel algorithm for the identification of composite regulatory elements
presented in this paper proved superior to existing methods. Its
application to tissue specific promoters identified several highly specific
composite elements with relevance to their biological function. This
approach together with other methods will further advance the understanding
of transcriptional regulation of genes.
Massive gene expression changes in different cellular states measured by microarrays, in fact, reflect just an "echo" of real molecular processes in the cells. Transcription factors constitute a class of regulatory molecules that typically require posttranslational modifications or ligand binding in order to exert their function. Therefore, such important functional changes of transcription factors are not directly visible in microarray experiments.
We developed a novel approach to find key transcription factors that may explain concerted expression changes of specific components of the signal transduction network. The approach aims at revealing evidence of positive feedback loops in the signal transduction circuits through activation of pathway-specific transcription factors. We demonstrate that promoters of genes encoding components of many known signal transduction pathways are enriched in binding sites of those transcription factors that are endpoints of the considered pathways. Application of the approach to microarray gene expression data on TNF-alpha-stimulated primary human endothelial cells helped to reveal novel key transcription factors potentially involved in the regulation of the signal transduction pathways of the cells.
We developed a novel computational approach for revealing key transcription factors by knowledge-based analysis of gene expression data with the help of databases on gene regulatory networks (TRANSFAC® and TRANSPATH®). The corresponding software and databases are available at .
Composite Module Analyst (CMA) is a novel software tool that identifies promoter-enhancer models based on the composition of transcription factor (TF) binding sites and their pairs. CMA is closely interconnected with the TRANSFAC® database. In particular, CMA uses the positional weight matrix (PWM) library collected in TRANSFAC® and therefore makes it possible to search for a large variety of different TF binding sites. We model the structure of long gene regulatory regions by a Boolean function that joins several local modules, each consisting of co-localized TF binding sites. Given a set of co-regulated genes as input, CMA builds the promoter model and optimizes its parameters automatically by applying a genetic-regression algorithm. The algorithm uses a multicomponent fitness function that combines several statistical criteria in a weighted linear function. We show examples of successful application of CMA to microarray data on transcription profiling of TNF-alpha-stimulated primary human endothelial cells. The CMA web server is freely accessible at . An advanced version of CMA is also a part of the commercial system ExPlain™ () designed for causal analysis of gene expression data.
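The model structure described above can be illustrated with a minimal Python sketch. It is not the CMA implementation: the site representation, the choice of a Boolean OR to join modules, the window parameter and the criterion names are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representation of a predicted site: (matrix name, position in bp).
Site = Tuple[str, int]


def has_module(sites: List[Site], required: Set[str], window: int) -> bool:
    """Return True if every matrix in `required` has a site inside one window."""
    relevant = [(name, pos) for name, pos in sites if name in required]
    for _, start in relevant:
        # Collect the matrices that have a site in [start, start + window).
        found = {name for name, pos in relevant if start <= pos < start + window}
        if found == required:
            return True
    return False


def promoter_matches(sites: List[Site], modules: Iterable[Set[str]], window: int) -> bool:
    """Join local modules with a Boolean OR (an assumed choice of Boolean function)."""
    return any(has_module(sites, module, window) for module in modules)


def fitness(criteria: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted linear combination of statistical criteria (names are illustrative)."""
    return sum(weights[name] * criteria[name] for name in weights)
```

In a genetic algorithm of the kind the abstract mentions, a function like `fitness` would rank candidate models; which criteria enter it and how they are weighted is not specified here.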
The TRANSFAC® database on transcription factors, their binding sites, nucleotide distribution matrices and regulated genes, as well as the complementing database TRANSCompel® on composite elements, have been further enhanced on various levels. A new web interface with different search options and integrated versions of Match™ and Patch™ provides increased functionality for TRANSFAC®. The list of databases linked to the common GENE table of TRANSFAC® and TRANSCompel® has been extended by Ensembl, UniGene, EntrezGene, HumanPSD™ and TRANSPRO™. Standard gene names from HGNC, MGI and RGD are included for human, mouse and rat genes, respectively. With the help of InterProScan, Pfam, SMART and PROSITE domains are assigned automatically to the protein sequences of the transcription factors. TRANSCompel® now contains, in addition to the COMPEL table, a separate table for detailed information on the experimental EVIDENCE on which the composite elements are based. Finally, with respect to data growth in TRANSFAC®, the gain of Drosophila transcription factor binding sites (by courtesy of the Drosophila DNase I footprint database) and of Arabidopsis factors (by courtesy of DATF, Database of Arabidopsis Transcription Factors) deserves particular mention. The public releases described here, TRANSFAC® 7.0 and TRANSCompel® 7.0, are accessible under .
TRANSPATH® is a database about signal transduction events. It provides information about signaling molecules, their reactions and the pathways these reactions constitute. The representation of signaling molecules is organized in a number of orthogonal hierarchies reflecting the classification of the molecules, their species-specific or generic features, and their post-translational modifications. Reactions are similarly hierarchically organized in a three-layer architecture, differentiating between reactions that are evidenced by individual publications, generalizations of these reactions to construct species-independent ‘reference pathways’ and the ‘semantic projections’ of these pathways. A number of search and browse options allow easy access to the database contents, which can be visualized with the tool PathwayBuilder™. The module PathoSign adds data about pathologically relevant mutations in signaling components, including their genotypes and phenotypes. TRANSPATH® and PathoSign can be used as an encyclopaedia, in the educational process, for visualization and modeling of signal transduction networks and for the analysis of gene expression data. TRANSPATH® Public 6.0 is freely accessible for users from non-profit organizations under .
TRANSPATH® can either be used as an encyclopedia, for both specific and general
information on signal transduction, or can serve as a network analyser. Therefore,
three modules have been created: the first one is the data, which have been manually
extracted, mostly from the primary literature; the second is PathwayBuilder™,
which provides several different types of network visualization and hence facilitates
understanding; the third is ArrayAnalyzer™, which is particularly suited to gene
expression array interpretation, and is able to identify key molecules within signalling
networks (potential drug targets). These key molecules could be responsible for
the coordinated regulation of downstream events. Manual data extraction focuses
on direct reactions between signalling molecules and the experimental evidence for
them, including species of genes/proteins used in individual experiments, experimental
systems, materials and methods. This combination of materials and methods is
used in TRANSPATH® to assign a quality value to each experimentally proven
reaction, which reflects the probability that this reaction would happen under
physiological conditions. Another important feature in TRANSPATH® is the inclusion
of transcription factor–gene relations, which are transferred from TRANSFAC®,
a database focused on transcription regulation and transcription factors. Since
interactions between molecules are mainly direct, this allows a complete and
stepwise pathway reconstruction from ligands to regulated genes. More information is
available at www.biobase.de/pages/products/databases.html.
Match™ is a weight matrix-based tool for searching putative transcription factor binding sites in DNA sequences. Match™ is closely interconnected and distributed together with the TRANSFAC® database. In particular, Match™ uses the matrix library collected in TRANSFAC® and therefore provides the possibility to search for a great variety of different transcription factor binding sites. Several sets of optimised matrix cut-off values are built into the system to provide a variety of search modes of different stringency. The user may construct and save his/her specific user profiles, which are selected subsets of matrices including default or user-defined cut-off values. Furthermore, a number of tissue-specific profiles are provided that were compiled by the TRANSFAC® team. A public version of the Match™ tool is available at: http://www.gene-regulation.com/pub/programs.html#match. The same program with a different web interface can be found at http://compel.bionet.nsc.ru/Match/Match.html. An advanced version of the tool called Match™ Professional is available at http://www.biobase.de.
The TRANSFAC® database on eukaryotic transcriptional regulation, comprising data on transcription factors, their target genes and regulatory binding sites, has been extended and further developed, both in number of entries and in the scope and structure of the collected data. Structured fields for expression patterns have been introduced for transcription factors from human and mouse, using the CYTOMER® database on anatomical structures and developmental stages. The functionality of Match™, a tool for matrix-based search of transcription factor binding sites, has been enhanced. For instance, the program now comes with a number of tissue- (or state-) specific profiles, and new profiles can be created and modified with Match™ Profiler. The GENE table was extended and gained in importance; it now contains, among others, links to LocusLink, RefSeq and OMIM. Further, direct links between factor and target gene on the one hand and between gene and encoded factor on the other hand were introduced. The TRANSFAC® public release is available at http://www.gene-regulation.com. For yeast, an additional release including the latest data was made available separately as TRANSFAC® Saccharomyces Module (TSM) at http://transfac.gbf.de. For CYTOMER®, free download versions are available at http://www.biobase.de:8080/index.html.
Originating from COMPEL, the TRANSCompel® database emphasizes the key role of specific interactions between transcription factors binding to their target sites, which provide specific features of gene regulation in a particular cellular context. Composite regulatory elements contain two closely situated binding sites for distinct transcription factors and represent minimal functional units providing combinatorial transcriptional regulation. Both specific factor–DNA and factor–factor interactions contribute to the function of composite elements (CEs). Information about the structure of known CEs and specific gene regulation achieved through such CEs appears to be extremely useful for promoter prediction, for gene function prediction and for applied gene engineering as well. Each database entry corresponds to an individual CE within a particular gene and contains information about two binding sites, two corresponding transcription factors and experiments confirming cooperative action between transcription factors. The COMPEL database, equipped with the search and browse tools, is available at http://www.gene-regulation.com/pub/databases.html#transcompel. Moreover, we have developed the program CATCH™ for searching potential CEs in DNA sequences. It is freely available as CompelPatternSearch at http://compel.bionet.nsc.ru/FunSite/CompelPatternSearch.html.
COMPEL is a database on composite regulatory elements, the basic structures of combinatorial regulation. Composite regulatory elements contain two closely situated binding sites for distinct transcription factors and represent minimal functional units providing combinatorial transcriptional regulation. Both specific factor–DNA and factor–factor interactions contribute to the function of composite elements (CEs). Information about the structure of known CEs and specific gene regulation achieved through such CEs appears to be extremely useful for promoter prediction, for gene function prediction and for applied gene engineering as well. The structure of the relational model of COMPEL is determined by the concept of molecular structure and regulatory role of CEs. Based on the set of CEs of a particular type, a program has been developed for searching potential CEs in gene regulatory regions. WWW search and browse routines were developed for COMPEL release 3.0. The COMPEL database equipped with the search and browse tools is available at http://compel.bionet.nsc.ru/ . The program for prediction of potential CEs of NFAT type is available at http://compel.bionet.nsc.ru/FunSite.html and http://transfac.gbf.de/dbsearch/funsitep/s_comp.html
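The idea of searching for potential CEs as pairs of closely situated binding sites can be sketched in Python as below. This is a generic illustration under stated assumptions, not the program described above: sites are assumed to be half-open (start, end) intervals already predicted for two distinct factors, and `max_gap` is a hypothetical distance threshold.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) coordinates of a predicted site


def find_composite_candidates(
    hits_a: List[Interval], hits_b: List[Interval], max_gap: int
) -> List[Tuple[Interval, Interval, int]]:
    """Pair sites of factor A with sites of factor B lying at most `max_gap` bp apart.

    The gap is the number of bases between the two sites; overlapping sites get 0.
    """
    pairs = []
    for a in hits_a:
        for b in hits_b:
            gap = max(max(a[0], b[0]) - min(a[1], b[1]), 0)
            if gap <= max_gap:
                pairs.append((a, b, gap))
    return pairs
```

A real search would also need to account for site orientation and for the quality of each individual site; those aspects are left out of this sketch.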
Transcription Regulatory Regions Database (TRRD) has been developed for accumulation of experimental information on the structure–function features of regulatory regions of eukaryotic genes. Each entry in TRRD corresponds to a particular gene and contains a description of structure–function features of its regulatory regions (transcription factor binding sites, promoters, enhancers, silencers, etc.) and gene expression regulation patterns. The current release, TRRD 4.2.5, comprises the description of 760 genes, 3403 expression patterns, and >4600 regulatory elements including 3604 transcription factor binding sites, 600 promoters and 152 enhancers. This information was obtained through annotation of 2537 scientific publications. TRRD 4.2.5 is available through the WWW at http://wwwmgs.bionet.nsc.ru/mgs/dbases/trrd4/
TRANSFAC is a database on transcription factors, their genomic binding sites and DNA-binding profiles. In addition to being updated and extended by new features, it has now been complemented by a series of additional database modules. Among them, modules that provide data about signal transduction pathways (TRANSPATH) or about cell types/organs/developmental stages (CYTOMER) are available, as well as an updated version of the previously described COMPEL database. The databases are available on the WWW at http://transfac.gbf.de/
The Transcription Regulatory Regions Database (TRRD) is a curated database designed for accumulation of experimental data on extended regulatory regions of eukaryotic genes, the regulatory elements they contain, i.e., transcription factor binding sites, promoters, enhancers, silencers, etc., and expression patterns of the genes. Release 4.1 of TRRD offers a number of significant improvements, in particular, a more detailed description of transcription factor binding sites, transcription factors per se, and gene expression patterns in a computer-readable format. In addition, the new TRRD release provides considerably more references to other molecular biological databases. TRRD 4.1 is installed under SRS and is available through the WWW at http://www.bionet.nsc.ru/trrd/
TRANSFAC, TRRD (Transcription Regulatory Region Database) and COMPEL are databases which store information about transcriptional regulation in eukaryotic cells. The three databases provide distinct views on the components involved in transcription: transcription factors and their binding sites and binding profiles (TRANSFAC), the regulatory hierarchy of whole genes (TRRD), and the structural and functional properties of composite elements (COMPEL). The quantitative and qualitative changes of all three databases and connected programs are described. The databases are accessible via WWW: http://transfac.gbf.de/TRANSFAC or http://www.bionet.nsc.ru/TRRD
Three databases that provide data on transcriptional regulation are described. TRANSFAC is a database on transcription factors and their DNA binding sites. TRRD (Transcription Regulatory Region Database) collects information about complete regulatory regions, their regulation properties and architecture. COMPEL comprises specific information on composite regulatory elements. Here, we describe the present status of these databases and the first steps towards their federation.
Over the past years, evidence has been accumulating for a fundamental role of protein-protein interactions between transcription factors in gene-specific transcription regulation. Many of these interactions occur within composite elements containing binding sites for several factors. We have selected 101 composite regulatory elements identified experimentally in the regulatory regions of 64 genes of vertebrates and of their viruses and briefly described them in a compilation. Of these, 82 composite elements are of the synergistic type and 19 of the antagonistic type. Within synergistic-type composite elements, transcription factors bind to the corresponding sites simultaneously, thus cooperatively activating transcription. The factors binding to their target sites within antagonistic-type composite elements produce opposing effects on transcription. The nucleotide sequence and localization in the genes, and the names and brief descriptions of the transcription factors, are provided for each composite element, including a representation of experimental data on its functioning. Most of the composite elements (three quarters) fall between -250 bp and the transcription start site. The distance between the binding sites within the composite elements described varies from complete overlap to 80 bp. The compilation of composite elements is presented in the database COMPEL, which is electronically accessible by anonymous ftp via internet.
Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.
Clinically, gentamicin has been used extensively to treat the debilitating symptoms of Ménière’s disease and is well known for its vestibulotoxic properties. Until recently, it was widely accepted that the round window membrane (RWM) was the primary entry route into the inner ear following intratympanic drug administration. In the current study, gentamicin was delivered to either the RWM or the stapes footplate of guinea pigs (GPs) to assess the hearing loss and histopathology associated with each procedure. Vestibulotoxicity of the utricular macula, saccular macula, and crista ampullaris in the posterior semicircular canal was assessed quantitatively with density counts of hair cells, supporting cells, and stereocilia in histological sections. Cochleotoxicity was assessed quantitatively by changes in threshold of auditory brainstem responses (ABR), along with hair cell and spiral ganglion cell counts in the basal and second turns of the cochlea. Animals receiving gentamicin applied to the stapes footplate exhibited markedly higher levels of hearing loss between 8 and 32 kHz, a greater reduction of outer hair cells in the basal turn of the cochlea and fewer normal type I cells in the utricle than those receiving gentamicin on the RWM or saline controls. This suggests that gentamicin more readily enters the ear when applied to the stapes footplate compared with RWM application. These data provide a potential explanation for why gentamicin preferentially ablates vestibular function while preserving hearing following transtympanic administration in humans.
Inner ear drug delivery; gentamicin; pharmacokinetics; oval window; stapes; stapediovestibular joint; annular ligament
Rescue of the p53 tumor suppressor is an attractive cancer therapy approach. However, pharmacologically activated p53 can induce diverse responses ranging from cell death to growth arrest and DNA repair, which limits the efficient application of p53-reactivating drugs in the clinic. Elucidation of the molecular mechanisms defining the biological outcome upon p53 activation remains a grand challenge in the p53 field. Here, we report that concurrent pharmacological activation of p53 and inhibition of thioredoxin reductase, followed by generation of reactive oxygen species (ROS), result in synthetic lethality in cancer cells. ROS promote the activation of c-Jun N-terminal kinase (JNK) and the DNA damage response, which establishes a positive feedback loop with p53. This converts the p53-induced growth arrest/senescence to apoptosis. We identified several survival oncogenes inhibited by p53 in a JNK-dependent manner, including Mcl1, PI3K and eIF4E, as well as the p53 inhibitors Wip1 and MdmX. Further, we show that Wip1 is one of the crucial executors downstream of JNK whose ablation confers an enhanced and sustained p53 transcriptional response contributing to cell death. Our study provides novel insights for manipulating the p53 response in a controlled way. Further, our results may enable a new pharmacological strategy to exploit abnormally high ROS levels, often linked with higher aggressiveness in cancer, to selectively kill cancer cells upon pharmacological reactivation of p53.
TrxR; ROS; JNK; p53; Wip1; inhibition of oncogenes
To determine whether a systemic immune response influences hearing thresholds and tissue response after cochlear implantation of hearing guinea pigs.
Guinea pigs were inoculated with sterile antigen (keyhole limpet hemocyanin) 3 weeks before cochlear implantation. Pure-tone auditory brainstem response thresholds were measured before implantation and 1 and 4 weeks later. Dexamethasone phosphate 20% was adsorbed onto a hyaluronic acid carboxymethylcellulose sponge and was applied to the round window for 30 minutes before electrode insertion. Normal saline was used for controls. Cochlear histology was performed at 4 weeks after implantation to assess the tissue response to implantation. To control for the effect of keyhole limpet hemocyanin priming, a group of unprimed animals underwent cochlear implantation with a saline-soaked pledget applied to the round window.
Keyhole limpet hemocyanin priming had no significant detrimental effect on thresholds without implantation. Thresholds were elevated after implantation across all frequencies tested (2–32 kHz) in primed animals but only at higher frequencies (4–32 kHz) in unprimed controls. In primed animals, dexamethasone treatment significantly reduced threshold shifts at 2 and 8 kHz. Keyhole limpet hemocyanin led to the more frequent observation of lymphocytes in the tissue response to the implant.
Systemic immune activation at the time of cochlear implantation broadened the range of frequencies experiencing elevated thresholds after implantation. Local dexamethasone provides partial protection against this hearing loss, but the degree and extent of protection are smaller than in previous studies with unprimed animals.
Cochlear implant; Hearing loss; Innate immunity; Keyhole limpet hemocyanin; Systemic immune system
Many efforts exist to design and implement approaches and tools for data capture, integration and analysis in the life sciences. Challenges are not only the heterogeneity, size and distribution of information sources, but also the danger of producing too many solutions for the same problem. Methodological, technological, infrastructural and social aspects appear to be essential for the development of a new generation of best practices and tools. In this paper, we analyse and discuss these aspects from different perspectives, by extending some of the ideas that arose during the NETTAB 2012 Workshop, making reference especially to the European context.
First, the relevance of using data and software models for the management and analysis of biological data is stressed. Second, some of the most relevant community achievements of recent years, which should be taken as a starting point for future efforts in this research domain, are presented. Third, some of the main outstanding issues, challenges and trends are analysed. The challenges related to the tendency to fund and create large-scale international research infrastructures and public-private partnerships in order to address the complex challenges of data-intensive science are especially discussed. The needs and opportunities of Genomic Computing (the integration, search and display of genomic information at a very specific level, e.g. at the level of a single DNA region) are then considered.
In the current data and network-driven era, social aspects can become crucial bottlenecks. How these may best be tackled to unleash the technical abilities for effective data integration and validation efforts is then discussed. Especially the apparent lack of incentives for already overwhelmed researchers appears to be a limitation for sharing information and knowledge with other scientists. We point out as well how the bioinformatics market is growing at an unprecedented speed due to the impact that new powerful in silico analysis promises to have on better diagnosis, prognosis, drug discovery and treatment, towards personalized medicine. An open business model for bioinformatics, which appears to be able to reduce undue duplication of efforts and support the increased reuse of valuable data sets, tools and platforms, is finally discussed.
cyclization; furans; homogeneous catalysis; rearrangement; synthetic methods
A general, mild, and efficient 1,2-migration/cycloisomerization methodology toward multisubstituted 3-thio-, seleno-, halo-, aryl-, and alkyl-furans and pyrroles, as well as fused heterocycles, valuable building blocks for synthetic chemistry, has been developed. Moreover, regiodivergent conditions have been identified for C-4 bromo- and thio-substituted allenones and alkynones for the selective assembly of regioisomeric 2-hetero-substituted furans. It was demonstrated that, depending on reaction conditions, ambident substrates can be selectively transformed into furan products, as well as undergo selective 6-exo-dig or Nazarov cyclizations. Our mechanistic investigations have revealed that the transformation proceeds via allenylcarbonyl or allenylimine intermediates followed by 1,2-group migration to the allenyl sp carbon during cycloisomerization. It was found that 1,2-migration of chalcogens and halogens predominantly proceeds via formation of irenium intermediates. An analogous intermediate can also be proposed for the 1,2-aryl shift. Furthermore, it was shown that the cycloisomerization cascade can be catalyzed by Brønsted acids, albeit less efficiently, and that the commonly observed reactivity of Lewis acid catalysts cannot be attributed to the eventual formation of a proton. Undoubtedly, thermally induced or Lewis acid-catalyzed transformations proceed via intramolecular Michael addition or activation of the enone moiety pathways, whereas certain carbophilic metals trigger a carbenoid/oxonium-type pathway. However, a facile cycloisomerization in the presence of cationic complexes, as well as the observed migratory aptitude in the cycloisomerization of unsymmetrically disubstituted aryl- and alkylallenes, strongly supports an electrophilic nature of this transformation. Full mechanistic details, as well as the scope of this transformation, are discussed.
Algorithmic comparison of DNA sequence motifs is a problem in bioinformatics that has received increased attention in recent years. Its main applications concern characterization of potentially novel motifs and clustering of a motif collection in order to remove redundancy. Despite growing interest in motif clustering, the question of which motif clusters to aim for has so far not been systematically addressed. Here we analyzed motif similarities in a comprehensive set of vertebrate transcription factor classes. For this we developed enhanced similarity scores by inclusion of the information coverage (IC) criterion, which evaluates the fraction of information an alignment covers in aligned motifs. A network-based method enabled us to identify motif clusters with high correspondence to DNA-binding domain phylogenies and prior experimental findings. Based on this analysis we derived a set of motif families representing distinct binding specificities. These motif families were used to train a classifier which was further integrated into a novel algorithm for unsupervised motif clustering. Application of the new algorithm demonstrated its superiority to previously published methods and its ability to reproduce entrained motif families. As a result, our work proposes a probabilistic approach to decide whether two motifs represent common or distinct binding specificities.
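One way to read the information coverage criterion is as the share of a motif's total information content that falls inside the aligned columns. The Python sketch below follows that reading; the exact definition used in the work may differ, and the uniform background, the motif format and the function names are assumptions.

```python
import math
from typing import Dict, List

Motif = List[Dict[str, float]]  # assumed format: base probabilities per column


def column_information(column: Dict[str, float]) -> float:
    """Information content of one column in bits, assuming a uniform background."""
    return 2.0 + sum(p * math.log2(p) for p in column.values() if p > 0)


def information_coverage(motif: Motif, start: int, end: int) -> float:
    """Fraction of the motif's total information lying in columns [start, end)."""
    total = sum(column_information(col) for col in motif)
    if total <= 0:
        return 0.0
    covered = sum(column_information(col) for col in motif[start:end])
    return covered / total
```

Under this reading, an alignment that overlaps only low-information flanking columns of a motif gets a low coverage even if those columns match well, which is the kind of case such a criterion is meant to penalise.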
Transcription factors play a central role in the regulation of gene expression. Their interaction with specific elements in the DNA mediates dynamic changes in transcriptional activity. Databases store a growing number of known DNA sequence patterns, also denoted as DNA sequence motifs, that are recognized by transcription factors. Such databases can be searched to find a match for a newly discovered pattern and thereby identify the potential binding factor. It is also of interest to cluster motifs in order to examine which transcription factors have similar binding properties and, thus, may promiscuously bind to each other's sites, or how many distinct specificities have been described. To gain deeper insight into the similarities between DNA sequence motifs, we analyzed a comprehensive set of known motifs. For this purpose we devised a network-based approach that enabled us to identify clusters of related motifs that largely coincided with the grouping of related TFs on the basis of protein similarity. On the basis of these results, we were able to predict whether two motifs belong to the same subgroup and constructed a novel, fully automated method for motif clustering, which enables users to assess the similarity of a newly found motif with all known motifs in the collection.