1.  MatrixCatch - a novel tool for the recognition of composite regulatory elements in promoters 
BMC Bioinformatics  2013;14:241.
Background
Accurate recognition of regulatory elements in promoters is an essential prerequisite for understanding the mechanisms of gene regulation at the level of transcription. Composite regulatory elements represent a particular type of such transcriptional regulatory elements consisting of pairs of individual DNA motifs. In contrast to the present approach, most available recognition techniques are based purely on statistical evaluation of the occurrence of single motifs. Such methods are limited in application, since the accuracy of recognition is greatly dependent on the size and quality of the sequence dataset. Methods that exploit available knowledge and have broad applicability are evidently needed.
Results
We developed a novel method to identify composite regulatory elements in promoters using a library of known examples. In-depth investigation of regularities encoded in known composite elements allowed us to introduce a new characteristic measure and to improve the specificity compared with other methods. Tests on an established benchmark and real genomic data show that our method outperforms other available methods based either on known examples or statistical evaluations. In addition to better recognition, the method has two practical advantages: first, the ability to detect a high number of different types of composite elements, and second, direct biological interpretation of the identified results. The program is available at http://gnaweb.helmholtz-hzi.de/cgi-bin/MCatch/MatrixCatch.pl and includes an option to extend the provided library by user-supplied data.
Conclusions
The novel algorithm for the identification of composite regulatory elements presented in this paper proved superior to existing methods. Its application to tissue-specific promoters identified several highly specific composite elements with relevance to their biological function. This approach, together with other methods, will further advance the understanding of transcriptional regulation of genes.
doi:10.1186/1471-2105-14-241
PMCID: PMC3754795  PMID: 23924163
2.  Beyond microarrays: Finding key transcription factors controlling signal transduction pathways 
BMC Bioinformatics  2006;7(Suppl 2):S13.
Background
Massive gene expression changes in different cellular states measured by microarrays, in fact, reflect just an "echo" of real molecular processes in the cells. Transcription factors constitute a class of the regulatory molecules that typically require posttranscriptional modifications or ligand binding in order to exert their function. Therefore, such important functional changes of transcription factors are not directly visible in the microarray experiments.
Results
We developed a novel approach to find key transcription factors that may explain concerted expression changes of specific components of the signal transduction network. The approach aims at revealing evidence of positive feedback loops in the signal transduction circuits through activation of pathway-specific transcription factors. We demonstrate that promoters of genes encoding components of many known signal transduction pathways are enriched by binding sites of those transcription factors that are endpoints of the considered pathways. Application of the approach to the microarray gene expression data on TNF-alpha stimulated primary human endothelial cells helped to reveal novel key transcription factors potentially involved in the regulation of the signal transduction pathways of the cells.
Conclusion
We developed a novel computational approach for revealing key transcription factors by knowledge-based analysis of gene expression data with the help of databases on gene regulatory networks (TRANSFAC® and TRANSPATH®). The corresponding software and databases are available at .
doi:10.1186/1471-2105-7-S2-S13
PMCID: PMC1683568  PMID: 17118134
3.  Composite Module Analyst: identification of transcription factor binding site combinations using genetic algorithm 
Nucleic Acids Research  2006;34(Web Server issue):W541-W545.
Composite Module Analyst (CMA) is a novel software tool aiming to identify promoter-enhancer models based on the composition of transcription factor (TF) binding sites and their pairs. CMA is closely interconnected with the TRANSFAC® database. In particular, CMA uses the positional weight matrix (PWM) library collected in TRANSFAC® and therefore provides the possibility to search for a large variety of different TF binding sites. We model the structure of the long gene regulatory regions by a Boolean function that joins several local modules, each consisting of co-localized TF binding sites. Having as an input a set of co-regulated genes, CMA builds the promoter model and optimizes the parameters of the model automatically by applying a genetic-regression algorithm. We use a multicomponent fitness function of the algorithm which includes several statistical criteria in a weighted linear function. We show examples of successful application of CMA to microarray data on transcription profiling of TNF-alpha stimulated primary human endothelial cells. The CMA web server is freely accessible at . An advanced version of CMA is also a part of the commercial system ExPlain™ () designed for causal analysis of gene expression data.
doi:10.1093/nar/gkl342
PMCID: PMC1538785  PMID: 16845066
4.  TRANSFAC® and its module TRANSCompel®: transcriptional gene regulation in eukaryotes 
Nucleic Acids Research  2005;34(Database issue):D108-D110.
The TRANSFAC® database on transcription factors, their binding sites, nucleotide distribution matrices and regulated genes as well as the complementing database TRANSCompel® on composite elements have been further enhanced on various levels. A new web interface with different search options and integrated versions of Match™ and Patch™ provides increased functionality for TRANSFAC®. The list of databases which are linked to the common GENE table of TRANSFAC® and TRANSCompel® has been extended by: Ensembl, UniGene, EntrezGene, HumanPSD™ and TRANSPRO™. Standard gene names from HGNC, MGI and RGD are included for human, mouse and rat genes, respectively. With the help of InterProScan, Pfam, SMART and PROSITE domains are assigned automatically to the protein sequences of the transcription factors. TRANSCompel® now contains, in addition to the COMPEL table, a separate table for detailed information on the experimental EVIDENCE on which the composite elements are based. Finally, in respect of data growth in TRANSFAC®, the gain of Drosophila transcription factor binding sites (by courtesy of the Drosophila DNase I footprint database) and of Arabidopsis factors (by courtesy of DATF, Database of Arabidopsis Transcription Factors) deserves particular mention. The public releases described here, TRANSFAC® 7.0 and TRANSCompel® 7.0, are accessible under .
doi:10.1093/nar/gkj143
PMCID: PMC1347505  PMID: 16381825
5.  TRANSPATH®: an information resource for storing and visualizing signaling pathways and their pathological aberrations 
Nucleic Acids Research  2005;34(Database issue):D546-D551.
TRANSPATH® is a database about signal transduction events. It provides information about signaling molecules, their reactions and the pathways these reactions constitute. The representation of signaling molecules is organized in a number of orthogonal hierarchies reflecting the classification of the molecules, their species-specific or generic features, and their post-translational modifications. Reactions are similarly hierarchically organized in a three-layer architecture, differentiating between reactions that are evidenced by individual publications, generalizations of these reactions to construct species-independent ‘reference pathways’ and the ‘semantic projections’ of these pathways. A number of search and browse options allow easy access to the database contents, which can be visualized with the tool PathwayBuilder™. The module PathoSign adds data about pathologically relevant mutations in signaling components, including their genotypes and phenotypes. TRANSPATH® and PathoSign can be used as an encyclopaedia, in the educational process, for visualization and modeling of signal transduction networks and for the analysis of gene expression data. TRANSPATH® Public 6.0 is freely accessible for users from non-profit organizations under .
doi:10.1093/nar/gkj107
PMCID: PMC1347469  PMID: 16381929
6.  TRANSPATH®—A High Quality Database Focused on Signal Transduction 
TRANSPATH® can either be used as an encyclopedia, for both specific and general information on signal transduction, or can serve as a network analyser. Therefore, three modules have been created: the first one is the data, which have been manually extracted, mostly from the primary literature; the second is PathwayBuilder™, which provides several different types of network visualization and hence facilitates understanding; the third is ArrayAnalyzer™, which is particularly suited to gene expression array interpretation, and is able to identify key molecules within signalling networks (potential drug targets). These key molecules could be responsible for the coordinated regulation of downstream events. Manual data extraction focuses on direct reactions between signalling molecules and the experimental evidence for them, including species of genes/proteins used in individual experiments, experimental systems, materials and methods. This combination of materials and methods is used in TRANSPATH® to assign a quality value to each experimentally proven reaction, which reflects the probability that this reaction would happen under physiological conditions. Another important feature in TRANSPATH® is the inclusion of transcription factor–gene relations, which are transferred from TRANSFAC®, a database focused on transcription regulation and transcription factors. Since interactions between molecules are mainly direct, this allows a complete and stepwise pathway reconstruction from ligands to regulated genes. More information is available at www.biobase.de/pages/products/databases.html.
doi:10.1002/cfg.386
PMCID: PMC2447348  PMID: 18629064
7.  MATCH™: a tool for searching transcription factor binding sites in DNA sequences 
Nucleic Acids Research  2003;31(13):3576-3579.
Match™ is a weight matrix-based tool for searching putative transcription factor binding sites in DNA sequences. Match™ is closely interconnected and distributed together with the TRANSFAC® database. In particular, Match™ uses the matrix library collected in TRANSFAC® and therefore provides the possibility to search for a great variety of different transcription factor binding sites. Several sets of optimised matrix cut-off values are built in the system to provide a variety of search modes of different stringency. The user may construct and save his/her specific user profiles which are selected subsets of matrices including default or user-defined cut-off values. Furthermore, a number of tissue-specific profiles are provided that were compiled by the TRANSFAC® team. A public version of the Match™ tool is available at: http://www.gene-regulation.com/pub/programs.html#match. The same program with a different web interface can be found at http://compel.bionet.nsc.ru/Match/Match.html. An advanced version of the tool called Match™ Professional is available at http://www.biobase.de.
PMCID: PMC169193  PMID: 12824369
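The idea of a weight matrix search with a cut-off, as described in the abstract above, can be illustrated with a minimal sketch. This is a generic log-odds scan and not the scoring scheme used by Match™; the count matrix, pseudocount, uniform background, min-max normalization and the 0.9 cut-off are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores (assumed uniform background)."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2(((column[b] + pseudocount) / total) / background)
                    for b in BASES})
    return pwm

def scan(sequence, pwm, cutoff):
    """Return (start, normalized score) for every window whose score is >= cutoff."""
    width = len(pwm)
    lo = sum(min(col.values()) for col in pwm)  # worst possible raw score
    hi = sum(max(col.values()) for col in pwm)  # best possible raw score
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous symbols such as N
        raw = sum(col[base] for col, base in zip(pwm, window))
        score = (raw - lo) / (hi - lo)  # rescale to the range 0..1
        if score >= cutoff:
            hits.append((start, round(score, 3)))
    return hits

# Hypothetical count matrix for a 4 bp motif with consensus ACGT.
COUNTS = [
    {"A": 9, "C": 0, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]
PWM = log_odds(COUNTS)
hits = scan("TTACGTGGACGTAA", PWM, cutoff=0.9)
print(hits)
```

Raising the cut-off makes the search more stringent (fewer, higher-scoring hits); lowering it trades specificity for sensitivity, which is the role the abstract attributes to its built-in sets of cut-off values.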
8.  TRANSFAC®: transcriptional regulation, from patterns to profiles 
Nucleic Acids Research  2003;31(1):374-378.
The TRANSFAC® database on eukaryotic transcriptional regulation, comprising data on transcription factors, their target genes and regulatory binding sites, has been extended and further developed, both in number of entries and in the scope and structure of the collected data. Structured fields for expression patterns have been introduced for transcription factors from human and mouse, using the CYTOMER® database on anatomical structures and developmental stages. The functionality of Match™, a tool for matrix-based search of transcription factor binding sites, has been enhanced. For instance, the program now comes with a number of tissue-(or state-)specific profiles, and new profiles can be created and modified with Match™ Profiler. The GENE table was extended and gained in importance; it now contains, among others, links to LocusLink, RefSeq and OMIM. Further, (direct) links between factor and target gene on one hand and between gene and encoded factor on the other hand were introduced. The TRANSFAC® public release is available at http://www.gene-regulation.com. For yeast an additional release including the latest data was made available separately as TRANSFAC® Saccharomyces Module (TSM) at http://transfac.gbf.de. For CYTOMER® free download versions are available at http://www.biobase.de:8080/index.html.
PMCID: PMC165555  PMID: 12520026
9.  TRANSCompel®: a database on composite regulatory elements in eukaryotic genes 
Nucleic Acids Research  2002;30(1):332-334.
Originating from COMPEL, the TRANSCompel® database emphasizes the key role of specific interactions between transcription factors binding to their target sites providing specific features of gene regulation in a particular cellular content. Composite regulatory elements contain two closely situated binding sites for distinct transcription factors and represent minimal functional units providing combinatorial transcriptional regulation. Both specific factor–DNA and factor–factor interactions contribute to the function of composite elements (CEs). Information about the structure of known CEs and specific gene regulation achieved through such CEs appears to be extremely useful for promoter prediction, for gene function prediction and for applied gene engineering as well. Each database entry corresponds to an individual CE within a particular gene and contains information about two binding sites, two corresponding transcription factors and experiments confirming cooperative action between transcription factors. The COMPEL database, equipped with the search and browse tools, is available at http://www.gene-regulation.com/pub/databases.html#transcompel. Moreover, we have developed the program CATCH™ for searching potential CEs in DNA sequences. It is freely available as CompelPatternSearch at http://compel.bionet.nsc.ru/FunSite/CompelPatternSearch.html.
PMCID: PMC99108  PMID: 11752329
10.  COMPEL: a database on composite regulatory elements providing combinatorial transcriptional regulation 
Nucleic Acids Research  2000;28(1):311-315.
COMPEL is a database on composite regulatory elements, the basic structures of combinatorial regulation. Composite regulatory elements contain two closely situated binding sites for distinct transcription factors and represent minimal functional units providing combinatorial transcriptional regulation. Both specific factor–DNA and factor–factor interactions contribute to the function of composite elements (CEs). Information about the structure of known CEs and specific gene regulation achieved through such CEs appears to be extremely useful for promoter prediction, for gene function prediction and for applied gene engineering as well. The structure of the relational model of COMPEL is determined by the concept of molecular structure and regulatory role of CEs. Based on the set of a particular CE, a program has been developed for searching potential CEs in gene regulatory regions. WWW search and browse routines were developed for COMPEL release 3.0. The COMPEL database equipped with the search and browse tools is available at http://compel.bionet.nsc.ru/ . The program for prediction of potential CEs of NFAT type is available at http://compel.bionet.nsc.ru/FunSite.html and http://transfac.gbf.de/dbsearch/funsitep/s_comp.html
PMCID: PMC102399  PMID: 10592258
11.  Transcription Regulatory Regions Database (TRRD): its status in 2000 
Nucleic Acids Research  2000;28(1):298-301.
Transcription Regulatory Regions Database (TRRD) has been developed for accumulation of experimental information on the structure–function features of regulatory regions of eukaryotic genes. Each entry in TRRD corresponds to a particular gene and contains a description of structure–function features of its regulatory regions (transcription factor binding sites, promoters, enhancers, silencers, etc.) and gene expression regulation patterns. The current release, TRRD 4.2.5, comprises the description of 760 genes, 3403 expression patterns, and >4600 regulatory elements including 3604 transcription factor binding sites, 600 promoters and 152 enhancers. This information was obtained through annotation of 2537 scientific publications. TRRD 4.2.5 is available through the WWW at http://wwwmgs.bionet.nsc.ru/mgs/dbases/trrd4/
PMCID: PMC102412  PMID: 10592253
12.  Expanding the TRANSFAC database towards an expert system of regulatory molecular mechanisms. 
Nucleic Acids Research  1999;27(1):318-322.
TRANSFAC is a database on transcription factors, their genomic binding sites and DNA-binding profiles. In addition to being updated and extended by new features, it has been complemented now by a series of additional database modules. Among them, modules which provide data about signal transduction pathways (TRANSPATH) or about cell types/organs/developmental stages (CYTOMER) are available as well as an updated version of the previously described COMPEL database. The databases are available on the WWW at http://transfac.gbf.de/
PMCID: PMC148171  PMID: 9847216
13.  Transcription Regulatory Regions Database (TRRD): its status in 1999. 
Nucleic Acids Research  1999;27(1):303-306.
The Transcription Regulatory Regions Database (TRRD) is a curated database designed for accumulation of experimental data on extended regulatory regions of eukaryotic genes, the regulatory elements they contain, i.e., transcription factor binding sites, promoters, enhancers, silencers, etc., and expression patterns of the genes. Release 4.1 of TRRD offers a number of significant improvements, in particular, a more detailed description of transcription factor binding sites, transcription factors per se, and gene expression patterns in a computer-readable format. In addition, the new TRRD release provides considerably more references to other molecular biological databases. TRRD 4.1 is installed under SRS and is available through the WWW at http://www.bionet.nsc.ru/trrd/
PMCID: PMC148165  PMID: 9847210
14.  Databases on transcriptional regulation: TRANSFAC, TRRD and COMPEL. 
Nucleic Acids Research  1998;26(1):362-367.
TRANSFAC, TRRD (Transcription Regulatory Region Database) and COMPEL are databases which store information about transcriptional regulation in eukaryotic cells. The three databases provide distinct views on the components involved in transcription: transcription factors and their binding sites and binding profiles (TRANSFAC), the regulatory hierarchy of whole genes (TRRD), and the structural and functional properties of composite elements (COMPEL). The quantitative and qualitative changes of all three databases and connected programs are described. The databases are accessible via WWW: http://transfac.gbf.de/TRANSFAC or http://www.bionet.nsc.ru/TRRD
PMCID: PMC147251  PMID: 9399875
15.  TRANSFAC, TRRD and COMPEL: towards a federated database system on transcriptional regulation. 
Nucleic Acids Research  1997;25(1):265-268.
Three databases that provide data on transcriptional regulation are described. TRANSFAC is a database on transcription factors and their DNA binding sites. TRRD (Transcription Regulatory Region Database) collects information about complete regulatory regions, their regulation properties and architecture. COMPEL comprises specific information on composite regulatory elements. Here, we describe the present status of these databases and the first steps towards their federation.
PMCID: PMC146363  PMID: 9016550
16.  A compilation of composite regulatory elements affecting gene transcription in vertebrates. 
Nucleic Acids Research  1995;23(20):4097-4103.
Over the past years, evidence has been accumulating for a fundamental role of protein-protein interactions between transcription factors in gene-specific transcription regulation. Many of these interactions run within composite elements containing binding sites for several factors. We have selected 101 composite regulatory elements identified experimentally in the regulatory regions of 64 genes of vertebrates and of their viruses and briefly described them in a compilation. Of these, 82 composite elements are of the synergistic type and 19 of the antagonistic type. Within the synergistic type composite elements, transcription factors bind to the corresponding sites simultaneously, thus cooperatively activating transcription. The factors, binding to their target sites within antagonistic type composite elements, produce opposing effects on transcription. The nucleotide sequence and localization in the genes, the names and brief description of transcription factors, are provided for each composite element, including a representation of experimental data on its functioning. Most of the composite elements (3/4) fall between -250 bp and the transcription start site. The distance between the binding sites within the composite elements described varies from complete overlapping to 80 bp. The compilation of composite elements is presented in the database COMPEL which is electronically accessible by anonymous ftp via internet.
PMCID: PMC307349  PMID: 7479071
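The spacing reported in the abstract above (binding sites ranging from complete overlap to 80 bp apart) suggests a simple filter for candidate composite elements: keep pairs of predicted sites for distinct factors that lie within a maximum gap. The sketch below is illustrative only. It is not the COMPEL search program; the hit coordinates are invented, and measuring the gap from the end of the upstream site to the start of the downstream site is an assumption, since the abstract does not say how distance is defined.

```python
def candidate_pairs(hits, max_gap=80):
    """Pair binding-site hits for distinct factors lying at most max_gap bp apart.

    hits: list of (factor, start, end) tuples with end exclusive.
    Returns (upstream factor, downstream factor, gap) tuples; overlapping
    sites are reported with a gap of 0.
    """
    ordered = sorted(hits, key=lambda hit: hit[1])
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if first[0] == second[0]:
                continue  # a composite element needs two distinct factors
            gap = max(0, second[1] - first[2])
            if gap <= max_gap:
                pairs.append((first[0], second[0], gap))
    return pairs

# Hypothetical predicted sites on one promoter (coordinates are made up).
example_hits = [
    ("NFAT", 10, 16),
    ("AP-1", 14, 21),
    ("AP-1", 60, 67),
    ("NF-kB", 150, 160),
]
pairs = candidate_pairs(example_hits)
print(pairs)
```

A filter like this only proposes candidates; whether two sites act synergistically or antagonistically is an experimental question that spacing alone cannot answer.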
17.  ROS-dependent activation of JNK converts p53 into an efficient inhibitor of oncogenes leading to robust apoptosis 
Cell Death and Differentiation  2014;21(4):612-623.
Rescue of the p53 tumor suppressor is an attractive cancer therapy approach. However, pharmacologically activated p53 can induce diverse responses ranging from cell death to growth arrest and DNA repair, which limits the efficient application of p53-reactivating drugs in the clinic. Elucidation of the molecular mechanisms defining the biological outcome upon p53 activation remains a grand challenge in the p53 field. Here, we report that concurrent pharmacological activation of p53 and inhibition of thioredoxin reductase, followed by generation of reactive oxygen species (ROS), results in synthetic lethality in cancer cells. ROS promote the activation of c-Jun N-terminal kinase (JNK) and DNA damage response, which establishes a positive feedback loop with p53. This converts the p53-induced growth arrest/senescence to apoptosis. We identified several survival oncogenes inhibited by p53 in a JNK-dependent manner, including Mcl1, PI3K, eIF4E, as well as p53 inhibitors Wip1 and MdmX. Further, we show that Wip1 is one of the crucial executors downstream of JNK whose ablation confers the enhanced and sustained p53 transcriptional response contributing to cell death. Our study provides novel insights for manipulating p53 response in a controlled way. Further, our results may enable a new pharmacological strategy to exploit abnormally high ROS levels, often linked with higher aggressiveness in cancer, to selectively kill cancer cells upon pharmacological reactivation of p53.
doi:10.1038/cdd.2013.186
PMCID: PMC3950324  PMID: 24413150
TrxR; ROS; JNK; p53; Wip1; inhibition of oncogenes
18.  Systemic Immunity Influences Hearing Preservation in Cochlear Implantation 
Hypothesis
To determine whether a systemic immune response influences hearing thresholds and tissue response after cochlear implantation of hearing guinea pigs.
Methods
Guinea pigs were inoculated with sterile antigen (Keyhole limpet hemocyanin) 3 weeks before cochlear implantation. Pure-tone auditory brainstem response thresholds were measured before implantation and 1 and 4 weeks later. Dexamethasone phosphate 20% was adsorbed onto a hyaluronic acid carboxymethylcellulose sponge and was applied to the round window for 30 minutes before electrode insertion. Normal saline was used for controls. Cochlear histology was performed at 4 weeks after implantation to assess the tissue response to implantation. To control for the effect of keyhole limpet hemocyanin priming, a group of unprimed animals underwent cochlear implantation with a saline-soaked pledget applied to the round window.
Results
Keyhole limpet hemocyanin priming had no significant detrimental effect on thresholds without implantation. Thresholds were elevated after implantation across all frequencies tested (2–32 kHz) in primed animals but only at higher frequencies (4–32 kHz) in unprimed controls. In primed animals, dexamethasone treatment significantly reduced threshold shifts at 2 and 8 kHz. Keyhole limpet hemocyanin led to the more frequent observation of lymphocytes in the tissue response to the implant.
Conclusion
Systemic immune activation at the time of cochlear implantation broadened the range of frequencies experiencing elevated thresholds after implantation. Local dexamethasone provides partial protection against this hearing loss, but the degree and extent of protection are smaller than in previous studies with unprimed animals.
doi:10.1097/MAO.0b013e31824bac44
PMCID: PMC3897157  PMID: 22470051
Cochlear implant; Hearing loss; Innate immunity; Keyhole limpet hemocyanin; Systemic immune system
19.  Integrated Bio-Search: challenges and trends for the integration, search and comprehensive processing of biological information 
BMC Bioinformatics  2014;15(Suppl 1):S2.
Many efforts exist to design and implement approaches and tools for data capture, integration and analysis in the life sciences. Challenges are not only the heterogeneity, size and distribution of information sources, but also the danger of producing too many solutions for the same problem. Methodological, technological, infrastructural and social aspects appear to be essential for the development of a new generation of best practices and tools. In this paper, we analyse and discuss these aspects from different perspectives, by extending some of the ideas that arose during the NETTAB 2012 Workshop, making reference especially to the European context.
First, relevance of using data and software models for the management and analysis of biological data is stressed. Second, some of the most relevant community achievements of the recent years, which should be taken as a starting point for future efforts in this research domain, are presented. Third, some of the main outstanding issues, challenges and trends are analysed. The challenges related to the tendency to fund and create large scale international research infrastructures and public-private partnerships in order to address the complex challenges of data intensive science are especially discussed. The needs and opportunities of Genomic Computing (the integration, search and display of genomic information at a very specific level, e.g. at the level of a single DNA region) are then considered.
In the current data and network-driven era, social aspects can become crucial bottlenecks. How these may best be tackled to unleash the technical abilities for effective data integration and validation efforts is then discussed. Especially the apparent lack of incentives for already overwhelmed researchers appears to be a limitation for sharing information and knowledge with other scientists. We point out as well how the bioinformatics market is growing at an unprecedented speed due to the impact that new powerful in silico analysis promises to have on better diagnosis, prognosis, drug discovery and treatment, towards personalized medicine. An open business model for bioinformatics, which appears to be able to reduce undue duplication of efforts and support the increased reuse of valuable data sets, tools and platforms, is finally discussed.
doi:10.1186/1471-2105-15-S1-S2
PMCID: PMC4015876  PMID: 24564249
20.  A Novel 1,2-Migration of Acyloxy, Phosphatyloxy, and Sulfonyloxy Groups in Allenes: Efficient Synthesis of Tri- and Tetrasubstituted Furans** 
doi:10.1002/anie.200353535
PMCID: PMC3701758  PMID: 15108144
cyclization; furans; homogeneous catalysis; rearrangement; synthetic methods
21.  Metal-Catalyzed 1,2-Shift of Diverse Migrating Groups in Allenyl Systems as a New Paradigm toward Densely Functionalized Heterocycles 
A general, mild, and efficient 1,2-migration/cycloisomerization methodology toward multisubstituted 3-thio-, seleno-, halo-, aryl-, and alkyl-furans and pyrroles, as well as fused heterocycles, valuable building blocks for synthetic chemistry, has been developed. Moreover, regiodivergent conditions have been identified for C-4 bromo- and thio-substituted allenones and alkynones for the assembly of regioisomeric 2-hetero substituted furans selectively. It was demonstrated that, depending on reaction conditions, ambident substrates can be selectively transformed into furan products, as well as undergo selective 6-exo-dig or Nazarov cyclizations. Our mechanistic investigations have revealed that the transformation proceeds via allenylcarbonyl or allenylimine intermediates followed by 1,2-group migration to the allenyl sp carbon during cycloisomerization. It was found that 1,2-migration of chalcogens and halogens predominantly proceeds via formation of irenium intermediates. An analogous intermediate can also be proposed for the 1,2-aryl shift. Furthermore, it was shown that the cycloisomerization cascade can be catalyzed by Brønsted acids, albeit less efficiently, and commonly observed reactivity of Lewis acid catalysts cannot be attributed to the eventual formation of a proton. Undoubtedly, thermally induced or Lewis acid-catalyzed transformations proceed via intramolecular Michael addition or activation of the enone moiety pathways, whereas certain carbophilic metals trigger a carbenoid/oxonium type pathway. However, a facile cycloisomerization in the presence of cationic complexes, as well as the observed migratory aptitude in the cycloisomerization of unsymmetrically disubstituted aryl- and alkylallenes, strongly supports an electrophilic nature for this transformation. Full mechanistic details, as well as the scope of this transformation, are discussed.
doi:10.1021/ja0773507
PMCID: PMC3686647  PMID: 18173272
23.  A Discriminative Approach for Unsupervised Clustering of DNA Sequence Motifs 
PLoS Computational Biology  2013;9(3):e1002958.
Algorithmic comparison of DNA sequence motifs is a problem in bioinformatics that has received increased attention in recent years. Its main applications concern characterization of potentially novel motifs and clustering of a motif collection in order to remove redundancy. Despite growing interest in motif clustering, the question of which motif clusters to aim for has so far not been systematically addressed. Here we analyzed motif similarities in a comprehensive set of vertebrate transcription factor classes. For this we developed enhanced similarity scores by inclusion of the information coverage (IC) criterion, which evaluates the fraction of information an alignment covers in aligned motifs. A network-based method enabled us to identify motif clusters with high correspondence to DNA-binding domain phylogenies and prior experimental findings. Based on this analysis we derived a set of motif families representing distinct binding specificities. These motif families were used to train a classifier which was further integrated into a novel algorithm for unsupervised motif clustering. Application of the new algorithm demonstrated its superiority to previously published methods and its ability to reproduce entrained motif families. As a result, our work proposes a probabilistic approach to decide whether two motifs represent common or distinct binding specificities.
Author Summary
Transcription factors play a central role in the regulation of gene expression. Their interaction with specific elements in the DNA mediates dynamic changes in transcriptional activity. Databases store a growing number of known DNA sequence patterns, also called DNA sequence motifs, that are recognized by transcription factors. Such databases can be searched to find a match for a newly discovered pattern and thereby identify the potential binding factor. It is also of interest to cluster motifs in order to examine which transcription factors have similar binding properties and, thus, may promiscuously bind to each other's sites, or how many distinct specificities have been described. To gain deeper insight into the similarities between DNA sequence motifs, we analyzed a comprehensive set of known motifs. For this purpose we devised a network-based approach that enabled us to identify clusters of related motifs that largely coincided with the grouping of related TFs on the basis of protein similarity. On the basis of these results, we were able to predict whether two motifs belong to the same subgroup and constructed a novel, fully automated method for motif clustering, which enables users to assess the similarity of a newly found motif with all known motifs in the collection.
doi:10.1371/journal.pcbi.1002958
PMCID: PMC3605052  PMID: 23555204
24.  Increased Presence of Cognitive Impairment in Hemodialysis Patients in the Absence of Neurological Events 
American Journal of Nephrology  2011;35(2):120-126.
Background/Aims
Cognitive impairment (CI) is highly prevalent among hemodialysis (HD) patients and is associated with increased morbidity and mortality. The aim was to compare cognitive function in HD patients with no history of stroke or dementia and well-matched controls. Studies are required to determine the impact of HD and chronic kidney disease-specific risks on CI.
Methods
Seventy-six outpatients (50 receiving outpatient HD and 26 with normal kidney function matched for age and comorbidity) took part in a cross-sectional observational study. HD patients were well dialyzed and had optimal hemoglobin levels. A battery of eight neuropsychological tests was used. Outcomes included assessment scores of neurocognitive testing and prevalence and subtype of CI.
Results
Compared to controls, HD subjects had significantly lower composite scores for each tested cognitive domain. In each domain except memory, the percentage of subjects with impairment was significantly higher in HD subjects than controls. Differences between the groups were independent of vascular and dementia risk factors. 82% of HD subjects met criteria for CI versus 50% of controls. Non-amnestic subtype of CI was more prevalent in both groups.
Conclusion
Well-dialyzed HD patients with optimized hemoglobin levels and with no history of stroke or dementia performed significantly worse on multiple measures of cognition compared to controls. A higher prevalence of non-memory impairment may suggest an underlying vascular versus neurodegenerative mechanism. HD and chronic kidney disease-specific risk factors may contribute to early CI not readily detected by routine screening methods.
doi:10.1159/000334871
PMCID: PMC3711004  PMID: 22212437
Cognitive impairment; Chronic kidney disease; Hemodialysis
25.  Male predominance of upper gastrointestinal adenocarcinoma cannot be explained by differences in tobacco smoking in men versus women 
Background
Adenocarcinomas of the upper gastrointestinal tract (UGI) show remarkable male predominance. As smoking is a well-established risk factor, we investigated the role of tobacco smoking in the male predominance of UGI adenocarcinomas in the United States NIH-AARP Diet and Health study.
Method
A questionnaire was completed by 281,422 men and 186,133 women in 1995-1996 who were followed until Dec 31, 2003. Incident UGI adenocarcinomas were identified by linkage to state cancer registries. We present age-standardised cancer incidence rates per 100,000 person-years and Male/Female ratios (M/F) calculated from age-adjusted Cox proportional hazards models, both with 95% confidence intervals.
Results
After 2,013,142 person-years of follow-up, 338 adenocarcinomas of the oesophagus, 261 of gastric cardia, and 222 of gastric non-cardia occurred in men. In women, 23 tumours of oesophagus, 36 of gastric cardia, and 88 of gastric non-cardia occurred in 1,351,958 person-years of follow-up. The age-standardised incidence rates of all adenocarcinoma sites were 40.5 (37.8-43.3) and 11.0 (9.2-12.8) in men and women, respectively. Among smokers, the M/F of all UGI adenocarcinomas was 3.4 (2.7-4.1), with a M/F of 7.3 (4.6-11.7) for tumours in oesophagus, 3.7 (2.5-5.4) for gastric cardia, and 1.7 (1.2-2.3) for gastric non-cardia. In non-smokers, M/F ratios were 14.2 (5.1-39.5) for oesophagus, 6.1 (2.6-14.7) for gastric cardia, and 1.3 (0.8-2.0) for gastric non-cardia. The overall M/F ratio was 3.0 (2.2-4.3).
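As a sanity check on the figures above, the crude (unadjusted) incidence rates can be recomputed from the reported case counts and person-years. This is a minimal sketch, not the study's method: the reported rates are age-standardised and the reported M/F ratios come from age-adjusted Cox models, so the crude values are only expected to be close. The function name is an assumption.

```python
def crude_rate_per_100k(cases, person_years):
    """Crude incidence rate per 100,000 person-years (no age standardisation)."""
    return cases / person_years * 100_000


# Case counts and person-years as reported in the Results paragraph.
men_cases = 338 + 261 + 222   # oesophagus + gastric cardia + gastric non-cardia
women_cases = 23 + 36 + 88

men_rate = crude_rate_per_100k(men_cases, 2_013_142)
women_rate = crude_rate_per_100k(women_cases, 1_351_958)

# Crude male/female rate ratio; not the same quantity as the Cox-model M/F.
crude_ratio = men_rate / women_rate
```

The crude rates come out at roughly 40.8 and 10.9 per 100,000 person-years, close to the reported age-standardised 40.5 and 11.0. The crude ratio is somewhat higher than the reported overall M/F of 3.0, which is not surprising given that the reported ratio is model-based and age-adjusted.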
Conclusion
The male predominance was similar in smokers and non-smokers for these cancer sites. These results suggest that the male predominance of upper GI adenocarcinomas cannot be explained by differences in smoking histories.
doi:10.1016/j.ejca.2010.05.005
PMCID: PMC3514413  PMID: 20605442
oesophageal adenocarcinoma; gastric cancer; male predominance; smoking
