Results 1-25 (25)

1.  Transcriptional and epigenetic signatures of zygotic genome activation during early Drosophila embryogenesis 
BMC Genomics  2013;14:226.
In all Metazoa, transcription is inactive during the first mitotic cycles after fertilisation. In Drosophila melanogaster, Zygotic Genome Activation (ZGA) occurs in two waves, starting respectively at mitotic cycles 8 (approximately 60 genes) and 14 (over a thousand genes). The regulatory mechanisms underlying these drastic transcriptional changes remain largely unknown.
We developed an original gene clustering method based on discretized transition profiles, and applied it to datasets from three landmark early embryonic transcriptome studies. We identified 417 genes significantly up-regulated during ZGA. De novo motif discovery returned nine motifs over-represented in their non-coding sequences (upstream, introns, UTR), three of which correspond to previously known transcription factors: Zelda, Tramtrack and Trithorax-like (Trl). The nine discovered motifs were combined to scan ZGA-associated regions and predict about 1300 putative cis-regulatory modules. The fact that Trl is known to act as chromatin remodelling factor suggests that epigenetic regulation might play an important role in zygotic genome activation. We thus systematically compared the locations of predicted CRMs with ChIP-seq profiles for various transcription factors, 38 epigenetic marks from ModENCODE, and DNAse1 accessibility profiles. This analysis highlighted a strong and specific enrichment of predicted ZGA-associated CRMs for Zelda, CBP, Trl binding sites, as well as for histone marks associated with active enhancers (H3K4me1) and for open chromatin regions.
Based on the results of our computational analyses, we suggest a temporal model explaining the onset of zygotic genome activation by the combined action of transcription factors and epigenetic signals. Although this study is mainly based on the analysis of publicly available transcriptome and ChIP-seq datasets, the resulting model suggests novel mechanisms that underlie the coordinated activation of several hundred genes at a precise time point during embryonic development.
PMCID: PMC3706223  PMID: 23560912
Drosophila Melanogaster; Zygotic Genome Activation; Transcriptional Regulation; Epigenetic Regulation; Transcriptome; ChIP-seq
2.  Correction: Clusters of Conserved Beta Cell Marker Genes for Assessment of Beta Cell Phenotype 
PLoS ONE  2012;7(1):10.1371/annotation/a91571a6-acbb-456f-bcc5-f4a431e28516.
PMCID: PMC3267642
3.  Correction: Clusters of Conserved Beta Cell Marker Genes for Assessment of Beta Cell Phenotype 
PLoS ONE  2012;7(1):10.1371/annotation/4aae21a9-e176-4feb-9f15-103b265d3335.
PMCID: PMC3267643
4.  Correction: Clusters of Conserved Beta Cell Marker Genes for Assessment of Beta Cell Phenotype 
PLoS ONE  2012;7(1):10.1371/annotation/7aa0ff33-5660-4b56-889a-4b86a273d522.
PMCID: PMC3267644
5.  Correction: Clusters of Conserved Beta Cell Marker Genes for Assessment of Beta Cell Phenotype 
PLoS ONE  2012;7(1):10.1371/annotation/13c6d084-a8fd-4019-a3cf-12a0d8abe309.
PMCID: PMC3267645
6.  RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets 
Nucleic Acids Research  2011;40(4):e31.
ChIP-seq is increasingly used to characterize transcription factor binding and chromatin marks at a genomic scale. Various tools are now available to extract binding motifs from peak data sets. However, most approaches are only available as command-line programs, or via a website but with size restrictions. We present peak-motifs, a computational pipeline that discovers motifs in peak sequences, compares them with databases, exports putative binding sites for visualization in the UCSC genome browser and generates an extensive report suited for both naive and expert users. It relies on time- and memory-efficient algorithms enabling the treatment of several thousand peaks within minutes. Regarding time efficiency, peak-motifs outperforms all comparable tools by several orders of magnitude. We demonstrate its accuracy by analyzing data sets ranging from 4000 to 128 000 peaks for 12 embryonic stem cell-specific transcription factors. In all cases, the program finds the expected motifs and returns additional motifs potentially bound by cofactors. We further apply peak-motifs to discover tissue-specific motifs in peak collections for the p300 transcriptional co-activator. To our knowledge, peak-motifs is the only tool that performs a complete motif analysis and offers a user-friendly web interface without any restriction on sequence size or number of peaks.
PMCID: PMC3287167  PMID: 22156162
7.  Clusters of Conserved Beta Cell Marker Genes for Assessment of Beta Cell Phenotype 
PLoS ONE  2011;6(9):e24134.
Background and Methodology
The aim of this study was to establish a gene expression blueprint of pancreatic beta cells conserved from rodents to humans and to evaluate its applicability to assess shifts in the beta cell differentiated state. Genome-wide mRNA expression profiles of isolated beta cells were compared to those of a large panel of other tissue and cell types, and transcripts with beta cell-abundant and -selective expression were identified. Iteration of this analysis in mouse, rat and human tissues generated a panel of conserved beta cell biomarkers. This panel was then used to compare isolated versus laser capture microdissected beta cells, monitor adaptations of the beta cell phenotype to fasting, and retrieve possible conserved transcriptional regulators.
Principal Findings
A panel of 332 conserved beta cell biomarker genes was found to discriminate both isolated and laser capture microdissected beta cells from all other examined cell types. Of all conserved beta cell-markers, 15% were strongly beta cell-selective and functionally associated to hormone processing, 15% were shared with neuronal cells and associated to regulated synaptic vesicle transport and 30% with immune plus gut mucosal tissues reflecting active protein synthesis. Fasting specifically down-regulated the latter cluster, but preserved the neuronal and strongly beta cell-selective traits, indicating preserved differentiated state. Analysis of consensus binding site enrichment indicated major roles of CREB/ATF and various nutrient- or redox-regulated transcription factors in maintenance of differentiated beta cell phenotype.
Conserved beta cell marker genes contain major gene clusters defined by their beta cell selectivity or by their additional abundance in either neural cells or in immune plus gut mucosal cells. This panel can be used as a template to identify changes in the differentiated state of beta cells.
PMCID: PMC3166300  PMID: 21912665
8.  RSAT 2011: regulatory sequence analysis tools 
Nucleic Acids Research  2011;39(Web Server issue):W86-W91.
RSAT (Regulatory Sequence Analysis Tools) comprises a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. Thirteen new programs have been added to the 30 described in the 2008 NAR Web Software Issue, including an automated sequence retrieval from EnsEMBL (retrieve-ensembl-seq), two novel motif discovery algorithms (oligo-diff and info-gibbs), a 100-times faster version of matrix-scan enabling the scanning of genome-scale sequence sets, and a series of facilities for random model generation and statistical evaluation (random-genome-fragments, random-motifs, random-sites, implant-sites, sequence-probability, permute-matrix). Our most recent work also focused on motif comparison (compare-matrices) and evaluation of motif quality (matrix-quality) by combining theoretical and empirical measures to assess the predictive capability of position-specific scoring matrices. To process large collections of peak sequences obtained from ChIP-seq or related technologies, RSAT provides a new program (peak-motifs) that combines several efficient motif discovery algorithms to predict transcription factor binding motifs, match them against motif databases and predict their binding sites. Availability: web site, stand-alone programs and SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services.
PMCID: PMC3125777  PMID: 21715389
9.  Unraveling networks of co-regulated genes on the sole basis of genome sequences 
Nucleic Acids Research  2011;39(15):6340-6358.
With the growing number of available microbial genome sequences, regulatory signals can now be revealed as conserved motifs in promoters of orthologous genes (phylogenetic footprints). A next challenge is to unravel genome-scale regulatory networks. Using as sole input genome sequences, we predicted cis-regulatory elements for each gene of the yeast Saccharomyces cerevisiae by discovering over-represented motifs in the promoters of their orthologs in 19 Saccharomycetes species. We then linked all genes displaying similar motifs in their promoter regions and inferred a co-regulation network including 56 919 links between 3171 genes. Comparison with annotated regulons highlights the high predictive value of the method: a majority of the top-scoring predictions correspond to already known co-regulations. We also show that this inferred network is as accurate as a co-expression network built from hundreds of transcriptome microarray experiments. Furthermore, we experimentally validated 14 among 16 new functional links between orphan genes and known regulons. This approach can be readily applied to unravel gene regulatory networks from hundreds of microbial genomes for which no other information is available except the sequence. Long-term benefits can easily be perceived when considering the exponential increase of new genome sequences.
PMCID: PMC3159452  PMID: 21572103
10.  Theoretical and empirical quality assessment of transcription factor-binding motifs 
Nucleic Acids Research  2010;39(3):808-824.
Position-specific scoring matrices (PSSMs) are routinely used to predict transcription factor (TF)-binding sites in genome sequences. However, their reliability to predict novel binding sites can be far from optimum, due to the use of a small number of training sites or the inappropriate choice of parameters when building the matrix or when scanning sequences with it. Measures of matrix quality such as E-value and information content rely on theoretical models, and may fail in the context of full genome sequences. We propose a method, implemented in the program ‘matrix-quality’, that combines theoretical and empirical score distributions to assess reliability of PSSMs for predicting TF-binding sites. We applied ‘matrix-quality’ to estimate the predictive capacity of matrices for bacterial, yeast and mouse TFs. The evaluation of matrices from RegulonDB revealed some poorly predictive motifs, and allowed us to quantify the improvements obtained by applying multi-genome motif discovery. Interestingly, the method reveals differences between global and specific regulators. It also highlights the enrichment of binding sites in sequence sets obtained from high-throughput ChIP-chip (bacterial and yeast TFs) and ChIP-seq experiments (mouse TFs). The method presented here has many applications, including: selecting reliable motifs before scanning sequences; improving motif collections in TFs databases; evaluating motifs discovered using high-throughput data sets.
PMCID: PMC3035439  PMID: 20923783
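As an illustration of the PSSM scanning that entry 10 evaluates, the following minimal sketch builds a log-odds matrix from a few aligned sites and scores every window of a sequence. It does not reproduce the matrix-quality method itself. The function names, the pseudocount scheme, the uniform Bernoulli background and the toy sites are all assumptions made for this example.

```python
import math

def build_pssm(sites, background, pseudo=1.0):
    """Build a log-odds matrix from aligned sites of equal length.

    The pseudocount is distributed according to the background
    frequencies (an assumption of this sketch), so no frequency is zero.
    """
    length = len(sites[0])
    pssm = []
    for i in range(length):
        column = {}
        for base in "ACGT":
            count = sum(1 for site in sites if site[i] == base)
            freq = (count + pseudo * background[base]) / (len(sites) + pseudo)
            column[base] = math.log(freq / background[base])
        pssm.append(column)
    return pssm

def scan(sequence, pssm):
    """Return (position, score) for every window of the sequence."""
    width = len(pssm)
    hits = []
    for pos in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[pos + i]] for i in range(width))
        hits.append((pos, score))
    return hits

# Hypothetical training sites and a uniform Bernoulli background.
toy_sites = ["TGACTC", "TGACTA", "TGAGTC"]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
toy_pssm = build_pssm(toy_sites, uniform)
best_position, best_score = max(scan("AATGACTCAA", toy_pssm), key=lambda hit: hit[1])
print(best_position, round(best_score, 3))
```

With so few training sites the scores depend heavily on the pseudocount, which is one of the reliability issues the abstract points to.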
11.  Pathway discovery in metabolic networks by subgraph extraction 
Bioinformatics  2010;26(9):1211-1218.
Motivation: Subgraph extraction is a powerful technique to predict pathways from biological networks and a set of query items (e.g. genes, proteins, compounds, etc.). It can be applied to a variety of different data types, such as gene expression, protein levels, operons or phylogenetic profiles. In this article, we investigate different approaches to extract relevant pathways from metabolic networks. Although these approaches have been adapted to metabolic networks, they are generic enough to be adjusted to other biological networks as well.
Results: We comparatively evaluated seven sub-network extraction approaches on 71 known metabolic pathways from Saccharomyces cerevisiae and a metabolic network obtained from MetaCyc. The best performing approach is a novel hybrid strategy, which combines a random walk-based reduction of the graph with a shortest paths-based algorithm, and which recovers the reference pathways with an accuracy of ∼77%.
Availability: Most of the presented algorithms are available as part of the network analysis tool set (NeAT). The kWalks method is released under the GPL3 license.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2859126  PMID: 20228128
12.  Transcription factor regulation can be accurately predicted from the presence of target gene signatures in microarray gene expression data 
Nucleic Acids Research  2010;38(11):e120.
Deciphering transcription factor networks from microarray data remains difficult. This study presents a simple method to infer the regulation of transcription factors from microarray data based on well-characterized target genes. We generated a catalog containing transcription factors associated with 2720 target genes and 6401 experimentally validated regulations. When it was available, a distinction between transcriptional activation and inhibition was included for each regulation. Next, we built a tool (TFactS) that compares submitted gene lists with target genes in the catalog to detect regulated transcription factors. TFactS was validated with published lists of regulated genes in various models and compared to tools based on in silico promoter analysis. We next analyzed the NCI60 cancer microarray data set and showed the regulation of SOX10, MITF and JUN in melanomas. We then performed microarray experiments comparing gene expression response of human fibroblasts stimulated by different growth factors. TFactS predicted the specific activation of Signal transducer and activator of transcription factors by PDGF-BB, which was confirmed experimentally. Our results show that the expression levels of transcription factor target genes constitute a robust signature for transcription factor regulation, and can be efficiently used for microarray data mining.
PMCID: PMC2887972  PMID: 20215436
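Comparing a submitted gene list with catalogued target genes, as described in entry 12, amounts to asking whether the overlap is larger than expected by chance. One common way to score such an overlap is a hypergeometric tail probability. The sketch below assumes that statistic for illustration only: the abstract does not state which test the tool uses, and the function name and toy numbers are hypothetical.

```python
from math import comb

def hypergeom_pvalue(overlap, query_size, target_size, universe_size):
    """Probability of observing at least `overlap` target genes when
    `query_size` genes are drawn without replacement from a universe of
    `universe_size` genes that contains `target_size` targets."""
    upper = min(query_size, target_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(target_size, k) * comb(universe_size - target_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 2 of 2 submitted genes are among 3 targets in a
# universe of 10 genes.
print(hypergeom_pvalue(2, 2, 3, 10))
```

In practice one such p-value would be computed per transcription factor and corrected for multiple testing; that step is left out here.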
13.  Integrating sequence, evolution and functional genomics in regulatory genomics 
Genome Biology  2009;10(1):202.
Finding transcription factor binding sites in regulatory regions of the genome
With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome.
PMCID: PMC2687781  PMID: 19226437
14.  Machine learning techniques to identify putative genes involved in nitrogen catabolite repression in the yeast Saccharomyces cerevisiae 
BMC Proceedings  2008;2(Suppl 4):S5.
Nitrogen is an essential nutrient for all life forms. Like most unicellular organisms, the yeast Saccharomyces cerevisiae transports and catabolizes good nitrogen sources in preference to poor ones. Nitrogen catabolite repression (NCR) refers to this selection mechanism. All known nitrogen catabolite pathways are regulated by four regulators. The ultimate goal is to infer the complete nitrogen catabolite pathways. Bioinformatics approaches offer the possibility to identify putative NCR genes and to discard uninteresting genes.
We present a machine learning approach where the identification of putative NCR genes in the yeast Saccharomyces cerevisiae is formulated as a supervised two-class classification problem. Classifiers predict whether genes are NCR-sensitive or not from a large number of variables related to the GATA motif in the upstream non-coding sequences of the genes. The positive and negative training sets are composed of annotated NCR genes and manually-selected genes known to be insensitive to NCR, respectively. Different classifiers and variable selection methods are compared. We show that all classifiers make significant and biologically valid predictions by comparing these predictions to annotated and putative NCR genes, and by performing several negative controls. In particular, the inferred NCR genes significantly overlap with putative NCR genes identified in three genome-wide experimental and bioinformatics studies.
These results suggest that our approach can successfully identify potential NCR genes. Hence, the dimensionality of the problem of identifying all genes involved in NCR is drastically reduced.
PMCID: PMC2654973  PMID: 19091052
15.  Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures 
Nature  2007;450(7167):219-232.
Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.
PMCID: PMC2474711  PMID: 17994088
16.  NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways 
Nucleic Acids Research  2008;36(Web Server issue):W444-W451.
The network analysis tools (NeAT) provide a user-friendly web access to a collection of modular tools for the analysis of networks (graphs) and clusters (e.g. microarray clusters, functional classes, etc.). A first set of tools supports basic operations on graphs (comparison between two graphs, neighborhood of a set of input nodes, path finding and graph randomization). Another set of programs makes the connection between networks and clusters (graph-based clustering, clique discovery and mapping of clusters onto a network). The toolbox also includes programs for detecting significant intersections between clusters/classes (e.g. clusters of co-expression versus functional classes of genes). NeAT are designed to cope with large datasets and provide a flexible toolbox for analyzing biological networks stored in various databases (protein interactions, regulation and metabolism) or obtained from high-throughput experiments (two-hybrid, mass-spectrometry and microarrays). The web interface interconnects the programs in predefined analysis flows, making it possible to address a series of questions about networks of interest. Each tool can also be used separately by entering custom data for a specific analysis. NeAT can also be used as web services (SOAP/WSDL interface), in order to design programmatic workflows and integrate them with other available resources.
PMCID: PMC2447721  PMID: 18524799
17.  RSAT: regulatory sequence analysis tools 
Nucleic Acids Research  2008;36(Web Server issue):W119-W127.
The regulatory sequence analysis tools (RSAT) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL) and 682 organisms are currently supported. Since 1998, the tools have been used by several hundred researchers from all over the world. Several predictions made with RSAT were validated experimentally and published.
PMCID: PMC2447775  PMID: 18495751
18.  Evaluation of phylogenetic footprint discovery for predicting bacterial cis-regulatory elements and revealing their evolution 
BMC Bioinformatics  2008;9:37.
The detection of conserved motifs in promoters of orthologous genes (phylogenetic footprints) has become a common strategy to predict cis-acting regulatory elements. Several software tools are routinely used to raise hypotheses about regulation. However, these tools are generally used as black boxes, with default parameters. A systematic evaluation of optimal parameters for a footprint discovery strategy can bring a sizeable improvement to the predictions.
We evaluate the performances of a footprint discovery approach based on the detection of over-represented spaced motifs. This method is particularly suitable for (but not restricted to) Bacteria, since such motifs are typically bound by factors containing a Helix-Turn-Helix domain. We evaluated footprint discovery in 368 Escherichia coli K12 genes with annotated sites, under 40 different combinations of parameters (taxonomical level, background model, organism-specific filtering, operon inference). Motifs are assessed both at the levels of correctness and significance. We further report a detailed analysis of 181 bacterial orthologs of the LexA repressor. Distinct motifs are detected at various taxonomical levels, including the 7 previously characterized taxon-specific motifs. In addition, we highlight a significantly stronger conservation of half-motifs in Actinobacteria, relative to Firmicutes, suggesting an intermediate state in specificity switching between the two Gram-positive phyla, and thereby revealing the on-going evolution of LexA auto-regulation.
The footprint discovery method proposed here shows excellent results with E. coli and can readily be extended to predict cis-acting regulatory signals and propose testable hypotheses in bacterial genomes for which nothing is known about regulation.
PMCID: PMC2248561  PMID: 18215291
19.  Fine-Tuning Enhancer Models to Predict Transcriptional Targets across Multiple Genomes 
PLoS ONE  2007;2(11):e1115.
Networks of regulatory relations between transcription factors (TF) and their target genes (TG), implemented through TF binding sites (TFBS), are key features of biology. An idealized approach to solving such networks consists of starting from a consensus TFBS or a position weight matrix (PWM) to generate a high accuracy list of candidate TGs for biological validation. Developing and evaluating such approaches remains a formidable challenge in regulatory bioinformatics. We perform a benchmark study on 34 Drosophila TFs to assess existing TFBS and cis-regulatory module (CRM) detection methods, with a strong focus on the use of multiple genomes. Particularly, for CRM-modelling we investigate the addition of orthologous sites to a known PWM to construct phyloPWMs and we assess the added value of phylogenetic footprinting to predict contextual motifs around known TFBSs. For CRM-prediction, we compare motif conservation with network-level conservation approaches across multiple genomes. Choosing the optimal training and scoring strategies strongly enhances the performance of TG prediction for more than half of the tested TFs. Finally, we analyse a 35th TF, namely Eyeless, and find a significant overlap between predicted TGs and candidate TGs identified by microarray expression studies. In summary we identify several ways to optimize TF-specific TG predictions, some of which can be applied to all TFs, and others that can be applied only to particular TFs. The ability to model known TF-TG relations, together with the use of multiple genomes, results in a significant step forward in solving the architecture of gene regulatory networks.
PMCID: PMC2047340  PMID: 17973026
20.  Effect of 21 Different Nitrogen Sources on Global Gene Expression in the Yeast Saccharomyces cerevisiae 
Molecular and Cellular Biology  2007;27(8):3065-3086.
We compared the transcriptomes of Saccharomyces cerevisiae cells growing under steady-state conditions on 21 unique sources of nitrogen. We found 506 genes differentially regulated by nitrogen and estimated the activation degrees of all identified nitrogen-responding transcriptional controls according to the nitrogen source. One main group of nitrogenous compounds supports fast growth and a highly active nitrogen catabolite repression (NCR) control. Catabolism of these compounds typically yields carbon derivatives directly assimilable by a cell's metabolism. Another group of nitrogen compounds supports slower growth, is associated with excretion by cells of nonmetabolizable carbon compounds such as fusel oils, and is characterized by activation of the general control of amino acid biosynthesis (GAAC). Furthermore, NCR and GAAC appear interlinked, since expression of the GCN4 gene encoding the transcription factor that mediates GAAC is subject to NCR. We also observed that several transcriptional-regulation systems are active under a wider range of nitrogen supply conditions than anticipated. Other transcriptional-regulation systems acting on genes not involved in nitrogen metabolism, e.g., the pleiotropic-drug resistance and the unfolded-protein response systems, also respond to nitrogen. We have completed the lists of target genes of several nitrogen-sensitive regulons and have used sequence comparison tools to propose functions for about 20 orphan genes. Similar studies conducted for other nutrients should provide a more complete view of alternative metabolic pathways in yeast and contribute to the attribution of functions to many other orphan genes.
PMCID: PMC1899933  PMID: 17308034
21.  In silico identification of NF-kappaB-regulated genes in pancreatic beta-cells 
BMC Bioinformatics  2007;8:55.
Pancreatic beta-cells are the target of an autoimmune attack in type 1 diabetes mellitus (T1DM). This is mediated in part by cytokines, such as interleukin (IL)-1β and interferon (IFN)-γ. These cytokines modify the expression of hundreds of genes, leading to beta-cell dysfunction and death by apoptosis. Several of these cytokine-induced genes are potentially regulated by the IL-1β-activated transcription factor (TF) nuclear factor (NF)-κB, and previous studies by our group have shown that cytokine-induced NF-κB activation is pro-apoptotic in beta-cells. To identify NF-κB-regulated gene networks in beta-cells we presently used a discriminant analysis-based approach to predict NF-κB responding genes on the basis of putative regulatory elements.
The performance of linear and quadratic discriminant analysis (LDA, QDA) in identifying NF-κB-responding genes was examined on a dataset of 240 positive and negative examples of NF-κB regulation, using stratified cross-validation with an internal leave-one-out cross-validation (LOOCV) loop for automated feature selection and noise reduction. LDA performed slightly better than QDA, achieving 61% sensitivity, 91% specificity and 87% positive predictive value, and allowing the identification of 231, 251 and 580 NF-κB putative target genes in insulin-producing INS-1E cells, primary rat beta-cells and human pancreatic islets, respectively. Predicted NF-κB targets had a significant enrichment in genes regulated by cytokines (IL-1β or IL-1β + IFN-γ) and double stranded RNA (dsRNA), as compared to genes not regulated by these NF-κB-dependent stimuli. We increased the confidence of the predictions by selecting only evolutionary stable genes, i.e. genes with homologs predicted as NF-κB targets in rat, mouse, human and chimpanzee.
The present in silico analysis allowed us to identify novel regulatory targets of NF-κB using a supervised classification method based on putative binding motifs. This provides new insights into the gene networks regulating cytokine-induced beta-cell dysfunction and death.
PMCID: PMC1810323  PMID: 17302974
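The sensitivity, specificity and positive predictive value reported in entry 21 are all derived from the four cells of a confusion matrix. The sketch below shows those definitions only. The counts are hypothetical and are not the study's data, and the function name is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard two-class performance measures from
    true/false positive and negative counts."""
    return {
        # Fraction of true positives that were recovered.
        "sensitivity": tp / (tp + fn),
        # Fraction of true negatives that were correctly rejected.
        "specificity": tn / (tn + fp),
        # Fraction of positive predictions that are correct.
        "ppv": tp / (tp + fp),
    }

# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=6, fp=1, tn=9, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a cross-validation setting such as the one the abstract describes, the counts would be accumulated over the held-out examples of each fold before computing the ratios.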
22.  Evaluation of clustering algorithms for protein-protein interaction networks 
BMC Bioinformatics  2006;7:488.
Protein interactions are crucial components of all cellular processes. Recently, high-throughput methods have been developed to obtain a global description of the interactome (the whole network of protein interactions for a given organism). In 2002, the yeast interactome was estimated to contain up to 80,000 potential interactions. This estimate is based on the integration of data sets obtained by various methods (mass spectrometry, two-hybrid methods, genetic studies). High-throughput methods are known, however, to yield a non-negligible rate of false positives, and to miss a fraction of existing interactions.
The interactome can be represented as a graph where nodes correspond with proteins and edges with pairwise interactions. In recent years clustering methods have been developed and applied in order to extract relevant modules from such graphs. These algorithms require the specification of parameters that may drastically affect the results. In this paper we present a comparative assessment of four algorithms: Markov Clustering (MCL), Restricted Neighborhood Search Clustering (RNSC), Super Paramagnetic Clustering (SPC), and Molecular Complex Detection (MCODE).
A test graph was built on the basis of 220 complexes annotated in the MIPS database. To evaluate the robustness to false positives and false negatives, we derived 41 altered graphs by randomly removing edges from or adding edges to the test graph in various proportions.
Each clustering algorithm was applied to these graphs with various parameter settings, and the clusters were compared with the annotated complexes.
We analyzed the sensitivity of the algorithms to the parameters and determined their optimal parameter values.
We also evaluated their robustness to alterations of the test graph.
We then applied the four algorithms to six graphs obtained from high-throughput experiments and compared the resulting clusters with the annotated complexes.
This analysis shows that MCL is remarkably robust to graph alterations. In the tests of robustness, RNSC is more sensitive to edge deletion but less sensitive to the use of suboptimal parameter values. The other two algorithms are clearly weaker under most conditions.
The analysis of high-throughput data supports the superiority of MCL for the extraction of complexes from interaction networks.
PMCID: PMC1637120  PMID: 17087821
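The robustness test described in entry 22 derives altered graphs by randomly removing edges from, or adding edges to, a test graph. A minimal sketch of that perturbation step follows. The function name, the convention that both proportions are expressed relative to the original number of edges, and the toy graph are assumptions of this example, not details taken from the paper.

```python
import random
from itertools import combinations

def alter_graph(nodes, edges, remove_frac, add_frac, seed=0):
    """Return a perturbed edge set: a fraction of the original edges is
    removed (simulated false negatives) and a number of random non-edges
    is added (simulated false positives)."""
    rng = random.Random(seed)
    edge_set = {frozenset(edge) for edge in edges}
    n_remove = round(remove_frac * len(edge_set))
    n_add = round(add_frac * len(edge_set))

    # Sort before sampling so that results are reproducible for a given seed.
    ordered_edges = sorted(edge_set, key=sorted)
    kept = set(rng.sample(ordered_edges, len(ordered_edges) - n_remove))

    non_edges = [
        frozenset(pair)
        for pair in combinations(sorted(nodes), 2)
        if frozenset(pair) not in edge_set
    ]
    added = set(rng.sample(non_edges, min(n_add, len(non_edges))))
    return kept | added

# Hypothetical five-protein graph with four interactions.
toy_nodes = ["A", "B", "C", "D", "E"]
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
altered = alter_graph(toy_nodes, toy_edges, remove_frac=0.5, add_frac=0.5)
print(sorted(sorted(edge) for edge in altered))
```

Running a clustering algorithm on a series of such graphs, with increasing proportions, gives one curve of cluster quality against noise level per algorithm.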
23.  Metabolic PathFinding: inferring relevant pathways in biochemical networks 
Nucleic Acids Research  2005;33(Web Server issue):W326-W330.
Our knowledge of metabolism can be represented as a network comprising several thousands of nodes (compounds and reactions). Several groups applied graph theory to analyse the topological properties of this network and to infer metabolic pathways by path finding. This is, however, not straightforward, with a major problem caused by traversing irrelevant shortcuts through highly connected nodes, which correspond to pool metabolites and co-factors (e.g. H2O, NADP and H+). In this study, we present a web server implementing two simple approaches, which circumvent this problem, thereby improving the relevance of the inferred pathways. In the simplest approach, the shortest path is computed, while filtering out the selection of highly connected compounds. In the second approach, the shortest path is computed on the weighted metabolic graph where each compound is assigned a weight equal to its connectivity in the network. This approach significantly increases the accuracy of the inferred pathways, enabling the correct inference of relatively long pathways (e.g. with as many as eight intermediate reactions). Available options include the calculation of the k-shortest paths between two specified seed nodes (either compounds or reactions). Multiple requests can be submitted in a queue. Results are returned by email, in textual as well as graphical formats.
PMCID: PMC1160198  PMID: 15980483
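The second approach in entry 23, a shortest path on a graph where each node carries a weight equal to its connectivity, can be sketched with Dijkstra's algorithm. This is a simplified illustration: it weights every node by its degree, whereas the abstract assigns weights to compounds in a compound/reaction network. The function names and the toy graph are hypothetical.

```python
import heapq
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def weighted_shortest_path(graph, source, target):
    """Dijkstra's algorithm where entering a node costs its degree,
    so that paths through highly connected nodes are penalised."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour in graph[node]:
            new_cost = cost + len(graph[neighbour])
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                prev[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

# Toy network: H is a hub (degree 6) offering a two-step shortcut from
# A to D; the route through B and C is longer but avoids the hub.
toy_edges = [("A", "H"), ("H", "D"), ("H", "X1"), ("H", "X2"),
             ("H", "X3"), ("H", "X4"), ("A", "B"), ("B", "C"), ("C", "D")]
toy_graph = build_graph(toy_edges)
print(weighted_shortest_path(toy_graph, "A", "D"))
```

On the toy network an unweighted search would take the shortcut through the hub H, while the degree-weighted search prefers the longer route through B and C, which is the behaviour the abstract describes for pool metabolites.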
24.  Transcriptional regulation of protein complexes in yeast 
Genome Biology  2004;5(5):R33.
This study shows that only a small fraction of yeast protein complexes are coregulated at the transcriptional level.
Multiprotein complexes play an essential role in many cellular processes. But our knowledge of the mechanism of their formation, regulation and lifetimes is very limited. We investigated transcriptional regulation of protein complexes in yeast using two approaches. First, known regulons, manually curated or identified by genome-wide screens, were mapped onto the components of multiprotein complexes. The complexes comprised manually curated ones and those characterized by high-throughput analyses. Second, putative regulatory sequence motifs were identified in the upstream regions of the genes involved in individual complexes and regulons were predicted on the basis of these motifs.
Only a very small fraction of the analyzed complexes (5-6%) have subsets of their components mapping onto known regulons. Likewise, regulatory motifs are detected in only about 8-15% of the complexes, and in those, about half of the components are on average part of predicted regulons. In the manually curated complexes, the so-called 'permanent' assemblies have a larger fraction of their components belonging to putative regulons than 'transient' complexes. For the noisier set of complexes identified by high-throughput screens, valuable insights are obtained into the function and regulation of individual genes.
A small fraction of the known multiprotein complexes in yeast seems to have at least a subset of their components co-regulated on the transcriptional level. Preliminary analysis of the regulatory motifs for these components suggests that the corresponding genes are likely to be co-regulated either together or in smaller subgroups, indicating that transcriptionally regulated modules might exist within complexes.
PMCID: PMC416469  PMID: 15128447
25.  Regulatory Sequence Analysis Tools 
Nucleic Acids Research  2003;31(13):3593-3596.
The web resource Regulatory Sequence Analysis Tools (RSAT) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative formats are supported for the representation of regulatory motifs (strings or position-specific scoring matrices) and several algorithms are proposed for pattern discovery. RSAT currently holds >100 fully sequenced genomes and these data are regularly updated from GenBank.
PMCID: PMC168973  PMID: 12824373