We present a discriminative learning method for pattern discovery of binding sites in nucleic acid sequences based on hidden Markov models. Sets of positive and negative example sequences are mined for sequence motifs whose occurrence frequency varies between the sets. The method offers several objective functions, but we concentrate on mutual information of condition and motif occurrence. We perform a systematic comparison of our method and numerous published motif-finding tools. Our method achieves the highest motif discovery performance, while being faster than most published methods. We present case studies of data from various technologies, including ChIP-Seq, RIP-Chip and PAR-CLIP, of embryonic stem cell transcription factors and of RNA-binding proteins, demonstrating practicality and utility of the method. For the alternative splicing factor RBM10, our analysis finds motifs known to be splicing-relevant.
The motif discovery method is implemented in the free software package Discrover. It is applicable to genome- and transcriptome-scale data, makes use of available repeat experiments, and can handle more complex data configurations in addition to binary contrasts.
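As a rough illustration of the objective mentioned above (mutual information of condition and motif occurrence), the following sketch computes the mutual information, in bits, from a 2x2 table of sequence counts. It is not the Discrover implementation; the function name and its arguments are hypothetical, and it assumes each sequence is simply scored as containing the motif or not.

```python
import math

def mutual_information(pos_with, pos_total, neg_with, neg_total):
    """Mutual information (bits) between condition and motif presence.

    Hypothetical helper: counts are numbers of sequences, with motif
    occurrence treated as a binary per-sequence event (an assumption).
    """
    # Rows: condition (positive, negative); columns: motif present, absent.
    counts = [
        [pos_with, pos_total - pos_with],
        [neg_with, neg_total - neg_with],
    ]
    n = pos_total + neg_total
    row = [sum(r) for r in counts]
    col = [counts[0][j] + counts[1][j] for j in range(2)]
    mi = 0.0
    for i in range(2):
        for j in range(2):
            c = counts[i][j]
            if c > 0:  # empty cells contribute nothing to the sum
                mi += (c / n) * math.log2(c * n / (row[i] * col[j]))
    return mi

if __name__ == "__main__":
    # A motif equally frequent in both sets carries no information.
    print(mutual_information(50, 100, 50, 100))
    # A motif found only in the positive set is maximally informative.
    print(mutual_information(100, 100, 0, 100))
```

A motif whose occurrence frequency differs strongly between the two sets scores higher, which is the sense in which such an objective rewards discriminative motifs.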
Many common diseases, such as asthma, diabetes or obesity, involve altered interactions between thousands of genes. High-throughput techniques (omics) allow identification of such genes and their products, but functional understanding is a formidable challenge. Network-based analyses of omics data have identified modules of disease-associated genes that have been used to obtain both a systems level and a molecular understanding of disease mechanisms. For example, in allergy a module was used to find a novel candidate gene that was validated by functional and clinical studies. Such analyses play important roles in systems medicine. This is an emerging discipline that aims to gain a translational understanding of the complex mechanisms underlying common diseases. In this review, we will explain and provide examples of how network-based analyses of omics data, in combination with functional and clinical studies, are aiding our understanding of disease, as well as helping to prioritize diagnostic markers or therapeutic candidate genes. Such analyses involve significant problems and limitations, which will be discussed. We also highlight the steps needed for clinical implementation.
Mutations in the gene encoding the RNA-binding protein RBM20 have been implicated in dilated cardiomyopathy (DCM), a major cause of chronic heart failure, presumably through altering cardiac RNA splicing. Here, we combined transcriptome-wide crosslinking immunoprecipitation (CLIP-seq), RNA-seq, and quantitative proteomics in cell culture and rat and human hearts to examine how RBM20 regulates alternative splicing in the heart. Our analyses revealed the presence of a distinct RBM20 RNA-recognition element that is predominantly found within intronic binding sites and linked to repression of exon splicing with RBM20 binding near 3′ and 5′ splice sites. Proteomic analysis determined that RBM20 interacts with both U1 and U2 small nuclear ribonucleic particles (snRNPs) and suggested that RBM20-dependent splicing repression occurs through spliceosome stalling at complex A. Direct RBM20 targets included several genes previously shown to be involved in DCM as well as genes not typically associated with this disease. In failing human hearts, reduced expression of RBM20 affected alternative splicing of several direct targets, indicating that differences in RBM20 expression may affect cardiac function. Together, these findings identify RBM20-regulated targets and provide insight into the pathogenesis of human heart failure.
Post-transcriptional regulatory mechanisms are of fundamental importance to form robust genetic networks, but their roles in stem cell pluripotency remain poorly understood. Here, we use freshwater planarians as a model system to investigate this and uncover a role for CCR4-NOT mediated deadenylation of mRNAs in stem cell differentiation. Planarian adult stem cells, the so-called neoblasts, drive the almost unlimited regenerative capabilities of planarians and allow their ongoing homeostatic tissue turnover. While many genes have been demonstrated to be required for these processes, currently almost no mechanistic insight is available into their regulation. We show that knockdown of planarian Not1, the CCR4-NOT deadenylating complex scaffolding subunit, abrogates regeneration and normal homeostasis. This abrogation is primarily due to severe impairment of their differentiation potential. We describe a stem cell specific increase in the mRNA levels of key neoblast genes after Smed-not1 knock down, consistent with a role of the CCR4-NOT complex in degradation of neoblast mRNAs upon the onset of differentiation. We also observe a stem cell specific increase in the frequency of longer poly(A) tails in these same mRNAs, showing that stem cells after Smed-not1 knock down fail to differentiate as they accumulate populations of transcripts with longer poly(A) tails. As other transcripts are unaffected our data hint at a targeted regulation of these key stem cell mRNAs by post-transcriptional regulators such as RNA-binding proteins or microRNAs. Together, our results show that the CCR4-NOT complex is crucial for stem cell differentiation and controls stem cell-specific degradation of mRNAs, thus providing clear mechanistic insight into this aspect of neoblast biology.
Although transcriptional regulation in stem cells is a very active subject, much less is known about how post-transcriptional mechanisms of gene expression affect stem cells. Here, we use freshwater planarians in order to address this question. Planarians have a striking regenerative capacity driven by a population of pluripotent stem cells, the neoblasts. Control of both proliferation and differentiation is thought to rely heavily on post-transcriptional mechanisms, but their precise role is unknown. Poly(A) tail length regulation is an important mechanism of post-transcriptional control of gene expression as changes can be very rapid, and longer poly(A) tails are linked to increased mRNA stability and translational activity. We investigated the role of the CCR4-NOT complex, the major deadenylating complex in eukaryotes, by knocking down its main scaffolding subunit called Not1. Neoblasts in knock down animals are unable to differentiate and accumulate mRNAs with longer poly(A) tails. Our results show that the CCR4-NOT complex is needed for the targeted degradation of specific mRNAs expressed in stem cells, and the failure of this process likely prevents neoblasts from differentiating. These results reveal a new functional aspect of the CCR4-NOT complex and offer a mechanistic insight into the regulation of planarian stem cells.
The conserved human LIN28 RNA-binding proteins function in development, maintenance of pluripotency and oncogenesis. We used PAR-CLIP and a newly developed variant of this method, iDo-PAR-CLIP, to identify LIN28B targets as well as sites bound by the individual RNA-binding domains of LIN28B in the human transcriptome at nucleotide resolution. The position of target binding sites reflected the known structural relative orientation of individual LIN28B-binding domains, validating iDo-PAR-CLIP. Our data suggest that LIN28B directly interacts with most expressed mRNAs and members of the let-7 microRNA family. The Lin28-binding motif detected in pre-let-7 was enriched in mRNA sequences bound by LIN28B. Upon LIN28B knockdown, cell proliferation and the cell cycle were strongly impaired. Quantitative shotgun proteomics of LIN28B depleted cells revealed significant reduction of protein synthesis from its RNA targets. Computational analyses provided evidence that the strength of protein synthesis reduction correlated with the location of LIN28B binding sites within target transcripts.
post-transcriptional regulation; RNA-binding protein; CLIP; stem cell
microRNAs (miRNAs) are small noncoding RNAs that mediate post-transcriptional gene regulation and have emerged as essential regulators of many developmental events. The transcriptional network during early embryogenesis of the purple sea urchin, Strongylocentrotus purpuratus, is well described and would serve as an excellent model to test functional contributions of miRNAs in embryogenesis. We examined the loss of function phenotypes of the major components of the miRNA biogenesis pathway. Inhibition of de novo synthesis of Drosha and Dicer in the embryo led to consistent developmental defects, a failure to gastrulate, and embryonic lethality, including changes in the steady state levels of transcription factors and signaling molecules involved in germ layer specification. We annotated and profiled small RNA expression from the ovary and several early embryonic stages by deep sequencing followed by computational analysis. All miRNAs have dynamic accumulation profiles through early development as do a large population of putative piRNAs (piwi-interacting RNAs). Defects in morphogenesis caused by loss of Drosha can be rescued with four miRNAs which permits a strong miRNA functional assay. Taken together our results indicate that post-transcriptional gene regulation directed by miRNAs is functionally important for early embryogenesis and is an integral part of the early embryonic gene regulatory network in S. purpuratus.
sea urchin; microRNA; embryogenesis
In animals, RNA binding proteins (RBPs) and microRNAs (miRNAs) post-transcriptionally regulate the expression of virtually all genes by binding to RNA. Recent advances in experimental and computational methods facilitate transcriptome-wide mapping of these interactions. It is thought that the combinatorial action of RBPs and miRNAs on target mRNAs forms a post-transcriptional regulatory code. We provide a database that supports the quest for deciphering this regulatory code. Within doRiNA, we are systematically curating, storing and integrating binding site data for RBPs and miRNAs. Users are free to take a target (mRNA) or regulator (RBP and/or miRNA) centric view on the data. We have implemented a database framework with short query response times for complex searches (e.g. asking for all targets of a particular combination of regulators). All search results can be browsed, inspected and analyzed in conjunction with a huge selection of other genome-wide data, because our database is directly linked to a local copy of the UCSC genome browser. At the time of writing, doRiNA encompasses RBP data for the human, mouse and worm genomes. For computational miRNA target site predictions, we provide an update of PicTar predictions.
microRNAs (miRNAs) are a large class of small non-coding RNAs which post-transcriptionally regulate the expression of a large fraction of all animal genes and are important in a wide range of biological processes. Recent advances in high-throughput sequencing allow miRNA detection at unprecedented sensitivity, but the computational task of accurately identifying the miRNAs in the background of sequenced RNAs remains challenging. For this purpose, we have designed miRDeep2, a substantially improved algorithm which identifies canonical and non-canonical miRNAs such as those derived from transposable elements and informs on high-confidence candidates that are detected in multiple independent samples. Analyzing data from seven animal species representing the major animal clades, miRDeep2 identified miRNAs with an accuracy of 98.6–99.9% and reported hundreds of novel miRNAs. To test the accuracy of miRDeep2, we knocked down the miRNA biogenesis pathway in a human cell line and sequenced small RNAs before and after. The vast majority of the >100 novel miRNAs expressed in this cell line were indeed specifically downregulated, validating most miRDeep2 predictions. Last, a new miRNA expression profiling routine, low time and memory usage and user-friendly interactive graphic output can make miRDeep2 useful to a wide range of researchers.
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
Three-prime untranslated regions (3′UTRs) of metazoan messenger RNAs (mRNAs) contain numerous regulatory elements, yet remain largely uncharacterized. Using polyA capture, 3′ rapid amplification of complementary DNA (cDNA) ends, full-length cDNAs, and RNA-seq, we defined ∼26,000 distinct 3′UTRs in Caenorhabditis elegans for ∼85% of the 18,328 experimentally supported protein-coding genes and revised ∼40% of gene models. Alternative 3′UTR isoforms are frequent, often differentially expressed during development. Average 3′UTR length decreases with animal age. Surprisingly, no polyadenylation signal (PAS) was detected for 13% of polyadenylation sites, predominantly among shorter alternative isoforms. Trans-spliced (versus non–trans-spliced) mRNAs possess longer 3′UTRs and frequently contain no PAS or variant PAS. We identified conserved 3′UTR motifs, isoform-specific predicted microRNA target sites, and polyadenylation of most histone genes. Our data reveal a rich complexity of 3′UTRs, both genome-wide and throughout development.
Animal miRNAs are a large class of small regulatory RNAs that are known to directly and negatively regulate the expression of a large fraction of all protein encoding genes. The identification and characterization of miRNA targets is thus a fundamental problem in biology. miRNAs regulate target genes by binding to 3′ untranslated regions (3′UTRs) of target mRNAs, and multiple binding sites for the same miRNA in 3′UTRs can strongly enhance the degree of regulation. Recent experiments have demonstrated that a large fraction of miRNA binding sites reside in coding sequences. Overall, miRNA binding sites in coding regions were shown to mediate smaller regulation than 3′UTR binding. However, possible interactions between target sites in coding sequences and 3′UTRs have not been studied. Using transcriptomics and proteomics data of ten miRNA mis-expression experiments as well as transcriptome-wide experimentally identified miRNA target sites, we found that mRNA and protein expression of genes containing target sites both in coding regions and 3′UTRs were in general mildly but significantly more regulated than those containing target sites in 3′UTRs only. These effects were stronger for conserved target sites of length 7–8 nt in coding regions compared to non-conserved sites. Combined with our other finding that miRNA target sites in coding regions are under negative selection, our results shed light on the functional importance of miRNA targeting in coding regions.
Identifying the nucleotides that cause gene expression variation is a critical step in dissecting the genetic basis of complex traits. Here, we focus on polymorphisms that are predicted to alter transcription factor binding sites (TFBSs) in the yeast, Saccharomyces cerevisiae. We assembled a confident set of transcription factor motifs using recent protein binding microarray and ChIP-chip data and used our collection of motifs to predict a comprehensive set of TFBSs across the S. cerevisiae genome. We used a population genomics analysis to show that our predictions are accurate and significantly improve on our previous annotation. Although predicting gene expression from sequence is thought to be difficult in general, we identified a subset of genes for which changes in predicted TFBSs correlate well with expression divergence between yeast strains. Our analysis thus demonstrates both the accuracy of our new TFBS predictions and the feasibility of using simple models of gene regulation to causally link differences in gene expression to variation at individual nucleotides.
Saccharomyces cerevisiae; transcription factors; transcription factor binding sites; population genetics; gene expression; SNP; eQTL
While more than 700 microRNAs (miRNAs) are known in human, a comparably low number has been identified in swine. Because of the close phylogenetic distance to humans, pigs serve as a suitable model for studying e.g. intestinal development or disease. Recent studies indicate that miRNAs are key regulators of intestinal development and their aberrant expression leads to intestinal malignancy.
Here, we present the identification of hundreds of apparently novel miRNAs in the porcine intestine. MiRNAs were first identified by means of deep sequencing followed by miRNA precursor prediction using the miRDeep algorithm as well as searching for conserved miRNAs. Second, the porcine miRNAome along the entire intestine (duodenum, proximal and distal jejunum, ileum, ascending and transverse colon) was unraveled using customized miRNA microarrays based on the identified sequences as well as known porcine and human ones. In total, the expression of 332 intestinal miRNAs was discovered, of which 201 represented assumed novel porcine miRNAs. The identified hairpin forming precursors were in part organized in genomic clusters, and most of the precursors were located on chromosomes 3 and 1, respectively. Hierarchical clustering of the expression data revealed subsets of miRNAs that are specific to distinct parts of the intestine pointing to their impact on cellular signaling networks.
In this study, we have applied a straightforward approach to decipher the porcine intestinal miRNAome for the first time in mammals using a piglet model. The high number of identified novel miRNAs in the porcine intestine points out their crucial role in intestinal function as shown by pathway analysis. On the other hand, the reported miRNAs may share orthologs in other mammals such as human that are still to be discovered.
Caenorhabditis elegans is one of the most prominent model systems for embryogenesis. However, it has been impractical to collect large amounts of precisely staged embryos. Thus, early C. elegans embryogenesis has not been amenable to most modern high-throughput genomics or biochemistry assays. To overcome this problem, we devised a method to collect large amounts of staged C. elegans embryos by Fluorescent Activated Cell Sorting (termed eFACS). eFACS can in principle be applied to all embryonic stages. As a proof of principle we show that a single eFACS run routinely yields tens of thousands of almost perfectly staged one-cell embryos. Since the earliest embryonic events are driven by post-transcriptional regulation, we combined eFACS with next-generation sequencing to profile the embryonic expression of small, non-coding RNAs. We discovered complex and orchestrated changes in the expression between and within almost all classes of small RNAs, including miRNAs and 26G-RNAs, during embryogenesis.
Kertesz et al. (Nature Genetics 2008) described PITA, a miRNA target prediction algorithm based on hybridization energy and site accessibility. In this note, we used a population genomics approach to reexamine their data and found that the PITA algorithm had lower specificity than methods based on evolutionary conservation at comparable levels of sensitivity.
We also showed that deeply conserved miRNAs tend to have stronger hybridization energies to their targets than do other miRNAs. Although PITA had higher specificity in predicting targets than a naïve seed-match method, this signal was primarily due to the use of a single cutoff score for all miRNAs and to the observed correlation between conservation and hybridization energy. Overall, our results clarify the accuracy of different miRNA target prediction algorithms in Drosophila and the role of site accessibility in miRNA target prediction.
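For readers unfamiliar with the naïve seed-match baseline mentioned above, a minimal sketch follows. It assumes the common convention that the seed is nucleotides 2 to 8 of the miRNA and that a site is a perfect match to the reverse complement of that seed; the function name, parameters and example sequences are illustrative only and do not reproduce the procedure used in the note or in PITA.

```python
def seed_sites(mirna, utr, start=1, end=8):
    """Return 0-based start positions of perfect seed matches in `utr`.

    Hypothetical helper. With the defaults, the seed is miRNA
    nucleotides 2-8 (0-based slice 1:8), which is an assumption.
    """
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    # Normalise DNA-style input (T) to RNA (U).
    seed = mirna.upper().replace("T", "U")[start:end]
    # The target site is the reverse complement of the seed.
    site = "".join(complement[base] for base in reversed(seed))
    utr = utr.upper().replace("T", "U")
    return [
        i
        for i in range(len(utr) - len(site) + 1)
        if utr[i:i + len(site)] == site
    ]

if __name__ == "__main__":
    # Illustrative let-7-like miRNA and a toy 3'UTR fragment.
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"
    print(seed_sites(mirna, "AAACUACCUCAAA"))
```

Such a method ignores hybridization energy, site accessibility and conservation, which is why it serves only as a baseline in comparisons of this kind.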
The first report of systematic miRNA profiling in cells of the hematopoietic system suggests that, in addition to regulating commitment to particular cellular lineages, miRNAs might have a general role in cell differentiation and cell identity.
MicroRNAs (miRNAs) are a class of recently discovered noncoding RNA genes that post-transcriptionally regulate gene expression. It is becoming clear that miRNAs play an important role in the regulation of gene expression during development. However, in mammals, expression data are principally based on whole tissue analysis and are still very incomplete.
We used oligonucleotide arrays to analyze miRNA expression in the murine hematopoietic system. Complementary oligonucleotides capable of hybridizing to 181 miRNAs were immobilized on a membrane and probed with radiolabeled RNA derived from low molecular weight fractions of total RNA from several different hematopoietic and neuronal cells. This method allowed us to analyze cell type-specific patterns of miRNA expression and to identify miRNAs that might be important for cell lineage specification and/or cell effector functions.
This is the first report of systematic miRNA gene profiling in cells of the hematopoietic system. As expected, miRNA expression patterns were very different between hematopoietic and non-hematopoietic cells, with further subtle differences observed within the hematopoietic group. Interestingly, the most pronounced similarities were observed among fully differentiated effector cells (Th1 and Th2 lymphocytes and mast cells) and precursors at comparable stages of differentiation (double negative thymocytes and pro-B cells), suggesting that in addition to regulating the process of commitment to particular cellular lineages, miRNAs might have an important general role in the mechanism of cell differentiation and maintenance of cell identity.
microRNAs are small noncoding genes that regulate the protein production of genes by binding to partially complementary sites in the mRNAs of targeted genes. Here, using our algorithm PicTar, we exploit cross-species comparisons to predict, on average, 54 targeted genes per microRNA above noise in Drosophila melanogaster. Analysis of the functional annotation of target genes furthermore suggests specific biological functions for many microRNAs. We also predict combinatorial targets for clustered microRNAs and find that some clustered microRNAs are likely to coordinately regulate target genes. Furthermore, we compare microRNA regulation between insects and vertebrates. We find that the widespread extent of gene regulation by microRNAs is comparable between flies and mammals but that certain microRNAs may function in clade-specific modes of gene regulation. One of these microRNAs (miR-210) is predicted to contribute to the regulation of fly oogenesis. We also list specific regulatory relationships that appear to be conserved between flies and mammals. Our findings provide the most extensive microRNA target predictions in Drosophila to date, suggest specific functional roles for most microRNAs, indicate the existence of coordinate gene regulation executed by clustered microRNAs, and shed light on the evolution of microRNA function across large evolutionary distances. All predictions are freely accessible at our searchable Web site http://pictar.bio.nyu.edu.
MicroRNA genes are a recently discovered large class of small noncoding genes. These genes have been shown to regulate the expression of target genes by binding to partially complementary sites in the mRNAs of the targets. To understand microRNA function it is thus important to identify their targets. Here, the authors use their bioinformatic method, PicTar, and cross-species comparisons of several newly sequenced fly species to predict, genome wide, targets of microRNAs in Drosophila. They find that known fly microRNAs control at least 15% of all genes in D. melanogaster. They also show that genomic clusters of microRNAs are likely to coordinately regulate target genes. Analysis of the functional annotation of target genes furthermore suggests specific biological functions for many microRNAs. All predictions are freely accessible at http://pictar.bio.nyu.edu. Finally, Grün et al. compare the function of microRNAs across flies and mammals. They find that (a) the overall extent of microRNA gene regulation is comparable between both clades, (b) the number of targets for a conserved microRNA in flies correlates with the number of targets in mammals, (c) some conserved microRNAs may function in clade-specific modes of gene regulation, and (d) some specific microRNA–target regulatory relationships may be conserved between both clades.
The segmentation gene network of Drosophila consists of maternal and zygotic factors that generate, by transcriptional (cross-) regulation, expression patterns of increasing complexity along the anterior-posterior axis of the embryo. Using known binding site information for maternal and zygotic gap transcription factors, the computer algorithm Ahab recovers known segmentation control elements (modules) with excellent success and predicts many novel modules within the network and genome-wide. We show that novel module predictions are highly enriched in the network and typically clustered proximal to the promoter, not only upstream, but also in intronic space and downstream. When placed upstream of a reporter gene, they consistently drive patterned blastoderm expression, in most cases faithfully producing one or more pattern elements of the endogenous gene. Moreover, we demonstrate for the entire set of known and newly validated modules that Ahab's prediction of binding sites correlates well with the expression patterns produced by the modules, revealing basic rules governing their composition. Specifically, we show that maternal factors consistently act as activators and that gap factors act as repressors, except for the bimodal factor Hunchback. Our data suggest a simple context-dependent rule for its switch from repressive to activating function. Overall, the composition of modules appears well fitted to the spatiotemporal distribution of their positive and negative input factors. Finally, by comparing Ahab predictions with different categories of transcription factor input, we confirm the global regulatory structure of the segmentation gene network, but find odd skipped behaving like a primary pair-rule gene. The study expands our knowledge of the segmentation gene network by increasing the number of experimentally tested modules by 50%. 
For the first time, the entire set of validated modules is analyzed for binding site composition under a uniform set of criteria, permitting the definition of basic composition rules. The study demonstrates that computational methods are a powerful complement to experimental approaches in the analysis of transcription networks.
Starting with known transcription factor binding site information, these researchers use the computer algorithm Ahab to recover known control elements and find novel modules within the genome.
We present a new computational method to identify microRNA target sites that incorporates both kinetic and thermodynamic components of target recognition.
Recent experiments have shown that the genomes of organisms such as worm, fly, human and mouse encode hundreds of microRNA genes. Many of these microRNAs are thought to regulate the translational expression of other genes by binding to partially complementary sites in messenger RNAs. Phenotypic and expression analyses suggest an important role of microRNAs during development. Therefore, it is of fundamental importance to identify microRNA targets. However, no experimental or computational high-throughput method for target site identification in animals has been published yet. Our main result is a new computational method which is designed to identify microRNA target sites. This method recovers with high specificity known microRNA target sites that were previously defined experimentally. Based on these results, we present a simple model for the mechanism of microRNA target site recognition. Our model incorporates both kinetic and thermodynamic components of target recognition. When we applied our method to a set of 74 Drosophila melanogaster microRNAs, searching 3' UTR sequences of a predefined set of fly mRNAs for target sites which were evolutionarily conserved between Drosophila melanogaster and Drosophila pseudoobscura, we found that a number of key developmental body patterning genes such as hairy and fushi-tarazu are likely to be translationally regulated by microRNAs.
One of the important goals in the post-genomic era is to determine the regulatory elements within the non-coding DNA of a given organism's genome. The identification of functional cis-regulatory modules has proven difficult since the component factor binding sites are small and the rules governing their arrangement are poorly understood. However, the genomes of suitably diverged species help to predict regulatory elements based on the generally accepted assumption that conserved blocks of genomic sequence are likely to be functional. To judge the efficacy of strategies that prefilter by sequence conservation it is important to know to what extent the converse assumption holds, namely that functional elements common to both species will fall within these conserved blocks. The recently completed sequence of a second Drosophila species provides an opportunity to test this assumption for one of the experimentally best studied regulatory networks in multicellular organisms, the body patterning of the fly embryo.
We find that 50%–70% of known binding sites reside in conserved sequence blocks, but these percentages are not greatly enriched over what is expected by chance. Finally, a computational genome-wide search in both species for regulatory modules based on clusters of binding sites suggests that genes central to the regulatory network are consistently recovered.
Our results indicate that binding sites remain clustered for these "core modules" while not necessarily residing in conserved blocks. This is an important clue as to how regulatory information is encoded in the genome and how modules evolve.
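The comparison above, between the share of binding sites lying in conserved blocks and the share expected by chance, can be sketched as follows. This is a simplified illustration, not the analysis performed in the study: it assumes half-open, non-overlapping block intervals, counts a site only if it lies entirely within one block, and approximates the chance expectation by the fraction of the region covered by blocks. All names and the toy coordinates are hypothetical.

```python
def fraction_in_blocks(sites, blocks):
    """Fraction of sites lying entirely inside some conserved block.

    Sites and blocks are (start, end) pairs, half-open (an assumption).
    """
    inside = sum(
        any(b_start <= s_start and s_end <= b_end for b_start, b_end in blocks)
        for s_start, s_end in sites
    )
    return inside / len(sites)

def block_coverage(blocks, region_length):
    """Fraction of the region covered by blocks (assumed non-overlapping).

    Used here as a crude stand-in for the chance expectation.
    """
    return sum(b_end - b_start for b_start, b_end in blocks) / region_length

if __name__ == "__main__":
    # Toy coordinates, purely illustrative.
    blocks = [(0, 100), (300, 400)]
    sites = [(10, 20), (95, 105), (310, 320), (500, 510)]
    observed = fraction_in_blocks(sites, blocks)
    expected = block_coverage(blocks, 1000)
    print(observed, expected, observed / expected)
```

A ratio close to 1 would correspond to the situation reported above, where the observed percentage is not greatly enriched over chance.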
Regulation of gene transcription is crucial for the function and development of all organisms. While gene prediction programs that identify protein coding sequence are used with remarkable success in the annotation of genomes, the development of computational methods to analyze noncoding regions and to delineate transcriptional control elements is still in its infancy.
Here we present novel algorithms to detect cis-regulatory modules through genome wide scans for clusters of transcription factor binding sites using three levels of prior information. When binding sites for the factors are known, our statistical segmentation algorithm, Ahab, yields about 150 putative gap gene regulated modules, with no adjustable parameters other than a window size. If one or more related modules are known, but no binding sites, repeated motifs can be found by a customized Gibbs sampler and input to Ahab, to predict genes with similar regulation. Finally using only the genome, we developed a third algorithm, Argos, that counts and scores clusters of overrepresented motifs in a window of sequence. Argos recovers many of the known modules, upstream of the segmentation genes, with no training data.
We have demonstrated, in the case of body patterning in the Drosophila embryo, that our algorithms allow the genome-wide identification of regulatory modules. We believe that Ahab overcomes many problems of recent approaches and we estimated the false positive rate to be about 50%. Argos is the first successful attempt to predict regulatory modules using only the genome without training data. Complete results and module predictions across the Drosophila genome are available at http://uqbar.rockefeller.edu/~siggia/.
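The idea of scanning for clusters of binding sites within a window of sequence can be illustrated with a simple sliding-window count. The sketch below is neither Ahab (a statistical segmentation algorithm) nor Argos; it only counts predicted site positions per window, and the function name, parameters and positions are hypothetical.

```python
def dense_windows(positions, window, min_sites):
    """Report windows that start at a site and contain >= min_sites sites.

    Returns a list of (window_start, site_count) pairs. A window covers
    positions in [start, start + window). Hypothetical helper.
    """
    pos = sorted(positions)
    hits = []
    right = 0  # index of the first site beyond the current window
    for left, start in enumerate(pos):
        # Advance the right edge while sites still fall inside the window.
        while right < len(pos) and pos[right] < start + window:
            right += 1
        count = right - left
        if count >= min_sites:
            hits.append((start, count))
    return hits

if __name__ == "__main__":
    # Toy site positions along a sequence, purely illustrative.
    sites = [10, 50, 60, 400, 900, 950, 980, 990]
    print(dense_windows(sites, window=100, min_sites=3))
```

Because both window edges only move forward, the scan is linear in the number of sites after sorting; a real module finder would additionally score each window against a background model rather than apply a fixed count threshold.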