
Results 1-10 (10)

1.  Cloud-based uniform ChIP-Seq processing tools for modENCODE and ENCODE 
BMC Genomics  2013;14:494.
Funded by the National Institutes of Health (NIH), the aim of the Model Organism ENCyclopedia of DNA Elements (modENCODE) project is to provide the biological research community with a comprehensive encyclopedia of functional genomic elements for both model organisms C. elegans (worm) and D. melanogaster (fly). With a total size of just under 10 terabytes of data collected and released to the public, one of the challenges faced by researchers is to extract biologically meaningful knowledge from this large data set. While the basic quality control, pre-processing, and analysis of the data has already been performed by members of the modENCODE consortium, many researchers will wish to reinterpret the data set using modifications and enhancements of the original protocols, or combine modENCODE data with other data sets. Unfortunately this can be a time consuming and logistically challenging proposition.
In recognition of this challenge, the modENCODE DCC has released uniform computing resources for analyzing modENCODE data on Galaxy, on the public Amazon Cloud, and on the private Bionimbus Cloud for genomic research. In particular, we have released Galaxy workflows for interpreting ChIP-seq data that use the same quality control (QC) and peak-calling standards adopted by the modENCODE and ENCODE communities. For convenience, we have created Amazon and Bionimbus Cloud machine images containing Galaxy along with all the modENCODE data, software and other dependencies.
Using these resources provides a framework for running consistent and reproducible analyses on modENCODE data, ultimately allowing researchers to spend more of their time using modENCODE data and less time moving it around.
PMCID: PMC3734164  PMID: 23875683
2.  modMine: flexible access to modENCODE data 
Nucleic Acids Research  2011;40(Database issue):D1082-D1088.
In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopaedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) facilitate fine-grained analysis of the above data. The sophisticated search features are possible because of the collection of extensive experimental metadata by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit these rich modENCODE data sets now available via modMine.
PMCID: PMC3245176  PMID: 22080565
3.  The modENCODE Data Coordination Center: lessons in harvesting comprehensive experimental details 
The model organism Encyclopedia of DNA Elements (modENCODE) project is a National Human Genome Research Institute (NHGRI) initiative designed to characterize the genomes of Drosophila melanogaster and Caenorhabditis elegans. A Data Coordination Center (DCC) was created to collect, store and catalog modENCODE data. An effective DCC must gather, organize and provide all primary, interpreted and analyzed data, and ensure the community is supplied with the knowledge of the experimental conditions, protocols and verification checks used to generate each primary data set. We present here the design principles of the modENCODE DCC, and describe the ramifications of collecting thorough and deep metadata for describing experiments, including the use of a wiki for capturing protocol and reagent information, and the BIR-TAB specification for linking biological samples to experimental results. modENCODE data can be found at
Database URL:
PMCID: PMC3170170  PMID: 21856757
4.  Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project 
Gerstein, Mark B. | Lu, Zhi John | Van Nostrand, Eric L. | Cheng, Chao | Arshinoff, Bradley I. | Liu, Tao | Yip, Kevin Y. | Robilotto, Rebecca | Rechtsteiner, Andreas | Ikegami, Kohta | Alves, Pedro | Chateigner, Aurelien | Perry, Marc | Morris, Mitzi | Auerbach, Raymond K. | Feng, Xin | Leng, Jing | Vielle, Anne | Niu, Wei | Rhrissorrakrai, Kahn | Agarwal, Ashish | Alexander, Roger P. | Barber, Galt | Brdlik, Cathleen M. | Brennan, Jennifer | Brouillet, Jeremy Jean | Carr, Adrian | Cheung, Ming-Sin | Clawson, Hiram | Contrino, Sergio | Dannenberg, Luke O. | Dernburg, Abby F. | Desai, Arshad | Dick, Lindsay | Dosé, Andréa C. | Du, Jiang | Egelhofer, Thea | Ercan, Sevinc | Euskirchen, Ghia | Ewing, Brent | Feingold, Elise A. | Gassmann, Reto | Good, Peter J. | Green, Phil | Gullier, Francois | Gutwein, Michelle | Guyer, Mark S. | Habegger, Lukas | Han, Ting | Henikoff, Jorja G. | Henz, Stefan R. | Hinrichs, Angie | Holster, Heather | Hyman, Tony | Iniguez, A. Leo | Janette, Judith | Jensen, Morten | Kato, Masaomi | Kent, W. James | Kephart, Ellen | Khivansara, Vishal | Khurana, Ekta | Kim, John K. | Kolasinska-Zwierz, Paulina | Lai, Eric C. | Latorre, Isabel | Leahey, Amber | Lewis, Suzanna | Lloyd, Paul | Lochovsky, Lucas | Lowdon, Rebecca F. | Lubling, Yaniv | Lyne, Rachel | MacCoss, Michael | Mackowiak, Sebastian D. | Mangone, Marco | McKay, Sheldon | Mecenas, Desirea | Merrihew, Gennifer | Miller, David M. | Muroyama, Andrew | Murray, John I. | Ooi, Siew-Loon | Pham, Hoang | Phippen, Taryn | Preston, Elicia A. | Rajewsky, Nikolaus | Rätsch, Gunnar | Rosenbaum, Heidi | Rozowsky, Joel | Rutherford, Kim | Ruzanov, Peter | Sarov, Mihail | Sasidharan, Rajkumar | Sboner, Andrea | Scheid, Paul | Segal, Eran | Shin, Hyunjin | Shou, Chong | Slack, Frank J. | Slightam, Cindie | Smith, Richard | Spencer, William C. | Stinson, E. O. | Taing, Scott | Takasaki, Teruaki | Vafeados, Dionne | Voronina, Ksenia | Wang, Guilin | Washington, Nicole L. | Whittle, Christina M. | Wu, Beijing | Yan, Koon-Kiu | Zeller, Georg | Zha, Zheng | Zhong, Mei | Zhou, Xingliang | Ahringer, Julie | Strome, Susan | Gunsalus, Kristin C. | Micklem, Gos | Liu, X. Shirley | Reinke, Valerie | Kim, Stuart K. | Hillier, LaDeana W. | Henikoff, Steven | Piano, Fabio | Snyder, Michael | Stein, Lincoln | Lieb, Jason D. | Waterston, Robert H.
Science (New York, N.Y.)  2010;330(6012):1775-1787.
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
PMCID: PMC3142569  PMID: 21177976
5.  ELT-2 Is the Predominant Transcription Factor Controlling Differentiation and Function of the C. elegans Intestine, from Embryo to Adult 
Developmental biology  2008;327(2):551-565.
Starting with SAGE-libraries prepared from C. elegans FAC-sorted embryonic intestine cells (8E-16E cell stage), from total embryos and from purified oocytes, and taking advantage of the NextDB in situ hybridization data base, we define sets of genes highly expressed from the zygotic genome, and expressed either exclusively or preferentially in the embryonic intestine or in the intestine of newly hatched larvae; we had previously defined a similarly expressed set of genes from the adult intestine. We show that an extended TGATAA-like sequence is essentially the only candidate for a cis-acting regulatory motif common to intestine genes expressed at all stages. This sequence is a strong ELT-2 binding site and matches the sequence of GATA-like sites found to be important for the expression of every intestinal gene so far analyzed experimentally. We show that the majority of these three sets of highly expressed intestinal-specific/intestinal-enriched genes respond strongly to ectopic expression of ELT-2 within the embryo. By flow-sorting elt-2(null) larvae from elt-2(+) larvae and then preparing Solexa/Illumina-SAGE libraries, we show that the majority of these genes also respond strongly to loss-of-function of ELT-2. To test the consequences of loss of other transcription factors identified in the embryonic intestine, we develop a strain of worms that is RNAi-sensitive only in the intestine; however, we are unable (with one possible exception) to identify any other transcription factor whose intestinal loss-of-function causes a phenotype of comparable severity to the phenotype caused by loss of ELT-2. Overall, our results support a model in which ELT-2 is the predominant transcription factor in the post-specification C. elegans intestine and participates directly in the transcriptional regulation of the majority (> 80%) of intestinal genes. We present evidence that ELT-2 plays a central role in most aspects of C. elegans intestinal physiology: establishing the structure of the enterocyte, regulating enzymes and transporters involved in digestion and nutrition, responding to environmental toxins and pathogenic infections, and regulating the downstream intestinal components of the daf-2/daf-16 pathway influencing aging and longevity.
PMCID: PMC2706090  PMID: 19111532
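As an illustration of the motif search mentioned in the abstract above, the following is a minimal sketch that scans a DNA sequence for exact TGATAA matches on both strands. It is not the authors' method: the abstract describes an extended TGATAA-like sequence, whereas this sketch assumes only the exact six-base core, and the function names `reverse_complement` and `find_motif` are hypothetical.

```python
# Minimal sketch (assumption: exact match to the 6-base core TGATAA only;
# the abstract's "extended TGATAA-like" motif is not modelled here).

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_motif(seq, motif="TGATAA"):
    """List (0-based position, strand) for each exact motif hit.

    A '+' hit matches the motif as given; a '-' hit matches its
    reverse complement, i.e. the motif sits on the opposite strand.
    """
    seq = seq.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    # Toy sequence, not real promoter data.
    print(find_motif("ccTGATAAggTTATCAaa"))
```

Positions are reported on the forward strand, so a '-' hit gives the start of the reverse-complement match rather than the motif's own 5' end.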
6.  Deep SAGE analysis of the Caenorhabditis elegans transcriptome 
Nucleic Acids Research  2010;38(10):3252-3262.
We employed the Tag-seq technique to generate global transcription profiles for different strains and life stages of the nematode C. elegans. Tag-seq generates cDNA tags as does Serial Analysis of Gene Expression (SAGE), but the method yields a much larger number of tags, generating much larger data sets than SAGE. We examined differences in the performance of SAGE and Tag-seq by comparing gene expression data for 13 pairs of libraries. We identified genes for which expression was consistently changed in long-lived worms. Additional genes emerged in the deeper Tag-seq profiles, including several ‘signature’ genes found among those up-regulated in long-lived dauer larvae (cki-1, aak-2 and daf-16). Fifty to sixty percent of the genes differentially expressed in daf-2(−) versus daf-2(+) adults had fragmentary or no functional annotation, suggesting the involvement of as yet unstudied pathways in aging. We were able to distinguish between changes in gene expression associated with altered genotype or altered growth conditions. We found 62 cases of possible mRNA isoform switching in the 13 Tag-seq libraries, whereas the 13 SAGE libraries allowed detection of only 15 such occurrences. We observed strong expression of anti-sense transcripts for several mitochondrial genes, but nuclear anti-sense transcripts were neither abundant nor consistently expressed among the libraries.
PMCID: PMC2879516  PMID: 20129939
7.  Genes that may modulate longevity in C. elegans in both dauer larvae and long-lived daf-2 adults 
Experimental gerontology  2007;42(8):825-839.
We used Serial Analysis of Gene Expression (SAGE) to compare the global transcription profiles of long-lived mutant daf-2 adults and dauer larvae, aiming to identify aging-related genes based on similarity of expression patterns. Genes that are expressed similarly in both long-lived types potentially define a common life-extending program. Comparison of eight SAGE libraries yielded a set of 120 genes, the expression of which was significantly different in long-lived worms versus normal adults. The gene annotations indicate a strong link between oxidative stress and life span, further supporting the hypothesis that metabolic activity is a major determinant in longevity. The SAGE data show changes in mRNA levels for electron transport chain components, elevated expression of glyoxylate shunt enzymes and significantly reduced expression for components of the TCA cycle in longer-lived nematodes. We propose a model for enhanced longevity through a cytochrome c oxidase-mediated reduction in reactive oxygen species commonly held to be a major contributor to aging.
PMCID: PMC2755518  PMID: 17543485
aging; C. elegans; gene expression; longevity; reactive oxygen species
8.  Discovery of novel alternatively spliced C. elegans transcripts by computational analysis of SAGE data 
BMC Genomics  2007;8:447.
Alternative RNA splicing allows cells to produce multiple protein isoforms from one gene. These isoforms may have specialized functions, and may be tissue- or stage-specific. Our aim was to use computational analysis of SAGE and genomic data to predict alternatively spliced transcripts expressed in C. elegans.
We predicted novel alternatively spliced variants and confirmed five of eighteen candidates selected for experimental validation by RT-PCR tests and DNA sequencing.
We show that SAGE data can be efficiently used to discover alternative mRNA isoforms, including those with skipped exons or retained introns. Our results also imply that C. elegans may produce a larger number of alternatively spliced transcripts than initially estimated.
PMCID: PMC2216036  PMID: 18053145
9.  Akt-Mediated YB-1 Phosphorylation Activates Translation of Silent mRNA Species 
Molecular and Cellular Biology  2006;26(1):277-292.
YB-1 is a broad-specificity RNA-binding protein that is involved in regulation of mRNA transcription, splicing, translation, and stability. In both germinal and somatic cells, YB-1 and related proteins are major components of translationally inactive messenger ribonucleoprotein particles (mRNPs) and are mainly responsible for storage of mRNAs in a silent state. However, mechanisms regulating the repressor activity of YB-1 are not well understood. Here we demonstrate that association of YB-1 with the capped 5′ terminus of the mRNA is regulated via phosphorylation by the serine/threonine protein kinase Akt. In contrast to its nonphosphorylated form, phosphorylated YB-1 fails to inhibit cap-dependent but not internal ribosome entry site-dependent translation of a reporter mRNA in vitro. We also show that similar to YB-1, Akt is associated with inactive mRNPs and that activated Akt may relieve translational repression of the YB-1-bound mRNAs. Using Affymetrix microarrays, we found that many of the YB-1-associated messages encode stress- and growth-related proteins, raising the intriguing possibility that Akt-mediated YB-1 phosphorylation could, in part, increase production of proteins regulating cell proliferation, oncogenic transformation, and stress response.
PMCID: PMC1317623  PMID: 16354698
10.  An interactive tool for visualization of relationships between gene expression profiles 
BMC Bioinformatics  2006;7:193.
Application of phenetic methods to gene expression analysis has proved to be a successful approach. Visualizing the results in a 3-dimensional space may further enhance these techniques.
We designed and built TreeBuilder3D, an interactive viewer for visualizing the hierarchical relationships between expression profiles such as SAGE libraries or microarrays. The program allows loading expression data as plain text files and visualizing the relative differences of the analyzed datasets in 3-dimensional space using various distance metrics.
TreeBuilder3D provides a simple interface and has a small size. Written in Java, TreeBuilder3D is a platform-independent, open source application, which may be useful in analysis of large-scale gene expression data.
PMCID: PMC1456991  PMID: 16600045
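The idea of comparing expression profiles under different distance metrics, described in the abstract above, can be sketched as follows. This is an illustrative Python example, not TreeBuilder3D's code (which the abstract says is written in Java); the library names, the toy counts, and the functions `euclidean`, `manhattan` and `distance_matrix` are all assumptions made for the illustration.

```python
# Minimal sketch (assumption: each profile is an equal-length list of
# expression values; all names and data below are hypothetical).
import math


def euclidean(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute per-gene differences between two profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_matrix(profiles, metric=euclidean):
    """Map every (name, name) pair to the distance between the profiles."""
    names = sorted(profiles)
    return {(p, q): metric(profiles[p], profiles[q])
            for p in names for q in names}


if __name__ == "__main__":
    # Toy tag counts for two hypothetical libraries.
    toy = {"libA": [0, 3], "libB": [4, 0]}
    print(distance_matrix(toy))
    print(distance_matrix(toy, metric=manhattan))
```

A matrix of this kind is the usual input to hierarchical clustering or to a low-dimensional embedding for display; swapping the `metric` argument changes which differences between libraries are emphasized.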
