1.  TDP-1, the Caenorhabditis elegans ortholog of TDP-43, limits the accumulation of double-stranded RNA 
The EMBO Journal  2014;33(24):2947-2966.
Caenorhabditis elegans mutants deleted for TDP-1, an ortholog of the neurodegeneration-associated RNA-binding protein TDP-43, display only mild phenotypes. Nevertheless, transcriptome sequencing revealed that many RNAs were altered in accumulation and/or processing in the mutant. Analysis of these transcriptional abnormalities demonstrates that a primary function of TDP-1 is to limit formation or stability of double-stranded RNA. Specifically, we found that deletion of tdp-1: (1) preferentially alters the accumulation of RNAs with inherent double-stranded structure (dsRNA); (2) increases the accumulation of nuclear dsRNA foci; (3) enhances the frequency of adenosine-to-inosine RNA editing; and (4) dramatically increases the amount of transcripts immunoprecipitable with a dsRNA-specific antibody, including intronic sequences, RNAs with antisense overlap to another transcript, and transposons. We also show that TDP-43 knockdown in human cells results in accumulation of dsRNA, indicating that suppression of dsRNA is a conserved function of TDP-43 in mammals. Altered accumulation of structured RNA may account for some of the previously described molecular phenotypes (e.g., altered splicing) resulting from reduction of TDP-43 function.
doi:10.15252/embj.201488740
PMCID: PMC4282642  PMID: 25391662
neurodegeneration; RNA editing; RNA structure; splicing
3.  Identification of genes expressed by immune cells of the colon that are regulated by colorectal cancer-associated variants 
A locus on human chromosome 11q23 tagged by marker rs3802842 was associated with colorectal cancer (CRC) in a genome-wide association study; this finding has been replicated in case–control studies worldwide. In order to identify biologic factors at this locus that are related to the etiopathology of CRC, we used microarray-based target selection methods, coupled to next-generation sequencing, to study 103 kb at the 11q23 locus. We genotyped 369 putative variants from 1,030 patients with CRC (cases) and 1,061 individuals without CRC (controls) from the Ontario Familial Colorectal Cancer Registry. Two previously uncharacterized genes, COLCA1 and COLCA2, were found to be co-regulated genes that are transcribed from opposite strands. Expression levels of COLCA1 and COLCA2 transcripts correlate with rs3802842 genotypes. In colon tissues, COLCA1 co-localizes with crystalloid granules of eosinophils and granular organelles of mast cells, neutrophils, macrophages, dendritic cells and differentiated myeloid-derived cell lines. COLCA2 is present in the cytoplasm of normal epithelial, immune and other cell lineages, as well as tumor cells. Tissue microarray analysis demonstrates the association of rs3802842 with lymphocyte density in the lamina propria (p = 0.014) and levels of COLCA1 in the lamina propria (p = 0.00016) and COLCA2 (tumor cells, p = 0.0041 and lamina propria, p = 6 × 10⁻⁵). In conclusion, genetic, expression and immunohistochemical data implicate COLCA1 and COLCA2 in the pathogenesis of colon cancer. Histologic analyses indicate the involvement of immune pathways.
doi:10.1002/ijc.28557
PMCID: PMC3949167  PMID: 24154973
genome-wide association study; genetic risk factors; colon cancer; tumor microenvironment
4.  Redefining Genomic Privacy: Trust and Empowerment 
PLoS Biology  2014;12(11):e1001983.
Current models of protecting human subjects create a zero-sum game of privacy versus data utility. We propose shifting the paradigm to techniques that facilitate trust between researchers and participants.
Fulfilling the promise of the genetic revolution requires the analysis of large datasets containing information from thousands to millions of participants. However, sharing human genomic data requires protecting subjects from potential harm. Current models rely on de-identification techniques in which privacy versus data utility becomes a zero-sum game. Instead, we propose the use of trust-enabling techniques to create a solution in which researchers and participants both win. To do so we introduce three principles that facilitate trust in genetic research and outline one possible framework built upon those principles. Our hope is that such trust-centric frameworks provide a sustainable solution that reconciles genetic privacy with data sharing and facilitates genetic research.
doi:10.1371/journal.pbio.1001983
PMCID: PMC4219652  PMID: 25369215
5.  Using GBrowse 2.0 to visualize and share next-generation sequence data 
Briefings in Bioinformatics  2013;14(2):162-171.
GBrowse is a mature web-based genome browser that is suitable for deployment on both public and private web sites. It supports most genome browser features, including qualitative and quantitative (wiggle) tracks, track uploading, track sharing, interactive track configuration, semantic zooming and limited smooth track panning. As of version 2.0, GBrowse supports next-generation sequencing (NGS) data by providing for the direct display of SAM and BAM sequence alignment files. SAM/BAM tracks provide semantic zooming and support both local and remote data sources. This article provides step-by-step instructions for configuring GBrowse to display NGS data.
doi:10.1093/bib/bbt001
PMCID: PMC3603216  PMID: 23376193
bioinformatics; genomics; DNA sequencing; genome browser; data visualization; data sharing
6.  Computational approaches to identify functional genetic variants in cancer genomes 
Nature methods  2013;10(8):723-729.
The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor, but only a minority drive tumor progression. We present the result of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype.
doi:10.1038/nmeth.2562
PMCID: PMC3919555  PMID: 23900255
7.  modMine: flexible access to modENCODE data 
Nucleic Acids Research  2011;40(Database issue):D1082-D1088.
In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopaedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database (http://intermine.modencode.org) described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) facilitate fine-grained analysis of the above data. The sophisticated search features are possible because of the collection of extensive experimental metadata by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit these rich modENCODE data sets now available via modMine.
doi:10.1093/nar/gkr921
PMCID: PMC3245176  PMID: 22080565
9.  Cloud-based uniform ChIP-Seq processing tools for modENCODE and ENCODE 
BMC Genomics  2013;14:494.
Background
Funded by the National Institutes of Health (NIH), the aim of the Model Organism ENCyclopedia of DNA Elements (modENCODE) project is to provide the biological research community with a comprehensive encyclopedia of functional genomic elements for both model organisms C. elegans (worm) and D. melanogaster (fly). With a total size of just under 10 terabytes of data collected and released to the public, one of the challenges faced by researchers is to extract biologically meaningful knowledge from this large data set. While the basic quality control, pre-processing, and analysis of the data have already been performed by members of the modENCODE consortium, many researchers will wish to reinterpret the data set using modifications and enhancements of the original protocols, or combine modENCODE data with other data sets. Unfortunately, this can be a time-consuming and logistically challenging proposition.
Results
In recognition of this challenge, the modENCODE DCC has released uniform computing resources for analyzing modENCODE data on Galaxy (https://github.com/modENCODE-DCC/Galaxy), on the public Amazon Cloud (http://aws.amazon.com), and on the private Bionimbus Cloud for genomic research (http://www.bionimbus.org). In particular, we have released Galaxy workflows for interpreting ChIP-seq data which use the same quality control (QC) and peak calling standards adopted by the modENCODE and ENCODE communities. For convenience of use, we have created Amazon and Bionimbus Cloud machine images containing Galaxy along with all the modENCODE data, software and other dependencies.
Conclusions
These resources provide a framework for running consistent and reproducible analyses on modENCODE data, ultimately allowing researchers to spend more of their time using modENCODE data, and less time moving it around.
doi:10.1186/1471-2164-14-494
PMCID: PMC3734164  PMID: 23875683
10.  Exome sequencing identifies nonsegregating nonsense ATM and PALB2 variants in familial pancreatic cancer 
Human Genomics  2013;7(1):11.
We sequenced 11 germline exomes from five families with familial pancreatic cancer (FPC). One proband had a germline nonsense variant in ATM with somatic loss of the variant allele. Another proband had a nonsense variant in PALB2 with somatic loss of the variant allele. Both variants were absent in a relative with FPC. These findings question the causal mechanisms of ATM and PALB2 in these families and highlight challenges in identifying the causes of familial cancer syndromes using exome sequencing.
doi:10.1186/1479-7364-7-11
PMCID: PMC3639869  PMID: 23561644
Hereditary cancer; Pancreas cancer; Germline variants; Genetic counseling; Carcinogenesis
11.  Modeling the evolution dynamics of exon-intron structure with a general random fragmentation process 
Background
Most eukaryotic genes are interrupted by spliceosomal introns. The evolution of exon-intron structure remains mysterious despite rapid advances in genome sequencing techniques. In this work, a novel approach is taken based on the assumptions that the evolution of exon-intron structure is a stochastic process, and that the characteristics of this process can be understood by examining its historical outcome, the present-day size distribution of internal translated exons (exons). By combining simulation with modeling of the size distribution of exons in different species, we propose a general random fragmentation process (GRFP) to characterize the evolution dynamics of exon-intron structure. This model accurately predicts the probability that an exon will be split by a new intron and the distribution of novel insertions along the length of the exon.
Results
As the first observation from this model, we show that the chance for an exon to obtain an intron is proportional to its size to the 3rd power. We also show that such size dependence is nearly constant across a gene, with the exception of the exons adjacent to the 5′ UTR. As the second conclusion from the model, we show that intron insertion loci follow a normal distribution with a mean of 0.5 (center of the exon) and a standard deviation of 0.11. Finally, we show that intron insertions within a gene are independent of each other for vertebrates, but are more negatively correlated for non-vertebrates. We use simulation to demonstrate that the negative correlation might result from significant intron loss during evolution, which could be explained by selection against multi-intron genes in these organisms.
Conclusions
The GRFP model suggests that intron gain is dynamic with a higher chance for longer exons; introns are inserted into exons randomly with the highest probability at the center of the exon. GRFP estimates that there are 78 introns per 10 kb of coding sequence for vertebrate genomes, agreeing with empirical observations. GRFP also estimates that there are significant intron losses in the evolution of non-vertebrate genomes, with extreme cases of around 57% intron loss in Drosophila melanogaster, 28% in Caenorhabditis elegans, and 24% in Oryza sativa.
doi:10.1186/1471-2148-13-57
PMCID: PMC3732091  PMID: 23448166
Evolution of exon-intron structure; General random fragmentation process; Simulation
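The two quantitative statements in the abstract above (gain probability proportional to the cube of exon size, and insertion position normally distributed around the exon center with standard deviation 0.11) can be illustrated with a small simulation. This is only a minimal sketch under those two stated assumptions, not the authors' GRFP implementation: the function names, the clamping of cut positions, and the starting point of a single 10 kb exon are all hypothetical choices made for illustration, and intron loss is not modeled.

```python
import random


def insert_intron(exon_lengths, rng, power=3, mean=0.5, sd=0.11):
    """Split one exon into two, returning a new list of exon lengths.

    Assumptions taken from the abstract: the exon is chosen with weight
    length ** power, and the relative cut position is drawn from a normal
    distribution N(mean, sd). Everything else here is illustrative.
    """
    # Exons shorter than 2 bp cannot be split, so they get zero weight.
    weights = [length ** power if length >= 2 else 0 for length in exon_lengths]
    if sum(weights) == 0:
        return list(exon_lengths)

    index = rng.choices(range(len(exon_lengths)), weights=weights, k=1)[0]
    length = exon_lengths[index]

    # Clamp the cut so that both fragments keep at least 1 bp (hypothetical rule).
    cut = int(round(rng.gauss(mean, sd) * length))
    cut = min(max(cut, 1), length - 1)

    return exon_lengths[:index] + [cut, length - cut] + exon_lengths[index + 1:]


def simulate(coding_length, n_introns, seed=0):
    """Start from one intronless coding sequence and insert introns one by one."""
    rng = random.Random(seed)
    exons = [coding_length]
    for _ in range(n_introns):
        exons = insert_intron(exons, rng)
    return exons


if __name__ == "__main__":
    # 78 insertions into 10 kb mirrors the density quoted for vertebrates.
    exons = simulate(10_000, 78)
    print(len(exons), "exons, sizes from", min(exons), "to", max(exons), "bp")
```

Because long exons are strongly favored as targets and cuts cluster near the center, repeated insertion tends to even out exon sizes, which is the qualitative behavior the abstract attributes to the model.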
12.  Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes 
Biankin, Andrew V. | Waddell, Nicola | Kassahn, Karin S. | Gingras, Marie-Claude | Muthuswamy, Lakshmi B. | Johns, Amber L. | Miller, David K. | Wilson, Peter J. | Patch, Ann-Marie | Wu, Jianmin | Chang, David K. | Cowley, Mark J. | Gardiner, Brooke B. | Song, Sarah | Harliwong, Ivon | Idrisoglu, Senel | Nourse, Craig | Nourbakhsh, Ehsan | Manning, Suzanne | Wani, Shivangi | Gongora, Milena | Pajic, Marina | Scarlett, Christopher J. | Gill, Anthony J. | Pinho, Andreia V. | Rooman, Ilse | Anderson, Matthew | Holmes, Oliver | Leonard, Conrad | Taylor, Darrin | Wood, Scott | Xu, Qinying | Nones, Katia | Fink, J. Lynn | Christ, Angelika | Bruxner, Tim | Cloonan, Nicole | Kolle, Gabriel | Newell, Felicity | Pinese, Mark | Mead, R. Scott | Humphris, Jeremy L. | Kaplan, Warren | Jones, Marc D. | Colvin, Emily K. | Nagrial, Adnan M. | Humphrey, Emily S. | Chou, Angela | Chin, Venessa T. | Chantrill, Lorraine A. | Mawson, Amanda | Samra, Jaswinder S. | Kench, James G. | Lovell, Jessica A. | Daly, Roger J. | Merrett, Neil D. | Toon, Christopher | Epari, Krishna | Nguyen, Nam Q. | Barbour, Andrew | Zeps, Nikolajs | Kakkar, Nipun | Zhao, Fengmei | Wu, Yuan Qing | Wang, Min | Muzny, Donna M. | Fisher, William E. | Brunicardi, F. Charles | Hodges, Sally E. | Reid, Jeffrey G. | Drummond, Jennifer | Chang, Kyle | Han, Yi | Lewis, Lora R. | Dinh, Huyen | Buhay, Christian J. | Beck, Timothy | Timms, Lee | Sam, Michelle | Begley, Kimberly | Brown, Andrew | Pai, Deepa | Panchal, Ami | Buchner, Nicholas | De Borja, Richard | Denroche, Robert E. | Yung, Christina K. | Serra, Stefano | Onetto, Nicole | Mukhopadhyay, Debabrata | Tsao, Ming-Sound | Shaw, Patricia A. | Petersen, Gloria M. | Gallinger, Steven | Hruban, Ralph H. | Maitra, Anirban | Iacobuzio-Donahue, Christine A. | Schulick, Richard D. | Wolfgang, Christopher L. | Morgan, Richard A. | Lawlor, Rita T. | Capelli, Paola | Corbo, Vincenzo | Scardoni, Maria | Tortora, Giampaolo | Tempero, Margaret A. | Mann, Karen M. 
| Jenkins, Nancy A. | Perez-Mancera, Pedro A. | Adams, David J. | Largaespada, David A. | Wessels, Lodewyk F. A. | Rust, Alistair G. | Stein, Lincoln D. | Tuveson, David A. | Copeland, Neal G. | Musgrove, Elizabeth A. | Scarpa, Aldo | Eshleman, James R. | Hudson, Thomas J. | Sutherland, Robert L. | Wheeler, David A. | Pearson, John V. | McPherson, John D. | Gibbs, Richard A. | Grimmond, Sean M.
Nature  2012;491(7424):399-405.
Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis.
doi:10.1038/nature11547
PMCID: PMC3530898  PMID: 23103869
13.  The Reactome BioMart 
Reactome is an open source, expert-authored, manually curated and peer-reviewed database of reactions, pathways and biological processes. We provide an intuitive web-based user interface to pathway knowledge and a suite of data analysis tools. The Reactome BioMart provides biologists and bioinformaticians with a single web interface for performing simple or elaborate queries of the Reactome database, aggregating data from different sources and providing an opportunity to integrate experimental and computational results with information relating to biological pathways.
Database URL: http://www.reactome.org
doi:10.1093/database/bar031
PMCID: PMC3197281  PMID: 22012987
14.  Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE 
Roy, Sushmita | Ernst, Jason | Kharchenko, Peter V. | Kheradpour, Pouya | Negre, Nicolas | Eaton, Matthew L. | Landolin, Jane M. | Bristow, Christopher A. | Ma, Lijia | Lin, Michael F. | Washietl, Stefan | Arshinoff, Bradley I. | Ay, Ferhat | Meyer, Patrick E. | Robine, Nicolas | Washington, Nicole L. | Di Stefano, Luisa | Berezikov, Eugene | Brown, Christopher D. | Candeias, Rogerio | Carlson, Joseph W. | Carr, Adrian | Jungreis, Irwin | Marbach, Daniel | Sealfon, Rachel | Tolstorukov, Michael Y. | Will, Sebastian | Alekseyenko, Artyom A. | Artieri, Carlo | Booth, Benjamin W. | Brooks, Angela N. | Dai, Qi | Davis, Carrie A. | Duff, Michael O. | Feng, Xin | Gorchakov, Andrey A. | Gu, Tingting | Henikoff, Jorja G. | Kapranov, Philipp | Li, Renhua | MacAlpine, Heather K. | Malone, John | Minoda, Aki | Nordman, Jared | Okamura, Katsutomo | Perry, Marc | Powell, Sara K. | Riddle, Nicole C. | Sakai, Akiko | Samsonova, Anastasia | Sandler, Jeremy E. | Schwartz, Yuri B. | Sher, Noa | Spokony, Rebecca | Sturgill, David | van Baren, Marijke | Wan, Kenneth H. | Yang, Li | Yu, Charles | Feingold, Elise | Good, Peter | Guyer, Mark | Lowdon, Rebecca | Ahmad, Kami | Andrews, Justen | Berger, Bonnie | Brenner, Steven E. | Brent, Michael R. | Cherbas, Lucy | Elgin, Sarah C. R. | Gingeras, Thomas R. | Grossman, Robert | Hoskins, Roger A. | Kaufman, Thomas C. | Kent, William | Kuroda, Mitzi I. | Orr-Weaver, Terry | Perrimon, Norbert | Pirrotta, Vincenzo | Posakony, James W. | Ren, Bing | Russell, Steven | Cherbas, Peter | Graveley, Brenton R. | Lewis, Suzanna | Micklem, Gos | Oliver, Brian | Park, Peter J. | Celniker, Susan E. | Henikoff, Steven | Karpen, Gary H. | Lai, Eric C. | MacAlpine, David M. | Stein, Lincoln D. | White, Kevin P. | Kellis, Manolis
Science (New York, N.Y.)  2010;330(6012):1787-1797.
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
doi:10.1126/science.1198374
PMCID: PMC3192495  PMID: 21177974
15.  The modENCODE Data Coordination Center: lessons in harvesting comprehensive experimental details 
The model organism Encyclopedia of DNA Elements (modENCODE) project is a National Human Genome Research Institute (NHGRI) initiative designed to characterize the genomes of Drosophila melanogaster and Caenorhabditis elegans. A Data Coordination Center (DCC) was created to collect, store and catalog modENCODE data. An effective DCC must gather, organize and provide all primary, interpreted and analyzed data, and ensure the community is supplied with the knowledge of the experimental conditions, protocols and verification checks used to generate each primary data set. We present here the design principles of the modENCODE DCC, and describe the ramifications of collecting thorough and deep metadata for describing experiments, including the use of a wiki for capturing protocol and reagent information, and the BIR-TAB specification for linking biological samples to experimental results. modENCODE data can be found at http://www.modencode.org.
Database URL: http://www.modencode.org.
doi:10.1093/database/bar023
PMCID: PMC3170170  PMID: 21856757
16.  The case for cloud computing in genome informatics 
Genome Biology  2010;11(5):207.
With DNA sequencing now getting cheaper more quickly than data storage, the time may have come to use cloud computing for genome informatics.
With DNA sequencing now getting cheaper more quickly than data storage or computation, the time may have come for genome informatics to migrate to the cloud.
doi:10.1186/gb-2010-11-5-207
PMCID: PMC2898083  PMID: 20441614
17.  Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis 
Voight, Benjamin F | Scott, Laura J | Steinthorsdottir, Valgerdur | Morris, Andrew P | Dina, Christian | Welch, Ryan P | Zeggini, Eleftheria | Huth, Cornelia | Aulchenko, Yurii S | Thorleifsson, Gudmar | McCulloch, Laura J | Ferreira, Teresa | Grallert, Harald | Amin, Najaf | Wu, Guanming | Willer, Cristen J | Raychaudhuri, Soumya | McCarroll, Steve A | Langenberg, Claudia | Hofmann, Oliver M | Dupuis, Josée | Qi, Lu | Segrè, Ayellet V | van Hoek, Mandy | Navarro, Pau | Ardlie, Kristin | Balkau, Beverley | Benediktsson, Rafn | Bennett, Amanda J | Blagieva, Roza | Boerwinkle, Eric | Bonnycastle, Lori L | Boström, Kristina Bengtsson | Bravenboer, Bert | Bumpstead, Suzannah | Burtt, Noisël P | Charpentier, Guillaume | Chines, Peter S | Cornelis, Marilyn | Couper, David J | Crawford, Gabe | Doney, Alex S F | Elliott, Katherine S | Elliott, Amanda L | Erdos, Michael R | Fox, Caroline S | Franklin, Christopher S | Ganser, Martha | Gieger, Christian | Grarup, Niels | Green, Todd | Griffin, Simon | Groves, Christopher J | Guiducci, Candace | Hadjadj, Samy | Hassanali, Neelam | Herder, Christian | Isomaa, Bo | Jackson, Anne U | Johnson, Paul R V | Jørgensen, Torben | Kao, Wen H L | Klopp, Norman | Kong, Augustine | Kraft, Peter | Kuusisto, Johanna | Lauritzen, Torsten | Li, Man | Lieverse, Aloysius | Lindgren, Cecilia M | Lyssenko, Valeriya | Marre, Michel | Meitinger, Thomas | Midthjell, Kristian | Morken, Mario A | Narisu, Narisu | Nilsson, Peter | Owen, Katharine R | Payne, Felicity | Perry, John R B | Petersen, Ann-Kristin | Platou, Carl | Proença, Christine | Prokopenko, Inga | Rathmann, Wolfgang | Rayner, N William | Robertson, Neil R | Rocheleau, Ghislain | Roden, Michael | Sampson, Michael J | Saxena, Richa | Shields, Beverley M | Shrader, Peter | Sigurdsson, Gunnar | Sparsø, Thomas | Strassburger, Klaus | Stringham, Heather M | Sun, Qi | Swift, Amy J | Thorand, Barbara | Tichet, Jean | Tuomi, Tiinamaija | van Dam, Rob M | van Haeften, Timon W | van Herpt, Thijs | 
van Vliet-Ostaptchouk, Jana V | Walters, G Bragi | Weedon, Michael N | Wijmenga, Cisca | Witteman, Jacqueline | Bergman, Richard N | Cauchi, Stephane | Collins, Francis S | Gloyn, Anna L | Gyllensten, Ulf | Hansen, Torben | Hide, Winston A | Hitman, Graham A | Hofman, Albert | Hunter, David J | Hveem, Kristian | Laakso, Markku | Mohlke, Karen L | Morris, Andrew D | Palmer, Colin N A | Pramstaller, Peter P | Rudan, Igor | Sijbrands, Eric | Stein, Lincoln D | Tuomilehto, Jaakko | Uitterlinden, Andre | Walker, Mark | Wareham, Nicholas J | Watanabe, Richard M | Abecasis, Gonçalo R | Boehm, Bernhard O | Campbell, Harry | Daly, Mark J | Hattersley, Andrew T | Hu, Frank B | Meigs, James B | Pankow, James S | Pedersen, Oluf | Wichmann, H-Erich | Barroso, Inês | Florez, Jose C | Frayling, Timothy M | Groop, Leif | Sladek, Rob | Thorsteinsdottir, Unnur | Wilson, James F | Illig, Thomas | Froguel, Philippe | van Duijn, Cornelia M | Stefansson, Kari | Altshuler, David | Boehnke, Michael | McCarthy, Mark I
Nature genetics  2010;42(7):579-589.
By combining genome-wide association data from 8,130 individuals with type 2 diabetes (T2D) and 38,987 controls of European descent and following up previously unidentified meta-analysis signals in a further 34,412 cases and 59,925 controls, we identified 12 new T2D association signals with combined P < 5 × 10⁻⁸. These include a second independent signal at the KCNQ1 locus; the first report, to our knowledge, of an X-chromosomal association (near DUSP9); and a further instance of overlap between loci implicated in monogenic and multifactorial forms of diabetes (at HNF1A). The identified loci affect both beta-cell function and insulin action, and, overall, T2D association signals show evidence of enrichment for genes involved in cell cycle regulation. We also show that a high proportion of T2D susceptibility loci harbor independent association signals influencing apparently unrelated complex traits.
doi:10.1038/ng.609
PMCID: PMC3080658  PMID: 20581827
18.  Localizing triplet periodicity in DNA and cDNA sequences 
BMC Bioinformatics  2010;11:550.
Background
The protein-coding regions (coding exons) of a DNA sequence exhibit a triplet periodicity (TP) due to the fact that coding exons contain a series of three-nucleotide codons that encode specific amino acid residues. Such periodicity is usually not observed in introns and intergenic regions. If a DNA sequence is divided into small segments and a Fourier Transform is applied to each segment, a strong peak at frequency 1/3 is typically observed in the Fourier spectrum of coding segments, but not in non-coding regions. This property has been used in identifying the locations of protein-coding genes in unannotated sequence. The method is fast and requires no training. However, the need to compute the Fourier Transform across a segment (window) of arbitrary size affects the accuracy with which one can localize TP boundaries. Here, we report a technique that provides higher-resolution identification of these boundaries, and use the technique to explore the biological correlates of TP regions in the genome of the model organism C. elegans.
Results
Using both simulated TP signals and the real C. elegans sequence F56F11 as an example, we demonstrate that: (1) the Modified Wavelet Transform (MWT) can better define the boundary of a TP region than the conventional Short Time Fourier Transform (STFT); (2) the scale parameter (a) of MWT determines the precision of TP boundary localization: bigger values of a give sharper TP boundaries but result in a lower signal-to-noise ratio; (3) RNA splicing sites have weaker TP signals than coding regions; (4) TP signals in coding regions can be destroyed or recovered by frame-shift mutations; (5) 6 bp periodicities in introns and intergenic regions can generate false positive signals, which can be removed with 6 bp MWT.
Conclusions
MWT can provide more precise TP boundaries than STFT, and the boundaries can be further refined by bigger-scale MWT. Subtraction of 6 bp periodicity signals reduces the number of false positives. Experimentally introduced frame-shift mutations help recover TP signals that have been lost through possible ancient frame-shifts. More importantly, the TP signal has the potential to be used to detect the splice junctions in fully spliced mRNA sequence.
doi:10.1186/1471-2105-11-550
PMCID: PMC2992068  PMID: 21059240
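The windowed Fourier approach that the abstract above describes as the conventional baseline can be sketched in a few lines. This is a hedged illustration of that baseline only (a sliding-window measure of spectral power at frequency 1/3), not the Modified Wavelet Transform the paper proposes. The four-indicator-sequence encoding, the normalization by window length, and the default window and step sizes are assumptions chosen for the example.

```python
import cmath


def period3_power(segment):
    """Spectral power at frequency 1/3, summed over the four bases.

    Each base is turned into a 0/1 indicator sequence (an assumed, commonly
    used encoding); the DFT coefficient at frequency 1/3 is computed for each
    and the squared magnitudes are added, then divided by the segment length.
    """
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * position / 3)
            for position, symbol in enumerate(segment)
            if symbol == base
        )
        total += abs(coeff) ** 2
    return total / len(segment)


def sliding_period3(sequence, window=90, step=3):
    """Return (start, power) pairs for each window along the sequence."""
    sequence = sequence.upper()
    return [
        (start, period3_power(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy input: a perfectly periodic stretch followed by a homopolymer run.
    toy = "ATG" * 30 + "A" * 90
    for start, power in sliding_period3(toy)[::10]:
        print(start, round(power, 2))
```

On the toy input the score falls steadily as the window slides from the periodic stretch into the homopolymer run. The transition is smeared over a full window length, which is the boundary-resolution limitation the abstract raises for fixed-window methods.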
19.  Bioinformatics: alive and kicking 
Genome Biology  2008;9(12):114.
Bioinformatics is alive and well in 2008, concludes Lincoln Stein, despite his earlier prediction of its imminent demise.
Bioinformatics has become too central to biology to be left to specialist bioinformaticians. Biologists are all bioinformaticians now.
doi:10.1186/gb-2008-9-12-114
PMCID: PMC2646289  PMID: 19133107
20.  WormBase: a comprehensive resource for nematode research 
Nucleic Acids Research  2009;38(Database issue):D463-D467.
WormBase (http://www.wormbase.org) is a central data repository for nematode biology. Initially created as a service to the Caenorhabditis elegans research field, WormBase has evolved into a powerful research tool in its own right. In the past 2 years, we expanded WormBase to include the complete genomic sequence, gene predictions and orthology assignments from a range of related nematodes. These comparative data enrich the C. elegans data with improved gene predictions and a better understanding of gene function. In turn, they bring the wealth of experimental knowledge of C. elegans to other systems of medical and agricultural importance. Here, we describe new species and data types now available at WormBase. In addition, we detail enhancements to our curatorial pipeline and website infrastructure to accommodate new genomes and an extensive user base.
doi:10.1093/nar/gkp952
PMCID: PMC2808986  PMID: 19910365
21.  nGASP – the nematode genome annotation assessment project 
BMC Bioinformatics  2008;9:549.
Background
While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets across 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase.
Results
The most accurate gene-finders were 'combiner' algorithms, which made use of transcript and protein alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene-level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with unusually many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs posed the greatest difficulty for gene-finders.
Conclusion
This experiment establishes a baseline of gene prediction accuracy in Caenorhabditis genomes, and has guided the choice of gene-finders for the annotation of newly sequenced genomes of Caenorhabditis and other nematode species. We have created new gene sets for C. briggsae, C. remanei, C. brenneri, C. japonica, and Brugia malayi using some of the best-performing gene-finders.
doi:10.1186/1471-2105-9-549
PMCID: PMC2651883  PMID: 19099578
22.  GMODWeb: a web framework for the generic model organism database 
Genome Biology  2008;9(6):R102.
GMODWeb is a software framework designed to speed the development of websites for model organism databases.
The Generic Model Organism Database (GMOD) initiative provides species-agnostic data models and software tools for representing curated model organism data. Here we describe GMODWeb, a GMOD project designed to speed the development of model organism database (MOD) websites. Sites created with GMODWeb provide integration with other GMOD tools and allow users to browse and search through a variety of data types. GMODWeb was built using the open source Turnkey web framework and is available from .
doi:10.1186/gb-2008-9-6-r102
PMCID: PMC2481422  PMID: 18570664
23.  WormBase 2007 
Nucleic Acids Research  2007;36(Database issue):D612-D617.
WormBase (www.wormbase.org) is the major publicly available database of information about Caenorhabditis elegans, an important system for basic biological and biomedical research. Derived from the initial ACeDB database of C. elegans genetic and sequence information, WormBase now includes genomic, anatomical and functional information about C. elegans, other Caenorhabditis species and other nematodes. As such, it is a crucial resource not only for C. elegans biologists but also for the larger biomedical and bioinformatics communities. Coverage of core areas of C. elegans biology will allow the biomedical community to make full use of the results of intensive molecular genetic analysis and functional genomic studies of this organism. Improved search and display tools, wider cross-species comparisons and extended ontologies are some of the features that will help scientists extend their research and take advantage of other nematode species genome sequences.
doi:10.1093/nar/gkm975
PMCID: PMC2238927  PMID: 17991679
24.  Gallus GBrowse: a unified genomic database for the chicken 
Nucleic Acids Research  2007;36(Database issue):D719-D723.
Gallus GBrowse (http://birdbase.net/cgi-bin/gbrowse/gallus/) provides online access to genomic and other information about the chicken, Gallus gallus. The information provided by this resource includes predicted genes and Gene Ontology (GO) terms, links to Gallus In Situ Hybridization Analysis (GEISHA), Unigene and Reactome, the genomic positions of chicken genetic markers, SNPs and microarray probes, and mappings from turkey, condor and zebra finch DNA and EST sequences to the chicken genome. We also provide a BLAT server (http://birdbase.net/cgi-bin/webBlat) for matching user-provided sequences to the chicken genome. These tools make the Gallus GBrowse server a valuable resource for researchers seeking genomic information regarding the chicken and other avian species.
doi:10.1093/nar/gkm783
PMCID: PMC2238981  PMID: 17933775
25.  Identification of ciliary and ciliopathy genes in Caenorhabditis elegans through comparative genomics 
Genome Biology  2006;7(12):R126.
Comparative genomic analysis of three nematode species identifies 93 genes that encode putative components of the ciliated neurons in C. elegans and are subject to the same regulatory control.
Background
The recent availability of genome sequences of multiple related Caenorhabditis species has made it possible to identify, using comparative genomics, similarly transcribed genes in Caenorhabditis elegans and its sister species. Taking this approach, we have identified numerous novel ciliary genes in C. elegans, some of which may be orthologs of unidentified human ciliopathy genes.
Results
By screening for genes possessing canonical X-box sequences in promoters of three Caenorhabditis species, namely C. elegans, C. briggsae and C. remanei, we identified 93 genes (including known X-box regulated genes) that encode putative components of ciliated neurons in C. elegans and are subject to the same regulatory control. For many of these genes, restricted anatomical expression in ciliated cells was confirmed, and control of transcription by the ciliogenic DAF-19 RFX transcription factor was demonstrated by comparative transcriptional profiling of different tissue types and of daf-19(+) and daf-19(-) animals. Finally, we demonstrate that the dye-filling defect of dyf-5(mn400) animals, which is indicative of compromised exposure of cilia to the environment, is caused by a nonsense mutation in the serine/threonine protein kinase gene M04C9.5.
Conclusion
Our comparative genomics-based predictions may be useful for identifying genes involved in human ciliopathies, including Bardet-Biedl Syndrome (BBS), since the C. elegans orthologs of known human BBS genes contain X-box motifs and are required for normal dye filling in C. elegans ciliated neurons.
doi:10.1186/gb-2006-7-12-r126
PMCID: PMC1794439  PMID: 17187676
