1.  Plant Reactome: a resource for plant pathways and comparative analysis 
Nucleic Acids Research  2016;45(Database issue):D1029-D1039.
Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX.
doi:10.1093/nar/gkw932
PMCID: PMC5210633  PMID: 27799469
2.  Insights into cancer severity from biomolecular interaction mechanisms 
Scientific Reports  2016;6:34490.
To attain a deeper understanding of diseases like cancer, it is critical to couple genetics with biomolecular mechanisms. High-throughput sequencing has identified thousands of somatic mutations across dozens of cancers, and there is a pressing need to identify the few that are pathologically relevant. Here we use protein structure and interaction data to interrogate nonsynonymous somatic cancer mutations, identifying a set of 213 molecular interfaces (protein-protein, protein-small molecule or protein-nucleic acid) most often perturbed in cancer and highlighting several potentially novel cancer genes. Over half of these interfaces involve protein-small molecule interactions, underscoring their overall importance in cancer. We found distinct differences in the predominance of perturbed interfaces between cancers and histological subtypes, and the presence or absence of certain interfaces appears to correlate with cancer severity.
doi:10.1038/srep34490
PMCID: PMC5048291  PMID: 27698488
3.  Identification of pre-leukemic hematopoietic stem cells in acute leukemia 
Nature  2014;506(7488):328-333.
Summary
In acute myeloid leukemia (AML), the cell of origin, nature and biological consequences of initiating lesions and order of subsequent mutations remain poorly understood, as AML is typically diagnosed without observation of a pre-leukemic phase. Here, highly purified hematopoietic stem cells (HSC), progenitor and mature cell fractions from the blood of AML patients were found to contain recurrent DNMT3a mutations (DNMT3amut) at high allele frequency, but without coincident NPM1 mutations (NPM1c) present in AML blasts. DNMT3amut-bearing HSC exhibited multilineage repopulation advantage over non-mutated HSC in xenografts, establishing their identity as pre-leukemic-HSC (preL-HSC). preL-HSC were found in remission samples indicating that they survive chemotherapy. Thus DNMT3amut arises early in AML evolution, likely in HSC, leading to a clonally expanded pool of preL-HSC from which AML evolves. Our findings provide a paradigm for the detection and treatment of pre-leukemic clones before the acquisition of additional genetic lesions engenders greater therapeutic resistance.
doi:10.1038/nature13038
PMCID: PMC4991939  PMID: 24522528 CAMSID: cams3940
4.  Cross-Organism Analysis Using InterMine 
Genesis (New York, N.Y. : 2000)  2015;53(8):547-560.
InterMine is a data integration warehouse and analysis software system developed for large and complex biological datasets. Designed for integrative analysis, it can be accessed through a user-friendly web interface. For bioinformaticians, extensive web services as well as programming interfaces for most common scripting languages support access to all features.
The web interface includes a useful identifier look-up system, and both simple and sophisticated search options. Interactive results tables enable exploration, and data can be filtered, summarised and browsed. A set of graphical analysis tools provides a rich environment for data exploration, including statistical enrichment of sets of genes or other entities.
InterMine databases have been developed for the major model organisms, budding yeast, nematode worm, fruit fly, zebrafish, mouse and rat together with a newly developed human database. Here we describe how this has facilitated interoperation and development of cross-organism analysis tools and reports. InterMine as a data exploration and analysis tool is also described. All the InterMine based systems described in the paper are resources freely available to the scientific community.
doi:10.1002/dvg.22869
PMCID: PMC4545681  PMID: 26097192
Data integration; Data analysis; Comparative analysis; Cross-organism analysis; Integrative analysis; Genomics; Proteomics
5.  Distinct routes of lineage development reshape the human blood hierarchy across ontogeny 
Science (New York, N.Y.)  2015;351(6269):aab2116.
Classically, blood arises from stem cells through a series of oligopotent progenitors that become increasingly restricted to unipotent progenitors, each slotted into a hierarchical layer based on their differentiation potential. The presence of oligopotent cells is critical to the standard model of blood differentiation as they define the path from stem cells to unipotent progenitors. We developed a new cell-sorting scheme to resolve myeloid (My), erythroid (Er) and megakaryocytic (Mk) fates from single CD34+ cells and then mapped the progenitor hierarchy across human development. Fetal liver contained large numbers of distinct oligopotent progenitors with entangled My, Er and Mk fates. Unexpectedly in adult bone marrow, few oligopotent progenitor intermediates were present with multipotent and unipotent progenitors predominating, and now Er-Mk lineages emerged from multipotent cells. The developmental shift to an adult ‘two-tier’ hierarchy challenges current dogma and provides a new framework to understand normal and disease states of human hematopoiesis.
doi:10.1126/science.aab2116
PMCID: PMC4816201  PMID: 26541609
6.  JBrowse: a dynamic web platform for genome visualization and analysis 
Genome Biology  2016;17:66.
Background
JBrowse is a fast and full-featured genome browser built with JavaScript and HTML5. It is easily embedded into websites or apps but can also be served as a standalone web page.
Results
Overall improvements to speed and scalability are accompanied by specific enhancements that support complex interactive queries on large track sets. Analysis functions can readily be added using the plugin framework; most visual aspects of tracks can also be customized, along with clicks, mouseovers, menus, and popup boxes. JBrowse can also be used to browse local annotation files offline and to generate high-resolution figures for publication.
Conclusions
JBrowse is a mature web application suitable for genome visualization and analysis.
doi:10.1186/s13059-016-0924-1
PMCID: PMC4830012  PMID: 27072794
Genome; Browser; Bioinformatics
7.  Simple, multiplexed, PCR-based barcoding of DNA enables sensitive mutation detection in liquid biopsies using sequencing 
Nucleic Acids Research  2016;44(11):e105.
Detection of cell-free DNA in liquid biopsies offers great potential for use in non-invasive prenatal testing and as a cancer biomarker. Fetal and tumor DNA fractions, however, can be extremely low in these samples, and ultra-sensitive methods are required for their detection. Here, we report an extremely simple and fast method for introduction of barcodes into DNA libraries made from 5 ng of DNA. Barcoded adapter primers are designed with an oligonucleotide hairpin structure to protect the molecular barcodes during the first rounds of polymerase chain reaction (PCR) and prevent them from participating in mis-priming events. Our approach enables high-level multiplexing and next-generation sequencing library construction with flexible library content. We show that uniform libraries of 1-, 5-, 13- and 31-plex can be generated. Utilizing the barcodes to generate consensus reads for each original DNA molecule reduces background sequencing noise and allows detection of variant alleles below 0.1% frequency in clonal cell line DNA and in cell-free plasma DNA. Thus, our approach bridges the gap between the highly sensitive but specific capabilities of digital PCR, which only allows a limited number of variants to be analyzed, and the broad target capability of next-generation sequencing, which traditionally lacks the sensitivity to detect rare variants.
doi:10.1093/nar/gkw224
PMCID: PMC4914102  PMID: 27060140
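The consensus-read idea in entry 7 (grouping reads that share a molecular barcode and out-voting isolated sequencing errors) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `consensus_reads`, the `min_family_size` threshold, the per-position majority vote, and the toy reads are all assumptions made for illustration, and reads within a barcode family are assumed to be pre-aligned and of equal length.

```python
from collections import Counter, defaultdict


def consensus_reads(reads, min_family_size=3):
    """Collapse (barcode, sequence) pairs into one consensus per barcode.

    Hypothetical helper: reads sharing a barcode are assumed to derive
    from the same original DNA molecule and to have equal length.
    """
    # Group sequences into barcode families.
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to out-vote a sequencing error
        # Majority vote at each position across the family.
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Toy data (invented): one family of three reads, one singleton.
reads = [
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACCTAC"),  # single-read error at the third base
    ("CCTA", "ACGTAC"),  # singleton family, dropped by the size filter
]
print(consensus_reads(reads))  # {'AAGT': 'ACGTAC'}
```

Because an error must recur in most reads of a family to survive the vote, isolated errors are suppressed, which is the intuition behind detecting variants at very low allele frequency.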
8.  Downregulation of histone H2A and H2B pathways is associated with anthracycline sensitivity in breast cancer 
Background
Drug resistance in breast cancer is the major obstacle to effective treatment with chemotherapy. While upregulation of multidrug resistance genes is an important component of drug resistance mechanisms in vitro, their clinical relevance remains to be determined. Therefore, identifying pathways that could be targeted in the clinic to eliminate anthracycline-resistant breast cancer remains a major challenge.
Methods
We generated paired native and epirubicin-resistant MDA-MB-231, MCF7, SKBR3 and ZR-75-1 breast cancer cell lines to identify pathways contributing to anthracycline resistance. Native cell lines were exposed to increasing concentrations of epirubicin until resistant cells were generated. To identify mechanisms driving epirubicin resistance, we used a complementary approach including gene expression analyses to identify molecular pathways involved in resistance, and small-molecule inhibitors to reverse resistance. In addition, we tested the clinical relevance of these pathways in the BR9601 adjuvant clinical trial.
Results
Characterisation of epirubicin-resistant cells revealed that they were cross-resistant to doxorubicin and SN-38 and had alterations in apoptosis and cell-cycle profiles. Gene expression analysis identified deregulation of histone H2A and H2B genes in all four cell lines. Histone deacetylase small-molecule inhibitors reversed resistance and were cytotoxic for epirubicin-resistant cell lines, confirming that histone pathways are associated with epirubicin resistance. Gene expression analysis of a novel 18-gene histone pathway module in the BR9601 adjuvant clinical trial revealed that patients with low expression of the 18-gene histone module benefited from anthracycline treatment more than those with high expression (hazard ratio 0.35, 95% confidence interval 0.13–0.96, p = 0.042).
Conclusions
This study revealed a key pathway that contributes to anthracycline resistance and established model systems for investigating drug resistance in all four major breast cancer subtypes. As the histone modification can be targeted with small-molecule inhibitors, it represents a possible means of reversing clinical anthracycline resistance.
Trial registration
ClinicalTrials.gov identifier NCT00003012. Registered on 1 November 1999.
doi:10.1186/s13058-016-0676-6
PMCID: PMC4744406  PMID: 26852132
Breast cancer; Anthracycline resistance; Gene expression; Small-molecule inhibitors; Clinical trial; Histone
9.  Pathway and Network Analysis of Cancer Genomes 
Nature methods  2015;12(7):615-621.
Genomic information on tumors from 50 cancer types catalogued by The International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence, there has been great interest in developing pathway and network analysis methods that group genes and illuminate the processes involved. We provide an overview of these analysis techniques and show where they guide mechanistic and translational investigations.
doi:10.1038/nmeth.3440
PMCID: PMC4717906  PMID: 26125594
10.  A cancer cell-line titration series for evaluating somatic classification 
BMC Research Notes  2015;8:823.
Background
Accurate detection of somatic single nucleotide variants and small insertions and deletions from DNA sequencing experiments of tumour-normal pairs is a challenging task. Tumour samples are often contaminated with normal cells confounding the available evidence for the somatic variants. Furthermore, tumours are heterogeneous so sub-clonal variants are observed at reduced allele frequencies. We present here a cell-line titration series dataset that can be used to evaluate somatic variant calling pipelines with the goal of reliably calling true somatic mutations at low allele frequencies.
Results
Cell-line DNA was mixed with matched normal DNA at 8 different ratios to generate samples with known tumour cellularities, and exome sequenced on Illumina HiSeq to depths of >300×. The data was processed with several different variant calling pipelines and verification experiments were performed to assay >1500 somatic variant candidates using Ion Torrent PGM as an orthogonal technology. By examining the variants called at varying cellularities and depths of coverage, we show that the best performing pipelines are able to maintain a high level of precision at any cellularity. In addition, we estimate the number of true somatic variants undetected as cellularity and coverage decrease.
Conclusions
Our cell-line titration series dataset, along with the associated verification results, was effective for this evaluation and will serve as a valuable dataset for future somatic calling algorithm development. The data is available for further analysis at the European Genome-phenome Archive under accession number EGAS00001001016. Data access requires registration through the International Cancer Genome Consortium’s Data Access Compliance Office (ICGC DACO).
doi:10.1186/s13104-015-1803-7
PMCID: PMC4691534  PMID: 26708082
Whole exome sequencing dataset; Somatic mutation calling; Cancer bioinformatics; Tumour cellularity; Normal contamination
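To see why variant calling becomes harder as tumour cellularity falls in a titration series like the one in entry 10, it helps to compute the allele frequency one would expect for a given mixture. The sketch below is a simplified model, not taken from the paper: it assumes a clonal variant, diploid tumour and normal genomes by default, and no sampling noise. The function name and the cellularity values are hypothetical (the abstract states that 8 ratios were used but does not list them); only the 300× depth comes from the abstract.

```python
def expected_vaf(cellularity, tumour_copies=2, normal_copies=2, mutant_copies=1):
    """Expected variant allele frequency for a clonal somatic variant.

    Simplifying assumptions (not from the source): every tumour cell
    carries `mutant_copies` mutant alleles out of `tumour_copies`, and
    normal cells carry none.
    """
    if not 0.0 <= cellularity <= 1.0:
        raise ValueError("cellularity must be between 0 and 1")
    tumour_alleles = cellularity * tumour_copies
    normal_alleles = (1.0 - cellularity) * normal_copies
    return cellularity * mutant_copies / (tumour_alleles + normal_alleles)


# Hypothetical cellularities, evaluated at the 300x depth mentioned above.
depth = 300
for cellularity in (1.0, 0.5, 0.2, 0.1):
    vaf = expected_vaf(cellularity)
    print(f"cellularity={cellularity:.2f}  VAF={vaf:.3f}  "
          f"expected supporting reads={vaf * depth:.0f}")
```

Under these assumptions a heterozygous variant appears at half the cellularity, so sub-clonal variants, or variants in low-cellularity mixtures, are supported by only a small number of reads even at high depth.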
11.  TDP-1, the Caenorhabditis elegans ortholog of TDP-43, limits the accumulation of double-stranded RNA 
The EMBO Journal  2014;33(24):2947-2966.
Caenorhabditis elegans mutants deleted for TDP-1, an ortholog of the neurodegeneration-associated RNA-binding protein TDP-43, display only mild phenotypes. Nevertheless, transcriptome sequencing revealed that many RNAs were altered in accumulation and/or processing in the mutant. Analysis of these transcriptional abnormalities demonstrates that a primary function of TDP-1 is to limit formation or stability of double-stranded RNA. Specifically, we found that deletion of tdp-1: (1) preferentially alters the accumulation of RNAs with inherent double-stranded structure (dsRNA); (2) increases the accumulation of nuclear dsRNA foci; (3) enhances the frequency of adenosine-to-inosine RNA editing; and (4) dramatically increases the amount of transcripts immunoprecipitable with a dsRNA-specific antibody, including intronic sequences, RNAs with antisense overlap to another transcript, and transposons. We also show that TDP-43 knockdown in human cells results in accumulation of dsRNA, indicating that suppression of dsRNA is a conserved function of TDP-43 in mammals. Altered accumulation of structured RNA may account for some of the previously described molecular phenotypes (e.g., altered splicing) resulting from reduction of TDP-43 function.
doi:10.15252/embj.201488740
PMCID: PMC4282642  PMID: 25391662
neurodegeneration; RNA editing; RNA structure; splicing
12.  The Reactome pathway Knowledgebase 
Nucleic Acids Research  2015;44(Database issue):D481-D487.
The Reactome Knowledgebase (www.reactome.org) provides molecular details of signal transduction, transport, DNA replication, metabolism and other cellular processes as an ordered network of molecular transformations (an extended version of a classic metabolic map) in a single consistent data model. Reactome functions both as an archive of biological processes and as a tool for discovering unexpected functional relationships in data such as gene expression pattern surveys or somatic mutation catalogues from tumour cells. Over the last two years we redeveloped major components of the Reactome web interface to improve usability, responsiveness and data visualization. A new pathway diagram viewer provides a faster, clearer interface and smooth zooming from the entire reaction network to the details of individual reactions. Tool performance for analysis of user datasets has been substantially improved, now generating detailed results for genome-wide expression datasets within seconds. The analysis module can now be accessed through a RESTful interface, facilitating its inclusion in third-party applications. A new overview module allows the visualization of analysis results on a genome-wide Reactome pathway hierarchy using a single screen page. The search interface now provides auto-completion as well as a faceted search to narrow result lists efficiently.
doi:10.1093/nar/gkv1351
PMCID: PMC4702931  PMID: 26656494
13.  Surveillance in Patients With Barrett's Esophagus for Early Detection of Esophageal Adenocarcinoma: A Systematic Review and Meta-Analysis 
Objectives:
Although endoscopic surveillance of patients with Barrett's esophagus (BE) has been widely implemented for early detection of esophageal adenocarcinoma (EAC), its justification has been debated. This systematic review aimed to evaluate benefits, safety, and cost effectiveness of surveillance for patients with BE.
Methods:
MEDLINE, EMBASE, EconLit, Scopus, Cochrane, and CINAHL were searched for published human studies that examined screening practices, benefits, safety, and cost effectiveness of surveillance among patients with BE. Reviewers independently reviewed eligible full-text study articles and conducted data extraction and quality assessment, with disagreements resolved by consensus. Random effects meta-analyses were performed to assess the incidence of EAC, EAC/high-grade dysplasia (HGD), and annual stage-specific transition probabilities detected among BE patients under surveillance, and relative risk of mortality among EAC patients detected during surveillance compared with those not under surveillance.
Results:
A total of 51 studies with 11,028 subjects were eligible; the majority were of high quality based on the Newcastle–Ottawa quality scale. Among BE patients undergoing endoscopic surveillance, pooled EAC incidence per 1,000 person-years of surveillance follow-up was 5.5 (95% confidence interval (CI): 4.2–6.8) and pooled EAC/HGD incidence was 7.7 (95% CI: 5.7–9.7). Pooled relative mortality risk among surveillance-detected EAC patients compared with nonsurveillance-detected EAC patients was 0.386 (95% CI: 0.242–0.617). Pooled annual stage-specific transition probabilities from nondysplastic BE to low-grade dysplasia, high-grade dysplasia, and EAC were 0.019, 0.003, and 0.004, respectively. There was, however, insufficient scientific evidence on safety and cost effectiveness of surveillance for BE patients.
Conclusions:
Our findings confirmed a low incidence rate of EAC among BE patients undergoing surveillance and a reduction in mortality by 61% among those who received regular surveillance and developed EAC. Because of knowledge gaps, it is important to assess safety of surveillance and health-care resource use and costs to supplement existing evidence and inform a future policy decision for surveillance programs.
doi:10.1038/ctg.2015.58
PMCID: PMC4816094  PMID: 26658838
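The 61% mortality reduction quoted in the conclusions of entry 13 follows directly from the pooled relative risk of 0.386 reported in the results. A minimal sketch of that arithmetic is given below; the function name is an assumption, and converting the confidence limits in the same way is an illustrative extension rather than a figure reported in the abstract.

```python
def relative_risk_reduction(relative_risk):
    """Convert a relative risk into a proportional risk reduction (1 - RR)."""
    return 1.0 - relative_risk


# Pooled relative mortality risk and 95% CI as quoted in the abstract.
point = relative_risk_reduction(0.386)
# The CI bounds swap: the upper RR limit gives the lower reduction.
lower = relative_risk_reduction(0.617)
upper = relative_risk_reduction(0.242)

print(f"{point:.0%} reduction ({lower:.0%} to {upper:.0%})")
# 61% reduction (38% to 76%)
```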
14.  WormBase 2016: expanding to enable helminth genomic research 
Nucleic Acids Research  2015;44(Database issue):D774-D780.
WormBase (www.wormbase.org) is a central repository for research data on the biology, genetics and genomics of Caenorhabditis elegans and other nematodes. The project has evolved from its original remit to collect and integrate all data for a single species, and now extends to numerous nematodes, ranging from evolutionary comparators of C. elegans to parasitic species that threaten plant, animal and human health. Research activity using C. elegans as a model system is as vibrant as ever, and we have created new tools for community curation in response to the ever-increasing volume and complexity of data. To better allow users to navigate their way through these data, we have made a number of improvements to our main website, including new tools for browsing genomic features and ontology annotations. Finally, we have developed a new portal for parasitic worm genomes. WormBase ParaSite (parasite.wormbase.org) contains all publicly available nematode and platyhelminth annotated genome sequences, and is designed specifically to support helminth genomic research.
doi:10.1093/nar/gkv1217
PMCID: PMC4702863  PMID: 26578572
15.  Gramene 2016: comparative plant genomics and pathway resources 
Nucleic Acids Research  2015;44(Database issue):D1133-D1140.
Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ∼200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials.
doi:10.1093/nar/gkv1179
PMCID: PMC4702844  PMID: 26553803
16.  Reactome Pathway Analysis to Enrich Biological Discovery in Proteomics Datasets 
Proteomics  2011;11(18):3598-3613.
Reactome (http://www.reactome.org) is an open source, expert-authored, peer-reviewed, manually curated database of reactions, pathways and biological processes. We provide an intuitive web-based user interface to pathway knowledge and a suite of data analysis tools. The Pathway Browser is a Systems Biology Graphical Notation (SBGN)-like visualization system that supports manual navigation of pathways by zooming, scrolling and event highlighting, and that exploits PSI Common Query Interface (PSICQUIC) web services to overlay pathways with molecular interaction data from the Reactome Functional Interaction (FI) Network and interaction databases such as IntAct, ChEMBL, and BioGRID. Pathway and Expression Analysis tools employ web services to provide ID mapping, pathway assignment and over-representation analysis of user-supplied datasets. By applying Ensembl Compara to curated human proteins and reactions, Reactome generates pathway inferences for 20 other species. The Species Comparison tool provides a summary of results for each of these species as a table showing numbers of orthologous proteins found by pathway, from which users can navigate to inferred details for specific proteins and reactions. Reactome’s diverse pathway knowledge and suite of data analysis tools provide a platform for data mining, modeling and the analysis of large-scale proteomics datasets.
doi:10.1002/pmic.201100066
PMCID: PMC4617659  PMID: 21751369
Pathway database; Pathway visualization; Pathway analysis; BioMart; Data integration
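Over-representation analysis, mentioned in entry 16, asks whether a user-supplied list contains more members of a pathway than chance would predict. One common way to score this is a one-sided hypergeometric test, sketched below. The abstract does not say which statistic Reactome uses, so this is a generic illustration under that assumption; the function name, parameters and example numbers are hypothetical.

```python
from math import comb


def overrepresentation_p(universe, pathway_size, sample_size, hits):
    """One-sided hypergeometric p-value: P(X >= hits).

    universe     -- total number of annotated identifiers
    pathway_size -- identifiers in the universe that belong to the pathway
    sample_size  -- identifiers in the user-supplied list
    hits         -- list members that fall in the pathway
    """
    max_hits = min(pathway_size, sample_size)
    total = comb(universe, sample_size)
    # Sum the probability of observing `hits` or more pathway members.
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, sample_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Invented numbers: 6 of 50 submitted proteins fall in a 40-member
# pathway drawn from a universe of 1000 annotated proteins.
print(overrepresentation_p(universe=1000, pathway_size=40, sample_size=50, hits=6))
```

In practice one p-value is computed per pathway, so the results are usually corrected for multiple testing before being interpreted.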
17.  RNASequel: accurate and repeat tolerant realignment of RNA-seq reads 
Nucleic Acids Research  2015;43(18):e122.
RNA-seq is a key technology for understanding the biology of the cell because of its ability to profile transcriptional and post-transcriptional regulation at single-nucleotide resolution. Compared to DNA sequencing alignment algorithms, RNA-seq alignment algorithms have a diminished ability to accurately detect and map base pair substitutions, gaps, discordant pairs and repetitive regions. These shortcomings adversely affect experiments that require a high degree of accuracy, notably the ability to detect RNA editing. We have developed RNASequel, a software package that runs as a post-processing step in conjunction with an RNA-seq aligner and systematically corrects common alignment artifacts. Its key innovations are a two-pass splice junction alignment system that includes de novo splice junctions and the use of an empirically determined estimate of the fragment size distribution when resolving read pairs. We demonstrate that RNASequel produces improved alignments when used in conjunction with STAR or TopHat2 using two simulated datasets. We then show that RNASequel improves the identification of adenosine to inosine RNA editing sites on biological datasets. This software will be useful in applications requiring the accurate identification of variants in RNA sequencing data, the discovery of RNA editing sites and the analysis of alternative splicing.
doi:10.1093/nar/gkv594
PMCID: PMC4605292  PMID: 26082497
18.  Identification of genes expressed by immune cells of the colon that are regulated by colorectal cancer-associated variants 
A locus on human chromosome 11q23 tagged by marker rs3802842 was associated with colorectal cancer in a genome-wide association study; this finding has been replicated in case-control studies worldwide. In order to identify biologic factors at this locus that are related to the etiopathology of colorectal cancer, we used microarray-based target selection methods, coupled to next-generation sequencing, to study 103 kb at the 11q23 locus. We genotyped 369 putative variants from 1030 patients with colorectal cancer (cases) and 1061 individuals without colorectal cancer (controls) from the Ontario Familial Colorectal Cancer Registry. Two previously uncharacterized genes, COLCA1 and COLCA2, were found to be co-regulated genes that are transcribed from opposite strands. Expression levels of COLCA1 and COLCA2 transcripts correlate with rs3802842 genotypes. In colon tissues, COLCA1 co-localizes with crystalloid granules of eosinophils and granular organelles of mast cells, neutrophils, macrophages, dendritic cells, and differentiated myeloid-derived cell lines. COLCA2 is present in the cytoplasm of normal epithelial, immune, and other cell lineages, as well as tumor cells. Tissue microarray analysis demonstrates the association of rs3802842 with lymphocyte density in the lamina propria (P=.014) and levels of COLCA1 in the lamina propria (P=.00016) and COLCA2 (tumor cells, P=.0041 and lamina propria, P=6×10⁻⁵). In conclusion, genetic, expression, and immunohistochemical data implicate COLCA1 and COLCA2 in the pathogenesis of colon cancer. Histologic analyses indicate the involvement of immune pathways.
doi:10.1002/ijc.28557
PMCID: PMC3949167  PMID: 24154973
Genome-wide association study; genetic risk factors; colon cancer; tumor microenvironment
19.  PhyloWGS: Reconstructing subclonal composition and evolution from whole-genome sequencing of tumors 
Genome Biology  2015;16(1):35.
Tumors often contain multiple subpopulations of cancerous cells defined by distinct somatic mutations. We describe a new method, PhyloWGS, which can be applied to whole-genome sequencing data from one or more tumor samples to reconstruct complete genotypes of these subpopulations based on variant allele frequencies (VAFs) of point mutations and population frequencies of structural variations. We introduce a principled phylogenic correction for VAFs in loci affected by copy number alterations and we show that this correction greatly improves subclonal reconstruction compared to existing methods. PhyloWGS is free, open-source software, available at https://github.com/morrislab/phylowgs.
doi:10.1186/s13059-015-0602-8
PMCID: PMC4359439  PMID: 25786235
21.  Identification of genes expressed by immune cells of the colon that are regulated by colorectal cancer-associated variants 
A locus on human chromosome 11q23 tagged by marker rs3802842 was associated with colorectal cancer (CRC) in a genome-wide association study; this finding has been replicated in case–control studies worldwide. In order to identify biologic factors at this locus that are related to the etiopathology of CRC, we used microarray-based target selection methods, coupled to next-generation sequencing, to study 103 kb at the 11q23 locus. We genotyped 369 putative variants from 1,030 patients with CRC (cases) and 1,061 individuals without CRC (controls) from the Ontario Familial Colorectal Cancer Registry. Two previously uncharacterized genes, COLCA1 and COLCA2, were found to be co-regulated genes that are transcribed from opposite strands. Expression levels of COLCA1 and COLCA2 transcripts correlate with rs3802842 genotypes. In colon tissues, COLCA1 co-localizes with crystalloid granules of eosinophils and granular organelles of mast cells, neutrophils, macrophages, dendritic cells and differentiated myeloid-derived cell lines. COLCA2 is present in the cytoplasm of normal epithelial, immune and other cell lineages, as well as tumor cells. Tissue microarray analysis demonstrates the association of rs3802842 with lymphocyte density in the lamina propria (p = 0.014) and levels of COLCA1 in the lamina propria (p = 0.00016) and COLCA2 (tumor cells, p = 0.0041 and lamina propria, p = 6 × 10⁻⁵). In conclusion, genetic, expression and immunohistochemical data implicate COLCA1 and COLCA2 in the pathogenesis of colon cancer. Histologic analyses indicate the involvement of immune pathways.
doi:10.1002/ijc.28557
PMCID: PMC3949167  PMID: 24154973
genome-wide association study; genetic risk factors; colon cancer; tumor microenvironment
22.  Redefining Genomic Privacy: Trust and Empowerment 
PLoS Biology  2014;12(11):e1001983.
Current models of protecting human subjects create a zero-sum game of privacy versus data utility. We propose shifting the paradigm to techniques that facilitate trust between researchers and participants.
Fulfilling the promise of the genetic revolution requires the analysis of large datasets containing information from thousands to millions of participants. However, sharing human genomic data requires protecting subjects from potential harm. Current models rely on de-identification techniques in which privacy versus data utility becomes a zero-sum game. Instead, we propose the use of trust-enabling techniques to create a solution in which researchers and participants both win. To do so we introduce three principles that facilitate trust in genetic research and outline one possible framework built upon those principles. Our hope is that such trust-centric frameworks provide a sustainable solution that reconciles genetic privacy with data sharing and facilitates genetic research.
doi:10.1371/journal.pbio.1001983
PMCID: PMC4219652  PMID: 25369215
23.  ReactomeFIViz: a Cytoscape app for pathway and network-based data analysis 
F1000Research  2014;3:146.
High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network combined with human curated pathways derived from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.
doi:10.12688/f1000research.4431.2
PMCID: PMC4184317  PMID: 25309732
24.  Computational approaches to identify functional genetic variants in cancer genomes 
Nature Methods  2013;10(8):723-729.
The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor, but only a minority drive tumor progression. We present the result of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype.
doi:10.1038/nmeth.2562
PMCID: PMC3919555  PMID: 23900255
25.  ReactomeFIViz: the Reactome FI Cytoscape app for pathway and network-based data analysis 
F1000Research  2014;3:146.
High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network and human curated pathways from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.
doi:10.12688/f1000research.4431.1
PMCID: PMC4184317  PMID: 25309732
