Results 1-15 (15)
 

1.  Single Cell Genomics meeting in Stockholm: from single cells to cell types 
Genome Biology  2014;15(10):496.
A report on the second Single Cell Genomics conference held in Stockholm, Sweden, September 9–11, 2014.
doi:10.1186/s13059-014-0496-x
PMCID: PMC4281946  PMID: 25418892
2.  Expression Atlas update—a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments 
Nucleic Acids Research  2013;42(Database issue):D926-D932.
Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of ‘baseline’ expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful ‘contrasts’, i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets as well as genes, and a more powerful search interface designed to maximize the biological value provided to the user.
doi:10.1093/nar/gkt1270
PMCID: PMC3964963  PMID: 24304889
3.  Cooperativity and Rapid Evolution of Cobound Transcription Factors in Closely Related Mammals 
Cell  2013;154(3):530-540.
Summary
To mechanistically characterize the microevolutionary processes active in altering transcription factor (TF) binding among closely related mammals, we compared the genome-wide binding of three tissue-specific TFs that control liver gene expression in six rodents. Despite an overall fast turnover of TF binding locations between species, we identified thousands of TF regions of highly constrained TF binding intensity. Although individual mutations in bound sequence motifs can influence TF binding, most binding differences occur in the absence of nearby sequence variations. Instead, combinatorial binding was found to be significant for genetic and evolutionary stability; cobound TFs tend to disappear in concert and were sensitive to genetic knockout of partner TFs. The large, qualitative differences in genomic regions bound between closely related mammals, when contrasted with the smaller, quantitative TF binding differences among Drosophila species, illustrate how genome structure and population genetics together shape regulatory evolution.
Highlights
•Earliest steps of regulatory evolution in mammals captured using five mouse species
•Interspecies differences in TF binding are rarely caused by DNA variation in motifs
•Cobound TFs change their genomic binding cooperatively in closely related mammals
•Genetic knockouts revealed the extent of cooperative stabilization in TF binding clusters
Microevolutionary mechanisms create different transcription factor binding patterns between mammals, shedding light on the regulatory mechanisms partially underlying speciation.
doi:10.1016/j.cell.2013.07.007
PMCID: PMC3732390  PMID: 23911320
4.  bioWeb3D: an online webGL 3D data visualisation tool 
BMC Bioinformatics  2013;14:185.
Background
Data visualization is critical for interpreting biological data. However, in practice it can prove to be a bottleneck for untrained researchers; this is especially true for three-dimensional (3D) data representation. Whilst existing software can provide all necessary functionalities to represent and manipulate biological 3D datasets, very few tools are easily accessible (browser-based), cross-platform and usable by non-expert users.
Results
An online HTML5/WebGL-based 3D visualisation tool has been developed to allow biologists to quickly and easily view interactive and customizable three-dimensional representations of their data along with multiple layers of information. Using the WebGL library Three.js, written in JavaScript, bioWeb3D allows the simultaneous visualisation of multiple large datasets input via a simple JSON, XML or CSV file, which can be read and analysed locally thanks to HTML5 capabilities.
Conclusions
Using basic 3D representation techniques in a technologically innovative context, we provide a program that is not intended to compete with professional 3D representation software, but that instead enables a quick and intuitive representation of reasonably large 3D datasets.
doi:10.1186/1471-2105-14-185
PMCID: PMC3710502  PMID: 23758781
5.  Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data 
Genome Biology  2013;14(1):R7.
Background
Genetically identical populations of cells grown in the same environmental condition show substantial variability in gene expression profiles. Although single-cell RNA-seq provides an opportunity to explore this phenomenon, statistical methods need to be developed to interpret the variability of gene expression counts.
Results
We develop a statistical framework for studying the kinetics of stochastic gene expression from single-cell RNA-seq data. By applying our model to a single-cell RNA-seq dataset generated by profiling mouse embryonic stem cells, we find that the inferred kinetic parameters are consistent with RNA polymerase II binding and chromatin modifications. Our results suggest that histone modifications affect transcriptional bursting by modulating both burst size and frequency. Furthermore, we show that our model can be used to identify genes with slow promoter kinetics, which are important for probabilistic differentiation of embryonic stem cells.
Conclusions
We conclude that the proposed statistical model provides a flexible and efficient way to investigate the kinetics of transcription.
doi:10.1186/gb-2013-14-1-r7
PMCID: PMC3663116  PMID: 23360624
gene regulation; RNA-seq; single-cell; statistics; transcriptional burst
6.  Genomic-scale capture and sequencing of endogenous DNA from feces 
Molecular ecology  2010;19(24):5332-5344.
Genomic-level analyses of DNA from non-invasive sources would facilitate powerful conservation and evolutionary studies in natural populations of endangered and otherwise elusive species. However, the typical low quantity and poor quality of DNA that is extracted from non-invasive samples have generally precluded such work. Here we apply a modified DNA capture protocol that, when used in combination with massively-parallel sequencing technology, facilitates efficient and highly-accurate resequencing of megabases of specified nuclear genomic regions from fecal DNA samples. We validated our approach by comparing genetic variants identified from corresponding fecal and blood DNA samples of six western chimpanzees (Pan troglodytes verus) across more than 1.5 megabases of chromosome 21, chromosome X, and the complete mitochondrial genome. Our results suggest that it is now feasible to conduct genomic studies in natural populations for which constraints on invasive sampling have otherwise long been a barrier. The data we collected also provided an opportunity to examine western chimpanzee genetic diversity at unprecedented scale. Despite high mitochondrial genome diversity (π = 0.585%), western chimpanzees have a low ratio (0.42) of X chromosomal (π = 0.034%) to autosomal (chromosome 21 π = 0.081%) sequence diversity, a pattern that may reflect an unusual demographic history of this subspecies.
doi:10.1111/j.1365-294X.2010.04888.x
PMCID: PMC2998560  PMID: 21054605
molecular ecology; population genetics; non-invasive sampling; conservation genomics
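The X-to-autosome diversity ratio quoted in the abstract above can be reproduced from the reported π values. The following is a minimal sketch: the variable names are illustrative, and the 3/4 neutral expectation is a textbook assumption (equal numbers of breeding males and females, equal mutation rates), not a figure taken from the abstract.

```python
# Nucleotide diversity (pi, in percent) as reported in the abstract above.
pi_x = 0.034         # X chromosome
pi_autosome = 0.081  # chromosome 21, used as the autosomal reference

# Ratio of X-chromosomal to autosomal diversity.
ratio = pi_x / pi_autosome
print(f"X/autosome diversity ratio: {ratio:.2f}")

# Assumption of this sketch (not stated in the abstract): under simple
# neutrality the X chromosome has 3/4 of the autosomal effective
# population size, so the expected ratio is 0.75.
neutral_expectation = 0.75
print(f"Fraction of neutral expectation: {ratio / neutral_expectation:.2f}")
```

A ratio well below 0.75 is the kind of departure the authors describe as possibly reflecting an unusual demographic history.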
7.  Understanding mechanisms underlying human gene expression variation with RNA sequencing 
Nature  2010;464(7289):768-772.
Understanding the genetic mechanisms underlying natural variation in gene expression is a central goal of both medical and evolutionary genetics, and studies of expression quantitative trait loci (eQTLs) have become an important tool for achieving this goal1. Although all eQTL studies so far have assayed messenger RNA levels using expression microarrays, recent advances in RNA sequencing enable the analysis of transcript variation at unprecedented resolution. We sequenced RNA from 69 lymphoblastoid cell lines derived from unrelated Nigerian individuals that have been extensively genotyped by the International HapMap Project2. By pooling data from all individuals, we generated a map of the transcriptional landscape of these cells, identifying extensive use of unannotated untranslated regions and more than 100 new putative protein-coding exons. Using the genotypes from the HapMap project, we identified more than a thousand genes at which genetic variation influences overall expression levels or splicing. We demonstrate that eQTLs near genes generally act by a mechanism involving allele-specific expression, and that variation that influences the inclusion of an exon is enriched within and near the consensus splice sites. Our results illustrate the power of high-throughput sequencing for the joint analysis of variation in transcription, splicing and allele-specific expression across individuals.
doi:10.1038/nature08872
PMCID: PMC3089435  PMID: 20220758
8.  A Genome-Wide Study of DNA Methylation Patterns and Gene Expression Levels in Multiple Human and Chimpanzee Tissues 
PLoS Genetics  2011;7(2):e1001316.
The modification of DNA by methylation is an important epigenetic mechanism that affects the spatial and temporal regulation of gene expression. Methylation patterns have been described in many contexts within and across a range of species. However, the extent to which changes in methylation might underlie inter-species differences in gene regulation, in particular between humans and other primates, has not yet been studied. To this end, we studied DNA methylation patterns in livers, hearts, and kidneys from multiple humans and chimpanzees, using tissue samples for which genome-wide gene expression data were also available. Using the multi-species gene expression and methylation data for 7,723 genes, we were able to study the role of promoter DNA methylation in the evolution of gene regulation across tissues and species. We found that inter-tissue methylation patterns are often conserved between humans and chimpanzees. However, we also found a large number of gene expression differences between species that might be explained, at least in part, by corresponding differences in methylation levels. In particular, we estimate that, in the tissues we studied, inter-species differences in promoter methylation might underlie as much as 12%–18% of differences in gene expression levels between humans and chimpanzees.
Author Summary
It has long been hypothesized that changes in gene regulation have played an important role in primate evolution. However, despite the wealth of comparative gene expression data, there are still only few studies that focus on the mechanisms underlying inter-primate differences in gene regulation. In particular, we know relatively little about the degree to which changes in epigenetic profiles might explain differences in gene expression levels between primates. To this end, we studied DNA methylation and gene expression levels in livers, hearts, and kidneys from multiple humans and chimpanzees. Using these comparative data, we were able to study the evolution of gene regulation in the context of conservation of or changes in DNA methylation profiles across tissues and species. We found that inter-tissue methylation patterns are often conserved between humans and chimpanzees. In addition, we also found a large number of gene expression differences between species, which might be explained, at least in part, by corresponding differences in methylation levels. We estimate that, in the tissues we studied, inter-species differences in methylation levels might underlie as much as 12%–18% of differences in gene expression levels between humans and chimpanzees.
doi:10.1371/journal.pgen.1001316
PMCID: PMC3044686  PMID: 21383968
9.  Functional Comparison of Innate Immune Signaling Pathways in Primates 
PLoS Genetics  2010;6(12):e1001249.
Humans respond differently than other primates to a large number of infections. Differences in susceptibility to infectious agents between humans and other primates are probably due to inter-species differences in immune response to infection. Consistent with that notion, genes involved in immunity-related processes are strongly enriched among recent targets of positive selection in primates, suggesting that immune responses evolve rapidly, yet providing only indirect evidence for possible inter-species functional differences. To directly compare immune responses among primates, we stimulated primary monocytes from humans, chimpanzees, and rhesus macaques with lipopolysaccharide (LPS) and studied the ensuing time-course regulatory responses. We find that, while the universal Toll-like receptor response is mostly conserved across primates, the regulatory response associated with viral infections is often lineage-specific, probably reflecting rapid host–virus mutual adaptation cycles. Additionally, human-specific immune responses are enriched for genes involved in apoptosis, as well as for genes associated with cancer and with susceptibility to infectious diseases or immune-related disorders. Finally, we find that chimpanzee-specific immune signaling pathways are enriched for HIV–interacting genes. Put together, our observations lend strong support to the notion that lineage-specific immune responses may help explain known inter-species differences in susceptibility to infectious diseases.
Author Summary
We know of a large number of diseases or medical conditions that affect humans more severely than non-human primates, such as AIDS, malaria, hepatitis B, and cancer. These differences likely arise from different immune responses to infection among species. However, due to the lack of comparative functional data across species, it remains unclear how the immune system of humans and other primates differ. In this work, we present the first genome-wide characterization of functional differences in innate immune responses between humans and our closest evolutionary relatives. Our results indicate that “core” immune responses, those that are critical to fight any invading pathogen, are the most conserved across primates and that much of the divergence in immune responses is observed in genes that are involved in response to specific microbial and viral agents. In addition, we show that human-specific immune responses are enriched for genes involved in apoptosis and cancer biology, as well as with genes previously associated with susceptibility to infectious diseases or immune-related disorders. Finally, we find that chimpanzee-specific immune signaling pathways are enriched for HIV–interacting genes. Our observations may therefore help explain known inter-species differences in susceptibility to infectious diseases.
doi:10.1371/journal.pgen.1001249
PMCID: PMC3002988  PMID: 21187902
11.  The pitfalls of platform comparison: DNA copy number array technologies assessed 
BMC Genomics  2009;10:588.
Background
The accurate and high resolution mapping of DNA copy number aberrations has become an important tool by which to gain insight into the mechanisms of tumourigenesis. There are various commercially available platforms for such studies, but there remains no general consensus as to the optimal platform. There have been several previous platform comparison studies, but they have either described older technologies, used less-complex samples, or have not addressed the issue of the inherent biases in such comparisons. Here we describe a systematic comparison of data from four leading microarray technologies (the Affymetrix Genome-wide SNP 5.0 array, Agilent High-Density CGH Human 244A array, Illumina HumanCNV370-Duo DNA Analysis BeadChip, and the Nimblegen 385 K oligonucleotide array). We compare samples derived from primary breast tumours and their corresponding matched normals, well-established cancer cell lines, and HapMap individuals. By careful consideration and avoidance of potential sources of bias, we aim to provide a fair assessment of platform performance.
Results
By performing a theoretical assessment of the reproducibility, noise, and sensitivity of each platform, notable differences were revealed. Nimblegen exhibited between-replicate array variances an order of magnitude greater than the other three platforms, with Agilent slightly outperforming the others, and a comparison of self-self hybridizations revealed similar patterns. An assessment of the single probe power revealed that Agilent exhibits the highest sensitivity. Additionally, we performed an in-depth visual assessment of the ability of each platform to detect aberrations of varying sizes. As expected, all platforms were able to identify large aberrations in a robust manner. However, some focal amplifications and deletions were only detected in a subset of the platforms.
Conclusion
Although there are substantial differences in the design, density, and number of replicate probes, the comparison indicates a generally high level of concordance between platforms, despite differences in the reproducibility, noise, and sensitivity. In general, Agilent tended to be the best aCGH platform and Affymetrix, the superior SNP-CGH platform, but for specific decisions the results described herein provide a guide for platform selection and study design, and the dataset a resource for more tailored comparisons.
doi:10.1186/1471-2164-10-588
PMCID: PMC2797821  PMID: 19995423
12.  Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data 
Bioinformatics  2009;25(24):3207-3212.
Motivation: Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here, we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE).
Results: We generated 16 million 35 bp reads from mRNA of each of two HapMap Yoruba individuals. When we mapped these reads to the human genome we found that, at heterozygous SNPs, there was a significant bias toward higher mapping rates of the allele in the reference sequence, compared with the alternative allele. Masking known SNP positions in the genome sequence eliminated the reference bias but, surprisingly, did not lead to more reliable results overall. We find that even after masking, ∼5–10% of SNPs still have an inherent bias toward more effective mapping of one allele. Filtering out inherently biased SNPs removes 40% of the top signals of ASE. The remaining SNPs showing ASE are enriched in genes previously known to harbor cis-regulatory variation or known to show uniparental imprinting. Our results have implications for a variety of applications involving detection of alternate alleles from short-read sequence data.
Availability: Scripts, written in Perl and R, for simulating short reads, masking SNP variation in a reference genome and analyzing the simulation output are available upon request from JFD. Raw short read data were deposited in GEO (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE18156.
Contact: jdegner@uchicago.edu; marioni@uchicago.edu; gilad@uchicago.edu; pritch@uchicago.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp579
PMCID: PMC2788925  PMID: 19808877
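One way to picture the allele-specific mapping bias discussed in the abstract above is to test whether reads at a heterozygous SNP split evenly between the reference and alternative alleles. The sketch below uses an exact two-sided binomial test. The choice of test, the function name `binomial_two_sided_p`, and the read counts are all assumptions for illustration; the abstract does not state which statistic the authors used.

```python
from math import comb

def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: the summed probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = pmf[k]
    # A small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

# Hypothetical read counts at one heterozygous SNP (not data from the study).
ref_reads, alt_reads = 30, 10
total = ref_reads + alt_reads
p_value = binomial_two_sided_p(ref_reads, total)
print(f"reference fraction = {ref_reads / total:.2f}, p = {p_value:.4f}")
```

Under the null hypothesis of no bias and no true allele-specific expression, each read is equally likely to carry either allele, so a small p-value at a SNP flags either genuine ASE or a mapping artifact; separating those two cases is the problem the abstract addresses.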
13.  A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis 
Nature biotechnology  2008;26(7):779-785.
DNA methylation is an indispensable epigenetic modification of mammalian genomes. Consequently there is great interest in strategies for genome-wide/whole-genome DNA methylation analysis, and immunoprecipitation-based methods have proven to be a powerful option. Such methods are rapidly shifting the bottleneck from data generation to data analysis, necessitating the development of better analytical tools. Until now, a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling has been the inability to estimate absolute methylation levels. Here we report the development of a novel cross-platform algorithm – Bayesian Tool for Methylation Analysis (Batman) – for analyzing Methylated DNA Immunoprecipitation (MeDIP) profiles generated using arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). The latter is an approach we have developed to elucidate the first high-resolution whole-genome DNA methylation profile (DNA methylome) of any mammalian genome. MeDIP-seq/MeDIP-chip combined with Batman represent robust, quantitative, and cost-effective functional genomic strategies for elucidating the function of DNA methylation.
doi:10.1038/nbt1414
PMCID: PMC2644410  PMID: 18612301
14.  Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization 
Genome Biology  2007;8(10):R228.
Datasets used for detecting copy number variation (CNV) are shown to be affected by a technical artifact. A novel CNV calling algorithm is presented which removes this artifact and identifies regions of CNV better than existing methods.
Background
Large-scale high throughput studies using microarray technology have established that copy number variation (CNV) throughout the genome is more frequent than previously thought. Such variation is known to play an important role in the presence and development of phenotypes such as HIV-1 infection and Alzheimer's disease. However, methods for analyzing the complex data produced and identifying regions of CNV are still being refined.
Results
We describe the presence of a genome-wide technical artifact, spatial autocorrelation or 'wave', which occurs in a large dataset used to determine the location of CNV across the genome. By removing this artifact we are able to obtain both a more biologically meaningful clustering of the data and an increase in the number of CNVs identified by current calling methods without a major increase in the number of false positives detected. Moreover, removing this artifact is critical for the development of a novel model-based CNV calling algorithm - CNVmix - that uses cross-sample information to identify regions of the genome where CNVs occur. For regions of CNV that are identified by both CNVmix and current methods, we demonstrate that CNVmix is better able to categorize samples into groups that represent copy number gains or losses.
Conclusion
Removing artifactual 'waves' (which appear to be a general feature of array comparative genomic hybridization (aCGH) datasets) and using cross-sample information when identifying CNVs enables more biological information to be extracted from aCGH experiments designed to investigate copy number variation in normal individuals.
doi:10.1186/gb-2007-8-10-r228
PMCID: PMC2246302  PMID: 17961237
15.  High-resolution aCGH and expression profiling identifies a novel genomic subtype of ER negative breast cancer 
Genome Biology  2007;8(10):R215.
High resolution array-CGH and expression profiling identifies a novel genomic subtype of ER negative breast cancer, and provides a genome-wide list of common copy number alterations associated with aberrant expression and poor prognosis.
Background
The characterization of copy number alteration patterns in breast cancer requires high-resolution genome-wide profiling of a large panel of tumor specimens. To date, most genome-wide array comparative genomic hybridization studies have used tumor panels of relatively large tumor size and high Nottingham Prognostic Index (NPI) that are not as representative of breast cancer demographics.
Results
We performed an oligo-array-based high-resolution analysis of copy number alterations in 171 primary breast tumors of relatively small size and low NPI, which was therefore more representative of breast cancer demographics. Hierarchical clustering over the common regions of alteration identified a novel subtype of high-grade estrogen receptor (ER)-negative breast cancer, characterized by a low genomic instability index. We were able to validate the existence of this genomic subtype in one external breast cancer cohort. Using matched array expression data we also identified the genomic regions showing the strongest coordinate expression changes ('hotspots'). We show that several of these hotspots are located in the phosphatome, kinome and chromatinome, and harbor members of the 122-breast cancer CAN-list. Furthermore, we identify frequently amplified hotspots on 8q22.3 (EDD1, WDSOF1), 8q24.11-13 (THRAP6, DCC1, SQLE, SPG8) and 11q14.1 (NDUFC2, ALG8, USP35) associated with significantly worse prognosis. Amplification of any of these regions identified 37 samples with significantly worse overall survival (hazard ratio (HR) = 2.3 (1.3-1.4) p = 0.003) and time to distant metastasis (HR = 2.6 (1.4-5.1) p = 0.004) independently of NPI.
Conclusion
We present strong evidence for the existence of a novel subtype of high-grade ER-negative tumors that is characterized by a low genomic instability index. We also provide a genome-wide list of common copy number alteration regions in breast cancer that show strong coordinate aberrant expression, and further identify novel frequently amplified regions that correlate with poor prognosis. Many of the genes associated with these regions represent likely novel oncogenes or tumor suppressors.
doi:10.1186/gb-2007-8-10-r215
PMCID: PMC2246289  PMID: 17925008
