1.  Contributions to drug resistance in glioblastoma derived from malignant cells in the sub-ependymal zone 
Cancer research  2014;75(1):194-202.
Glioblastoma (GB), the most common and aggressive adult brain tumor, is characterized by extreme phenotypic diversity and treatment failure. Through fluorescence-guided resection, we identified fluorescent tissue in the sub-ependymal zone (SEZ) of GB patients. Histological analysis and genomic characterization revealed that the SEZ harbors malignant cells with tumor-initiating capacity, analogous to cells isolated from the fluorescent tumor mass (T). We observed resistance to supra-maximal chemotherapy doses along with differential patterns of drug response between T and SEZ in the same tumor. Our results reveal novel insights into GB growth dynamics, with implications for understanding and limiting treatment resistance.
PMCID: PMC4286248  PMID: 25406193
2.  Molecular and neuronal homology between the olfactory systems of zebrafish and mouse 
Scientific Reports  2015;5:11487.
Studies of the two major olfactory organs of rodents, the olfactory mucosa (OM) and the vomeronasal organ (VNO), unraveled the molecular basis of smell in vertebrates. However, some vertebrates lack a VNO. Here we generated and analyzed the olfactory transcriptome of the zebrafish and compared it to the olfactory transcriptomes of mouse to investigate the evolutionary and molecular relationship between single and dual olfactory systems. Our analyses revealed a high degree of molecular conservation, with orthologs of mouse olfactory cell-specific markers and all but one of their chemosensory receptor classes expressed in the single zebrafish olfactory organ. Zebrafish chemosensory receptor genes are expressed across a large dynamic range and their RNA abundance correlates positively with the number of neurons expressing that RNA. Thus we estimate the relative proportions of neuronal sub-types expressing different chemosensory receptors. Receptor repertoire size drives the absolute abundance of different classes of neurons, but we find similar underlying patterns in both species. Finally, we identified novel marker genes that characterize rare neuronal populations in both mouse and zebrafish. In sum, we find that the molecular and cellular mechanisms underpinning olfaction in teleosts and mammals are similar despite 430 million years of evolutionary divergence.
PMCID: PMC4480006  PMID: 26108469
3.  BASiCS: Bayesian Analysis of Single-Cell Sequencing Data 
PLoS Computational Biology  2015;11(6):e1004333.
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell’s lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated with high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable support the efficacy of our approach.
Author Summary
Gene expression signatures have historically been used to generate molecular fingerprints that characterise distinct tissues. Moreover, by interrogating these molecular signatures it has been possible to understand how a tissue’s function is regulated at the molecular level. However, even between cells from a seemingly homogeneous tissue sample, there exists substantial heterogeneity in gene expression levels. These differences might correspond to novel subtypes or to transient states linked, for example, to the cell cycle. Single-cell RNA-sequencing, where the transcriptomes of individual cells are profiled using next generation sequencing, provides a method for identifying genes that show more variation across cells than expected by chance, which might be characteristic of such populations. However, single-cell RNA-sequencing is subject to a high degree of technical noise, making it necessary to account for this to robustly identify such genes. To this end, we use a fully Bayesian approach that jointly models extrinsic spike-in molecules with genes from the cells of interest allowing better identification of such genes than previously described computational strategies. We validate our approach using data from mouse Embryonic Stem Cells.
PMCID: PMC4480965  PMID: 26107944
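The decomposition described in point (iii) of the abstract above can be illustrated with a much simpler moment-based calculation. This is a minimal sketch and not the BASiCS hierarchical model: it assumes counts are already normalised across cells and that spike-in counts follow Poisson sampling plus multiplicative technical noise, so that CV² = 1/mean + theta. All function names are hypothetical.

```python
from statistics import mean, pvariance


def cv2(counts):
    """Squared coefficient of variation of one gene's counts across cells."""
    m = mean(counts)
    return pvariance(counts) / (m * m)


def technical_theta(spike_in_counts):
    """Estimate the technical over-dispersion from spike-in genes.

    Assumption (not from the abstract): for spike-ins, CV^2 = 1/mean + theta,
    so theta is the average excess of CV^2 over the Poisson term.
    """
    return mean(cv2(c) - 1.0 / mean(c) for c in spike_in_counts)


def biological_cv2(gene_counts, theta):
    """Excess CV^2 left after removing Poisson and technical components."""
    excess = cv2(gene_counts) - 1.0 / mean(gene_counts) - theta
    return max(0.0, excess)


# Toy data: one spike-in gene and one biological gene measured in four cells.
theta = technical_theta([[5, 15, 5, 15]])
bio = biological_cv2([0, 40, 0, 40], theta)
```

A gene whose `biological_cv2` stays near zero shows no more variation than the spike-ins predict; BASiCS replaces this point estimate with posterior probabilities.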
4.  Multi-species, multi-transcription factor binding highlights conserved control of tissue-specific biological pathways 
eLife  2014;3:e02626.
As exome sequencing gives way to genome sequencing, the need to interpret the function of regulatory DNA becomes increasingly important. To test whether evolutionary conservation of cis-regulatory modules (CRMs) gives insight into human gene regulation, we determined transcription factor (TF) binding locations of four liver-essential TFs in liver tissue from human, macaque, mouse, rat, and dog. Approximately two thirds of the TF-bound regions fell into CRMs. Less than half of the human CRMs were found as a CRM in the orthologous region of a second species. Shared CRMs were associated with liver pathways and disease loci identified by genome-wide association studies. Recurrent rare human disease-causing mutations at the promoters of several blood coagulation and lipid metabolism genes were also identified within CRMs shared in multiple species. This suggests that multi-species analyses of experimentally determined combinatorial TF binding will help identify genomic regions critical for tissue-specific gene control.
eLife digest
Stretches of DNA called cis-regulatory modules (or CRMs for short) could help researchers to identify the regions of DNA that are most important for controlling genes. CRMs are regions where multiple transcription factors—proteins that control when and how genes are expressed—bind to DNA. As important biological pathways are often regulated by more than one transcription factor, CRMs are therefore a good target when looking for DNA regions that, if mutated, are likely to cause disease.
If a stretch of DNA performs an important role, it is often conserved throughout evolution. This is often observed for genes that make proteins. Indeed, DNA regions that specify critical amino acids that make up proteins are often conserved across distantly related species. However, unlike the changes made to the amino acid encoding parts of genes, it is currently a challenge to predict which changes in the rest of the genome will affect gene expression.
One reason for this challenge is that transcription factor binding sites are rapidly evolving. This rapid evolution means that strictly comparing DNA sequences between species may fail to identify where transcription factors like to bind in the genome. Numerous experimental efforts have therefore been made to map these sites. These have revealed that there are a huge number of regions in the human genome that can bind transcription factors: hundreds of thousands of sites, far more than there are genes. For this reason, there is a great interest in revealing which of these regulatory regions are critical for maintaining normal levels and timings of gene expression.
Ballester et al. compared the binding sites of four transcription factors responsible for regulating liver function in humans, macaques, mice, rats, and dogs. About two-thirds of these binding sites were found in CRMs. Less than half of the CRMs in humans were also CRMs in another species—but Ballester et al. found that these shared CRMs are predominantly in charge of regulating the essential biological pathways that allow the liver to function correctly. In addition, Ballester et al. identified several examples of disease-causing DNA mutations in shared CRMs that affected the expression of genes that make up pathways such as the blood clotting cascade. Genome-wide association studies also uncovered common variants for liver-related traits that were enriched for the CRMs found in more than one species, further supporting their importance.
As transcription factors work in different ways in different tissues, further studies are now required to expand these observations to organs other than the liver. Future work is also needed to investigate the function of thousands of conserved CRMs whose role in liver gene regulation remains unknown.
PMCID: PMC4359374  PMID: 25279814
cis regulatory module; transcription factors; molecular evolution; macaque; dog; liver; human; mouse; rat; other
5.  Random Monoallelic Gene Expression Increases upon Embryonic Stem Cell Differentiation 
Developmental cell  2014;28(4):351-365.
Random autosomal monoallelic gene expression refers to the transcription of a gene from one of two homologous alleles. We assessed the dynamics of monoallelic expression during development through an allele-specific RNA sequencing screen in clonal populations of hybrid mouse embryonic stem cells (ESCs) and neural progenitor cells (NPCs). We identified 67 and 376 inheritable autosomal random monoallelically expressed genes in ESCs and NPCs respectively, a 5.6-fold increase upon differentiation. While DNA methylation and nuclear positioning did not distinguish the active and inactive alleles, specific histone modifications were differentially enriched between the two alleles. Interestingly, expression levels of 8% of the monoallelically expressed genes remained similar between monoallelic and biallelic clones. These results support a model in which random monoallelic expression occurs stochastically during differentiation, and for some genes is compensated for by the cell to maintain the required transcriptional output of these genes.
PMCID: PMC3955261  PMID: 24576421
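The fold increase quoted in the abstract above follows directly from the two gene counts it reports. A short check, using only those two numbers (variable names are illustrative):

```python
# Counts of random monoallelically expressed genes reported in the abstract.
esc_genes = 67   # embryonic stem cells
npc_genes = 376  # neural progenitor cells

fold_increase = npc_genes / esc_genes
print(f"{fold_increase:.1f}-fold increase upon differentiation")
```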
6.  Single Cell Genomics meeting in Stockholm: from single cells to cell types 
Genome Biology  2014;15(10):496.
A report on the second Single Cell Genomics conference held in Stockholm, Sweden, September 9–11, 2014.
PMCID: PMC4281946  PMID: 25418892
7.  Expression Atlas update—a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments 
Nucleic Acids Research  2013;42(Database issue):D926-D932.
Expression Atlas is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of ‘baseline’ expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful ‘contrasts’, i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets as well as genes, and a more powerful search interface designed to maximize the biological value provided to the user.
PMCID: PMC3964963  PMID: 24304889
8.  Cooperativity and Rapid Evolution of Cobound Transcription Factors in Closely Related Mammals 
Cell  2013;154(3):530-540.
To mechanistically characterize the microevolutionary processes active in altering transcription factor (TF) binding among closely related mammals, we compared the genome-wide binding of three tissue-specific TFs that control liver gene expression in six rodents. Despite an overall fast turnover of TF binding locations between species, we identified thousands of TF regions of highly constrained TF binding intensity. Although individual mutations in bound sequence motifs can influence TF binding, most binding differences occur in the absence of nearby sequence variations. Instead, combinatorial binding was found to be significant for genetic and evolutionary stability; cobound TFs tend to disappear in concert and were sensitive to genetic knockout of partner TFs. The large, qualitative differences in genomic regions bound between closely related mammals, when contrasted with the smaller, quantitative TF binding differences among Drosophila species, illustrate how genome structure and population genetics together shape regulatory evolution.
Graphical Abstract
• Earliest steps of regulatory evolution in mammals captured using five mouse species
• Interspecies differences in TF binding are rarely caused by DNA variation in motifs
• Cobound TFs change their genomic binding cooperatively in closely related mammals
• Genetic knockouts revealed the extent of cooperative stabilization in TF binding clusters
Microevolutionary mechanisms create different transcription factor binding patterns between mammals, shedding light on the regulatory mechanisms partially underlying speciation.
PMCID: PMC3732390  PMID: 23911320
9.  bioWeb3D: an online webGL 3D data visualisation tool 
BMC Bioinformatics  2013;14:185.
Data visualization is critical for interpreting biological data. However, in practice it can prove to be a bottleneck for untrained researchers; this is especially true for three-dimensional (3D) data representation. Whilst existing software can provide all the functionality needed to represent and manipulate biological 3D datasets, very few tools are browser-based, cross-platform and accessible to non-expert users.
An online HTML5/WebGL-based 3D visualisation tool has been developed to allow biologists to quickly and easily view interactive and customizable three-dimensional representations of their data along with multiple layers of information. Using the WebGL library Three.js, written in JavaScript, bioWeb3D allows the simultaneous visualisation of multiple large datasets input via a simple JSON, XML or CSV file, which can be read and analysed locally thanks to HTML5 capabilities.
Using basic 3D representation techniques in a technologically innovative context, we provide a program that is not intended to compete with professional 3D representation software, but that instead enables a quick and intuitive representation of reasonably large 3D datasets.
PMCID: PMC3710502  PMID: 23758781
10.  Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data 
Genome Biology  2013;14(1):R7.
Genetically identical populations of cells grown in the same environmental condition show substantial variability in gene expression profiles. Although single-cell RNA-seq provides an opportunity to explore this phenomenon, statistical methods need to be developed to interpret the variability of gene expression counts.
We develop a statistical framework for studying the kinetics of stochastic gene expression from single-cell RNA-seq data. By applying our model to a single-cell RNA-seq dataset generated by profiling mouse embryonic stem cells, we find that the inferred kinetic parameters are consistent with RNA polymerase II binding and chromatin modifications. Our results suggest that histone modifications affect transcriptional bursting by modulating both burst size and frequency. Furthermore, we show that our model can be used to identify genes with slow promoter kinetics, which are important for probabilistic differentiation of embryonic stem cells.
We conclude that the proposed statistical model provides a flexible and efficient way to investigate the kinetics of transcription.
PMCID: PMC3663116  PMID: 23360624
gene regulation; RNA-seq; single-cell; statistics; transcriptional burst
11.  Genomic-scale capture and sequencing of endogenous DNA from feces 
Molecular ecology  2010;19(24):5332-5344.
Genomic-level analyses of DNA from non-invasive sources would facilitate powerful conservation and evolutionary studies in natural populations of endangered and otherwise elusive species. However, the typical low quantity and poor quality of DNA that is extracted from non-invasive samples have generally precluded such work. Here we apply a modified DNA capture protocol that, when used in combination with massively-parallel sequencing technology, facilitates efficient and highly-accurate resequencing of megabases of specified nuclear genomic regions from fecal DNA samples. We validated our approach by comparing genetic variants identified from corresponding fecal and blood DNA samples of six western chimpanzees (Pan troglodytes verus) across more than 1.5 megabases of chromosome 21, chromosome X, and the complete mitochondrial genome. Our results suggest that it is now feasible to conduct genomic studies in natural populations for which constraints on invasive sampling have otherwise long been a barrier. The data we collected also provided an opportunity to examine western chimpanzee genetic diversity at unprecedented scale. Despite high mitochondrial genome diversity (π = 0.585%), western chimpanzees have a low ratio (0.42) of X chromosomal (π = 0.034%) to autosomal (chromosome 21 π = 0.081%) sequence diversity, a pattern that may reflect an unusual demographic history of this subspecies.
PMCID: PMC2998560  PMID: 21054605
molecular ecology; population genetics; non-invasive sampling; conservation genomics
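The diversity figures in the abstract above can be checked with a few lines of arithmetic. The sketch below also shows one common way to compute nucleotide diversity (π) as the mean proportion of differing sites over all pairs of aligned sequences; the function name and the toy sequences are illustrative assumptions, not the authors' pipeline.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Mean pairwise proportion of differing sites among aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(aligned_seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)


# Toy alignment (hypothetical data): three sequences of four sites each.
toy_pi = nucleotide_diversity(["AAAA", "AAAT", "AATT"])

# Values reported in the abstract, in percent.
pi_x = 0.034           # X chromosome
pi_autosome = 0.081    # chromosome 21
x_to_autosome = pi_x / pi_autosome
print(f"X/autosome diversity ratio: {x_to_autosome:.2f}")
```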
12.  Understanding mechanisms underlying human gene expression variation with RNA sequencing 
Nature  2010;464(7289):768-772.
Understanding the genetic mechanisms underlying natural variation in gene expression is a central goal of both medical and evolutionary genetics, and studies of expression quantitative trait loci (eQTLs) have become an important tool for achieving this goal. Although all eQTL studies so far have assayed messenger RNA levels using expression microarrays, recent advances in RNA sequencing enable the analysis of transcript variation at unprecedented resolution. We sequenced RNA from 69 lymphoblastoid cell lines derived from unrelated Nigerian individuals that have been extensively genotyped by the International HapMap Project. By pooling data from all individuals, we generated a map of the transcriptional landscape of these cells, identifying extensive use of unannotated untranslated regions and more than 100 new putative protein-coding exons. Using the genotypes from the HapMap project, we identified more than a thousand genes at which genetic variation influences overall expression levels or splicing. We demonstrate that eQTLs near genes generally act by a mechanism involving allele-specific expression, and that variation that influences the inclusion of an exon is enriched within and near the consensus splice sites. Our results illustrate the power of high-throughput sequencing for the joint analysis of variation in transcription, splicing and allele-specific expression across individuals.
PMCID: PMC3089435  PMID: 20220758
13.  A Genome-Wide Study of DNA Methylation Patterns and Gene Expression Levels in Multiple Human and Chimpanzee Tissues 
PLoS Genetics  2011;7(2):e1001316.
The modification of DNA by methylation is an important epigenetic mechanism that affects the spatial and temporal regulation of gene expression. Methylation patterns have been described in many contexts within and across a range of species. However, the extent to which changes in methylation might underlie inter-species differences in gene regulation, in particular between humans and other primates, has not yet been studied. To this end, we studied DNA methylation patterns in livers, hearts, and kidneys from multiple humans and chimpanzees, using tissue samples for which genome-wide gene expression data were also available. Using the multi-species gene expression and methylation data for 7,723 genes, we were able to study the role of promoter DNA methylation in the evolution of gene regulation across tissues and species. We found that inter-tissue methylation patterns are often conserved between humans and chimpanzees. However, we also found a large number of gene expression differences between species that might be explained, at least in part, by corresponding differences in methylation levels. In particular, we estimate that, in the tissues we studied, inter-species differences in promoter methylation might underlie as much as 12%–18% of differences in gene expression levels between humans and chimpanzees.
Author Summary
It has long been hypothesized that changes in gene regulation have played an important role in primate evolution. However, despite the wealth of comparative gene expression data, there are still only a few studies that focus on the mechanisms underlying inter-primate differences in gene regulation. In particular, we know relatively little about the degree to which changes in epigenetic profiles might explain differences in gene expression levels between primates. To this end, we studied DNA methylation and gene expression levels in livers, hearts, and kidneys from multiple humans and chimpanzees. Using these comparative data, we were able to study the evolution of gene regulation in the context of conservation of or changes in DNA methylation profiles across tissues and species. We found that inter-tissue methylation patterns are often conserved between humans and chimpanzees. In addition, we also found a large number of gene expression differences between species, which might be explained, at least in part, by corresponding differences in methylation levels. We estimate that, in the tissues we studied, inter-species differences in methylation levels might underlie as much as 12%–18% of differences in gene expression levels between humans and chimpanzees.
PMCID: PMC3044686  PMID: 21383968
14.  Functional Comparison of Innate Immune Signaling Pathways in Primates 
PLoS Genetics  2010;6(12):e1001249.
Humans respond differently than other primates to a large number of infections. Differences in susceptibility to infectious agents between humans and other primates are probably due to inter-species differences in immune response to infection. Consistent with that notion, genes involved in immunity-related processes are strongly enriched among recent targets of positive selection in primates, suggesting that immune responses evolve rapidly, yet providing only indirect evidence for possible inter-species functional differences. To directly compare immune responses among primates, we stimulated primary monocytes from humans, chimpanzees, and rhesus macaques with lipopolysaccharide (LPS) and studied the ensuing time-course regulatory responses. We find that, while the universal Toll-like receptor response is mostly conserved across primates, the regulatory response associated with viral infections is often lineage-specific, probably reflecting rapid host–virus mutual adaptation cycles. Additionally, human-specific immune responses are enriched for genes involved in apoptosis, as well as for genes associated with cancer and with susceptibility to infectious diseases or immune-related disorders. Finally, we find that chimpanzee-specific immune signaling pathways are enriched for HIV–interacting genes. Put together, our observations lend strong support to the notion that lineage-specific immune responses may help explain known inter-species differences in susceptibility to infectious diseases.
Author Summary
We know of a large number of diseases or medical conditions that affect humans more severely than non-human primates, such as AIDS, malaria, hepatitis B, and cancer. These differences likely arise from different immune responses to infection among species. However, due to the lack of comparative functional data across species, it remains unclear how the immune systems of humans and other primates differ. In this work, we present the first genome-wide characterization of functional differences in innate immune responses between humans and our closest evolutionary relatives. Our results indicate that “core” immune responses, those that are critical to fight any invading pathogen, are the most conserved across primates and that much of the divergence in immune responses is observed in genes that are involved in response to specific microbial and viral agents. In addition, we show that human-specific immune responses are enriched for genes involved in apoptosis and cancer biology, as well as for genes previously associated with susceptibility to infectious diseases or immune-related disorders. Finally, we find that chimpanzee-specific immune signaling pathways are enriched for HIV–interacting genes. Our observations may therefore help explain known inter-species differences in susceptibility to infectious diseases.
PMCID: PMC3002988  PMID: 21187902
16.  The pitfalls of platform comparison: DNA copy number array technologies assessed 
BMC Genomics  2009;10:588.
The accurate and high resolution mapping of DNA copy number aberrations has become an important tool by which to gain insight into the mechanisms of tumourigenesis. There are various commercially available platforms for such studies, but there remains no general consensus as to the optimal platform. There have been several previous platform comparison studies, but they have either described older technologies, used less-complex samples, or have not addressed the issue of the inherent biases in such comparisons. Here we describe a systematic comparison of data from four leading microarray technologies (the Affymetrix Genome-wide SNP 5.0 array, Agilent High-Density CGH Human 244A array, Illumina HumanCNV370-Duo DNA Analysis BeadChip, and the Nimblegen 385 K oligonucleotide array). We compare samples derived from primary breast tumours and their corresponding matched normals, well-established cancer cell lines, and HapMap individuals. By careful consideration and avoidance of potential sources of bias, we aim to provide a fair assessment of platform performance.
By performing a theoretical assessment of the reproducibility, noise, and sensitivity of each platform, notable differences were revealed. Nimblegen exhibited between-replicate array variances an order of magnitude greater than the other three platforms, with Agilent slightly outperforming the others, and a comparison of self-self hybridizations revealed similar patterns. An assessment of the single probe power revealed that Agilent exhibits the highest sensitivity. Additionally, we performed an in-depth visual assessment of the ability of each platform to detect aberrations of varying sizes. As expected, all platforms were able to identify large aberrations in a robust manner. However, some focal amplifications and deletions were only detected in a subset of the platforms.
Although there are substantial differences in the design, density, and number of replicate probes, the comparison indicates a generally high level of concordance between platforms, despite differences in the reproducibility, noise, and sensitivity. In general, Agilent tended to be the best aCGH platform and Affymetrix, the superior SNP-CGH platform, but for specific decisions the results described herein provide a guide for platform selection and study design, and the dataset a resource for more tailored comparisons.
PMCID: PMC2797821  PMID: 19995423
17.  Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data 
Bioinformatics  2009;25(24):3207-3212.
Motivation: Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here, we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE).
Results: We generated 16 million 35 bp reads from mRNA of each of two HapMap Yoruba individuals. When we mapped these reads to the human genome we found that, at heterozygous SNPs, there was a significant bias toward higher mapping rates of the allele in the reference sequence, compared with the alternative allele. Masking known SNP positions in the genome sequence eliminated the reference bias but, surprisingly, did not lead to more reliable results overall. We find that even after masking, ∼5–10% of SNPs still have an inherent bias toward more effective mapping of one allele. Filtering out inherently biased SNPs removes 40% of the top signals of ASE. The remaining SNPs showing ASE are enriched in genes previously known to harbor cis-regulatory variation or known to show uniparental imprinting. Our results have implications for a variety of applications involving detection of alternate alleles from short-read sequence data.
Availability: Scripts, written in Perl and R, for simulating short reads, masking SNP variation in a reference genome and analyzing the simulation output are available upon request from JFD. Raw short read data were deposited in GEO under accession number GSE18156.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2788925  PMID: 19808877
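The kind of allelic imbalance discussed in the abstract above is often assessed at a heterozygous SNP with an exact binomial test on reference and alternative read counts. The following is a minimal sketch of such a test, not the authors' Perl/R scripts. The `p_ref` parameter is an assumption added here: it lets the null proportion be set to something other than 0.5, for example a mapping rate estimated from simulated reads at a SNP with inherent bias.

```python
from math import comb


def binomial_two_sided_p(ref_reads, alt_reads, p_ref=0.5):
    """Exact two-sided binomial p-value for allelic imbalance at one SNP.

    Sums the probabilities of all outcomes that are no more likely than
    the observed reference read count. Intended for modest read depths.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this SNP")
    probs = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


balanced = binomial_two_sided_p(5, 5)    # equal counts, no evidence of bias
skewed = binomial_two_sided_p(10, 0)     # all reads carry the reference allele
```

Setting `p_ref` from simulation rather than fixing it at 0.5 is one way to avoid calling allele-specific expression at SNPs where mapping alone favours one allele.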
18.  A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis 
Nature biotechnology  2008;26(7):779-785.
DNA methylation is an indispensable epigenetic modification of mammalian genomes. Consequently there is great interest in strategies for genome-wide/whole-genome DNA methylation analysis, and immunoprecipitation-based methods have proven to be a powerful option. Such methods are rapidly shifting the bottleneck from data generation to data analysis, necessitating the development of better analytical tools. Until now, a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling has been the inability to estimate absolute methylation levels. Here we report the development of a novel cross-platform algorithm – Bayesian Tool for Methylation Analysis (Batman) – for analyzing Methylated DNA Immunoprecipitation (MeDIP) profiles generated using arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). The latter is an approach we have developed to elucidate the first high-resolution whole-genome DNA methylation profile (DNA methylome) of any mammalian genome. MeDIP-seq/MeDIP-chip combined with Batman represent robust, quantitative, and cost-effective functional genomic strategies for elucidating the function of DNA methylation.
PMCID: PMC2644410  PMID: 18612301
19.  Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization 
Genome Biology  2007;8(10):R228.
Datasets used for detecting copy number variation (CNV) are shown to be affected by a technical artifact. A novel CNV calling algorithm is presented which removes this artifact and identifies regions of CNV better than existing methods.
Large-scale high throughput studies using microarray technology have established that copy number variation (CNV) throughout the genome is more frequent than previously thought. Such variation is known to play an important role in the presence and development of phenotypes such as HIV-1 infection and Alzheimer's disease. However, methods for analyzing the complex data produced and identifying regions of CNV are still being refined.
We describe the presence of a genome-wide technical artifact, spatial autocorrelation or 'wave', which occurs in a large dataset used to determine the location of CNV across the genome. By removing this artifact we are able to obtain both a more biologically meaningful clustering of the data and an increase in the number of CNVs identified by current calling methods without a major increase in the number of false positives detected. Moreover, removing this artifact is critical for the development of a novel model-based CNV calling algorithm - CNVmix - that uses cross-sample information to identify regions of the genome where CNVs occur. For regions of CNV that are identified by both CNVmix and current methods, we demonstrate that CNVmix is better able to categorize samples into groups that represent copy number gains or losses.
Removing artifactual 'waves' (which appear to be a general feature of array comparative genomic hybridization (aCGH) datasets) and using cross-sample information when identifying CNVs enables more biological information to be extracted from aCGH experiments designed to investigate copy number variation in normal individuals.
PMCID: PMC2246302  PMID: 17961237
20.  High-resolution aCGH and expression profiling identifies a novel genomic subtype of ER negative breast cancer 
Genome Biology  2007;8(10):R215.
High resolution array-CGH and expression profiling identifies a novel genomic subtype of ER negative breast cancer, and provides a genome-wide list of common copy number alterations associated with aberrant expression and poor prognosis.
The characterization of copy number alteration patterns in breast cancer requires high-resolution genome-wide profiling of a large panel of tumor specimens. To date, most genome-wide array comparative genomic hybridization studies have used tumor panels of relatively large tumor size and high Nottingham Prognostic Index (NPI) that are not as representative of breast cancer demographics.
We performed an oligo-array-based high-resolution analysis of copy number alterations in 171 primary breast tumors of relatively small size and low NPI, which was therefore more representative of breast cancer demographics. Hierarchical clustering over the common regions of alteration identified a novel subtype of high-grade estrogen receptor (ER)-negative breast cancer, characterized by a low genomic instability index. We were able to validate the existence of this genomic subtype in one external breast cancer cohort. Using matched array expression data we also identified the genomic regions showing the strongest coordinate expression changes ('hotspots'). We show that several of these hotspots are located in the phosphatome, kinome and chromatinome, and harbor members of the 122-breast cancer CAN-list. Furthermore, we identify frequently amplified hotspots on 8q22.3 (EDD1, WDSOF1), 8q24.11-13 (THRAP6, DCC1, SQLE, SPG8) and 11q14.1 (NDUFC2, ALG8, USP35) associated with significantly worse prognosis. Amplification of any of these regions identified 37 samples with significantly worse overall survival (hazard ratio (HR) = 2.3 (1.3-1.4) p = 0.003) and time to distant metastasis (HR = 2.6 (1.4-5.1) p = 0.004) independently of NPI.
We present strong evidence for the existence of a novel subtype of high-grade ER-negative tumors that is characterized by a low genomic instability index. We also provide a genome-wide list of common copy number alteration regions in breast cancer that show strong coordinate aberrant expression, and further identify novel frequently amplified regions that correlate with poor prognosis. Many of the genes associated with these regions represent likely novel oncogenes or tumor suppressors.
PMCID: PMC2246289  PMID: 17925008
