Cancer genome sequencing studies have identified numerous driver genes, but the relative timing of mutations in carcinogenesis remains unclear. The gradual progression from pre-malignant Barrett’s esophagus to esophageal adenocarcinoma (EAC) provides an ideal model to study the ordering of somatic mutations. We identified recurrently mutated genes and assessed clonal structure using whole-genome sequencing and amplicon resequencing of 112 EACs. We next screened a cohort of 109 biopsies from two key transition points in the development of malignancy: benign metaplastic never-dysplastic Barrett’s esophagus (NDBE, n=66) and high-grade dysplasia (HGD, n=43). Unexpectedly, the majority of recurrently mutated genes in EAC were also mutated in NDBE. Only TP53 and SMAD4 were stage-specific, confined to HGD and EAC, respectively. Finally, we applied this knowledge to identify high-risk Barrett’s esophagus in a novel non-endoscopic test. In conclusion, mutations in EAC driver genes generally occur exceptionally early in disease development, with profound implications for diagnostic and therapeutic strategies.
ICGC; esophageal cancer; Barrett’s esophagus; whole genome sequencing
There is emerging evidence that Wnt pathway activity may increase during the progression from colorectal adenoma to carcinoma and that this increase is potentially an important step towards the invasive stage. Here, we investigated whether epigenetic silencing of Wnt antagonists is the biological driver for this increased Wnt activity in human tissues and how these methylation changes correlate with MSI (Microsatellite Instability) and CIMP (CpG Island Methylator Phenotype) statuses as well as known mutations in genes driving colorectal neoplasia.
We conducted a systematic analysis by pyrosequencing to determine the promoter methylation of CpG islands associated with 17 Wnt signaling component genes. Methylation levels were correlated with MSI and CIMP statuses and known mutations within the APC, BRAF and KRAS genes in 264 matched samples representing the progression from normal to pre-invasive adenoma to colorectal carcinoma.
We discovered widespread hypermethylation of the Wnt antagonists SFRP1, SFRP2, SFRP5, DKK2, WIF1 and SOX17 in the transition from normal to adenoma with only the Wnt antagonists SFRP1, SFRP2, DKK2 and WIF1 showing further significant increase in methylation from adenoma to carcinoma. We show this to be accompanied by loss of expression of these Wnt antagonists, and by an increase in nuclear Wnt pathway activity. Mixed effects models revealed that mutations in APC, BRAF and KRAS occur at the transition from normal to adenoma stages whilst the hypermethylation of the Wnt antagonists continued to accumulate during the transitions from adenoma to carcinoma stages.
Our study provides strong evidence for a correlation between progressive hypermethylation and silencing of several Wnt antagonists with stepping-up in Wnt pathway activity beyond the APC loss associated tumour-initiating Wnt signalling levels.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2407-14-891) contains supplementary material, which is available to authorized users.
Chromatin Immunoprecipitation followed by sequencing (ChIP-seq) is a valuable tool for epigenetic studies. Analysis of the data arising from ChIP-seq experiments often requires implicit or explicit statistical modeling of the read counts. The simple Poisson model is attractive, but does not provide a good fit to observed ChIP-seq data. Researchers therefore often either extend to a more general model (e.g., the Negative Binomial), and/or exclude regions of the genome that do not conform to the model. Since many modeling strategies employed for ChIP-seq data reduce to fitting a mixture of Poisson distributions, we explore the problem of inferring the optimal mixing distribution. We apply the Constrained Newton Method (CNM), which suggests the Negative Binomial - Negative Binomial (NB-NB) mixture model as a candidate for modeling ChIP-seq data. We illustrate fitting the NB-NB model with an accelerated EM algorithm on four data sets from three species. Zero-inflated models have been suggested as an approach to improve model fit for ChIP-seq data. We show that the NB-NB mixture model requires no zero-inflation and suggest that in some cases the need for zero inflation is driven by the model's inability to cope with both artifactual large read counts and the frequently observed very low read counts. We see that the CNM-based approach is a useful diagnostic for the assessment of model fit and inference in ChIP-seq data and beyond. Use of the suggested NB-NB mixture model will be of value not only when calling peaks or otherwise modeling ChIP-seq data, but also when simulating data or constructing blacklists de novo.
ChIP-seq; Negative Binomial; mixture model; zero-inflation; high-throughput sequencing
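As an illustration of the mixture idea described above, the following is a minimal sketch of a two-component Negative Binomial mixture for read counts: its probability mass function, its log-likelihood, and the E-step responsibilities that an EM fit would use. It is not the accelerated EM algorithm or the CNM procedure from the study; the function names and the parameterisation (size `r`, success probability `p`, mixing weight `w`) are assumptions made for this example.

```python
import math

def nb_logpmf(k, r, p):
    """Log pmf of a Negative Binomial: k failures before r successes,
    each trial succeeding with probability p (r may be non-integer)."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))

def nbnb_pmf(k, w, r1, p1, r2, p2):
    """pmf of a two-component NB mixture; w is the weight of component 1."""
    return (w * math.exp(nb_logpmf(k, r1, p1))
            + (1.0 - w) * math.exp(nb_logpmf(k, r2, p2)))

def log_likelihood(counts, w, r1, p1, r2, p2):
    """Log-likelihood of observed read counts under the mixture."""
    return sum(math.log(nbnb_pmf(k, w, r1, p1, r2, p2)) for k in counts)

def responsibilities(counts, w, r1, p1, r2, p2):
    """E-step: posterior probability that each count came from component 1."""
    out = []
    for k in counts:
        a = w * math.exp(nb_logpmf(k, r1, p1))
        b = (1.0 - w) * math.exp(nb_logpmf(k, r2, p2))
        out.append(a / (a + b))
    return out

# Hypothetical parameters: a low-count background component (weight 0.9)
# and a high-count enriched component.
params = dict(w=0.9, r1=1.0, p1=0.5, r2=5.0, p2=0.1)
post = responsibilities([0, 1, 50], **params)
```

With these illustrative parameters, low counts are attributed almost entirely to the background component and large counts to the enriched one, which is the behaviour a peak-calling model relies on.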
Population geneticists can reconstruct the ancestries of macroscopic populations from polymorphisms in present day individuals. For example, the migration “out of Africa” is recorded in human genome variation in different parts of the world. Here we apply this approach to human colorectal cancer cell populations and polymorphic passenger methylation patterns. By sampling molecular variation from different parts of the same cancer, it should be possible to infer how individual tumors grow because recent clonal expansions should be less diverse than older expansions. Average diversity was different between cancers implying that some cancers are older clonal expansions than others. For individual cancers, methylation pattern diversity was relatively uniform throughout the tumor (right versus left side, superficial versus invasive), which is more consistent with a single, uniform or “flat” clonal expansion than with stepwise sequential progression. Many colorectal cancers appear to invade and expand early, but subsequently stall. Epiallele diversity within individual small cancer gland fragments was high and more consistent with frequent rather than extremely rare cancer stem cells (CSCs). These studies suggest that many human colorectal cancers are relatively old uniform clonal expansions, that cancer cell populations contain frequent long-lived CSC lineages, and that some passenger methylation patterns record somatic cell ancestry.
tumor progression; clonal evolution; cancer stem cells; methylation; gompertzian
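The reasoning that recent clonal expansions should be less diverse than older ones can be made concrete with a simple diversity statistic. The sketch below uses the mean pairwise Hamming distance between methylation patterns encoded as strings of 0 and 1; this metric and the toy patterns are assumptions for illustration, since the abstract does not state the exact measure used.

```python
from itertools import combinations

def hamming(a, b):
    """Number of CpG sites at which two equal-length patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))

def mean_pairwise_distance(patterns):
    """Average Hamming distance over all unordered pairs of patterns."""
    pairs = list(combinations(patterns, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)

# Hypothetical samples: a recent expansion (nearly identical patterns)
# and an older one (patterns have drifted apart).
recent = ["1100", "1100", "1101"]
older = ["1100", "0011", "1010"]
recent_diversity = mean_pairwise_distance(recent)
older_diversity = mean_pairwise_distance(older)
```

Comparing this statistic between regions of the same tumor (for example left versus right side) is one way to ask whether diversity is uniform across the cancer.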
Forkhead box (FOX) transcription factors regulate a wide variety of cellular functions in higher eukaryotes, including cell cycle control and developmental regulation. In Saccharomyces cerevisiae, Forkhead proteins Fkh1 and Fkh2 perform analogous functions, regulating genes involved in cell cycle control, while also regulating mating-type silencing and switching involved in gamete development. Recently, we revealed a novel role for Fkh1 and Fkh2 in the regulation of replication origin initiation timing, which, like donor preference in mating-type switching, appears to involve long-range chromosomal interactions, suggesting roles for Fkh1 and Fkh2 in chromatin architecture and organization. To elucidate how Fkh1 and Fkh2 regulate their target DNA elements and potentially regulate the spatial organization of the genome, we undertook a genome-wide analysis of Fkh1 and Fkh2 chromatin binding by ChIP-chip using tiling DNA microarrays. Our results confirm and extend previous findings showing that Fkh1 and Fkh2 control the expression of cell cycle-regulated genes. In addition, the data reveal hundreds of novel loci that bind Fkh1 only and exhibit a distinct chromatin structure from loci that bind both Fkh1 and Fkh2. The findings also show that Fkh1 plays the predominant role in the regulation of a subset of replication origins that initiate replication early, and that Fkh1/2 binding to these loci is cell cycle-regulated. Finally, we demonstrate that Fkh1 and Fkh2 bind proximally to a variety of genetic elements, including centromeres and Pol III-transcribed snoRNAs and tRNAs, greatly expanding their potential repertoire of functional targets, consistent with their recently suggested role in mediating the spatial organization of the genome.
The increasing interest in the investigation of social behaviours of a group of animals has heightened the need for developing tools that provide robust quantitative data. Drosophila melanogaster has emerged as an attractive model for behavioural analysis; however, there are still limited ways to monitor fly behaviour in a quantitative manner. To study social behaviour of a group of flies, acquiring the position of each individual over time is crucial. Several studies have tried to solve this problem and automate this data acquisition. However, none of these studies has addressed the problem of keeping track of flies for a long period of time in three-dimensional space. Recently, we have developed an approach that enables us to detect and keep track of multiple flies in a three-dimensional arena for a long period of time, using multiple synchronized and calibrated cameras. After detecting flies in each view, correspondence between views is established using a novel approach we call the ‘sequential Hungarian algorithm’. Subsequently, the three-dimensional positions of flies in space are reconstructed. We use the Hungarian algorithm and Kalman filter together for data association and tracking. We rigorously evaluated the system's performance for tracking and behaviour detection in multiple experiments, using from one to seven flies. Overall, this system presents a powerful new method for studying complex social interactions in a three-dimensional environment.
three-dimensional reconstruction; featureless multi-target tracking; Drosophila behaviour monitoring; social behaviour studies
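The data-association step described above amounts to a minimum-cost assignment between predicted track positions and new detections. The sketch below solves that assignment problem by brute force over permutations, which is practical only for a handful of targets; it stands in for the Hungarian algorithm, which solves the same problem in polynomial time, and it is not the ‘sequential Hungarian algorithm’ of the study. The function names, the Euclidean cost, and the assumption of equal numbers of tracks and detections are all illustrative.

```python
import math
from itertools import permutations

def min_cost_assignment(cost):
    """Return (total_cost, assignment) minimising sum(cost[i][assignment[i]]).
    Brute force over all permutations of a square cost matrix."""
    n = len(cost)
    best_total, best_perm = math.inf, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)

def associate(predicted, detected):
    """Match each predicted 3D position to one detection by Euclidean distance."""
    cost = [[math.dist(p, d) for d in detected] for p in predicted]
    return min_cost_assignment(cost)

# Hypothetical positions: three tracked flies and three unordered detections.
predicted = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0), (10.0, 0.0, 0.0)]
detected = [(10.2, 0.0, 0.1), (0.1, 0.0, 0.0), (5.0, 5.3, 5.0)]
total, assignment = associate(predicted, detected)
```

In a tracker, the predicted positions would typically come from a motion model such as a Kalman filter, and `assignment[i]` gives the index of the detection used to update track `i`.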
Cells with reduced origin firing have an increased rate of replication fork progression, whereas fork progression is slowed in cells with excess origins.
DNA damage slows DNA synthesis at replication forks; however, the mechanisms remain unclear. Cdc7 kinase is required for replication origin activation, is a target of the intra-S checkpoint, and is implicated in the response to replication fork stress. Remarkably, we found that replication forks proceed more rapidly in cells lacking Cdc7 function than in wild-type cells. We traced this effect to reduced origin firing, which results in fewer replication forks and a consequent decrease in Rad53 checkpoint signaling. Depletion of Orc1, which acts in origin firing differently from Cdc7, had effects similar to those of Cdc7 depletion, consistent with decreased origin firing being the source of these defects. In contrast, mec1-100 cells, which initiate excess origins and are also deficient in checkpoint activation, showed slower fork progression, suggesting that the number of active forks influences their rate, perhaps as a result of competition for limiting factors.
Homologous sets of transcription factors direct conserved tissue-specific gene expression, yet transcription factor binding events diverge rapidly between closely related species. We used hepatocytes from an aneuploid mouse strain carrying human chromosome 21 to determine on a chromosomal scale whether interspecies differences in transcriptional regulation are primarily directed by human genetic sequence or mouse nuclear environment. Virtually all transcription factor binding locations, landmarks of transcription initiation, and the resulting gene expression observed in human hepatocytes were recapitulated across the entire human chromosome 21 in the mouse hepatocyte nucleus. Thus, in homologous tissues, genetic sequence is largely responsible for directing transcriptional programs; interspecies differences in epigenetic machinery, cellular environment, and transcription factors themselves play secondary roles.
The expansion of repressive epigenetic marks has been implicated in heterochromatin formation during embryonic development, but the general applicability of this mechanism is unclear. Here we show that nuclear rearrangement of repressive histone marks H3K9me3 and H3K27me3 into nonoverlapping structural layers characterizes senescence-associated heterochromatic foci (SAHF) formation in human fibroblasts. However, the global landscape of these repressive marks remains unchanged upon SAHF formation, suggesting that in somatic cells, heterochromatin can be formed through the spatial repositioning of pre-existing repressively marked histones. This model is reinforced by the correlation of presenescent replication timing with both the subsequent layered structure of SAHFs and the global landscape of the repressive marks, allowing us to integrate microscopic and genomic information. Furthermore, modulation of SAHF structure does not affect the occupancy of these repressive marks, nor vice versa. These experiments reveal that high-order heterochromatin formation and epigenetic remodeling of the genome can be discrete events.
Substantial evidence supports the concept that cancers are organized in a cellular hierarchy with cancer stem cells (CSCs) at the apex. To date, the primary evidence for CSCs derives from transplantation assays, which have known limitations. In particular, they are unable to report on the fate of cells within the original human tumor. Due to the difficulty in measuring tumor characteristics in patients, cellular organization and other aspects of cancer dynamics have not been quantified directly, although they likely play a fundamental role in tumor progression and therapy response. As such, new approaches to study CSCs in patient-derived tumor specimens are needed. In this study we exploited ultra-deep single-molecule genomic data derived from multiple microdissected colorectal cancer glands per tumor, along with a novel quantitative approach to measure tumor characteristics, define patient-specific tumor profiles, and infer tumor ancestral trees. We demonstrate that each cancer is unique in terms of its cellular organization, molecular heterogeneity, time from malignant transformation, and rate of mutation and apoptosis. Importantly, we estimate CSC fractions between 0.5% and 4%, indicative of a hierarchical organization responsible for long-lived CSC lineages, with variable rates of symmetric cell division. We also observed extensive molecular heterogeneity, both between and within individual cancer glands, suggesting a complex hierarchy of mitotic clones. Our framework enables the measurement of clinically relevant patient-specific characteristics in vivo, providing insight into the cellular organization and dynamics of tumor growth, with implications for personalized patient care.
Many tumors have highly rearranged genomes, but a major unknown is the relative importance and timing of genome rearrangements compared to sequence-level mutation. Chromosome instability might arise early, be a late event contributing little to cancer development, or happen as a single catastrophic event. Another unknown is which of the point mutations and rearrangements are selected. To address these questions we show, using the breast cancer cell line HCC1187 as a model, that we can reconstruct the likely history of a breast cancer genome. We assembled probably the most complete map to date of a cancer genome, by combining molecular cytogenetic analysis with sequence data. In particular, we assigned most sequence-level mutations to individual chromosomes by sequencing of flow sorted chromosomes. The parent of origin of each chromosome was assigned from SNP arrays. We were then able to classify most of the mutations as earlier or later according to whether they occurred before or after a landmark event in the evolution of the genome, endoreduplication (duplication of its entire genome). Genome rearrangements and sequence-level mutations were fairly evenly divided earlier and later, suggesting that genetic instability was relatively constant throughout the life of this tumor, and chromosome instability was not a late event. Mutations that caused chromosome instability would be in the earlier set. Strikingly, the great majority of inactivating mutations and in-frame gene fusions happened earlier. The non-random timing of some of the mutations may be evidence that they were selected.
The replication of eukaryotic chromosomes is organized temporally and spatially within the nucleus through epigenetic regulation of replication origin function. The characteristic initiation timing of specific origins is thought to reflect their chromatin environment or sub-nuclear positioning; however, the mechanism remains obscure. Here we show that the yeast Forkhead transcription factors, Fkh1 and Fkh2, are global determinants of replication origin timing. Forkhead regulation of origin timing is independent of local levels or changes of transcription. Instead, we show that Fkh1 and Fkh2 are required for the clustering of early origins and their association with the key initiation factor Cdc45 in G1-phase, suggesting that Fkh1 and Fkh2 selectively recruit origins to emergent replication factories. Fkh1 and Fkh2 bind Fkh-activated origins, and interact physically with ORC, providing a plausible mechanism to cluster origins. These findings add a new dimension to our understanding of the epigenetic basis for differential origin regulation and its connection to chromosomal domain organization.
Replication origin timing; chromatin; Forkhead; Fox; centromere; telomere; chromosome-conformation; Cdc45; epigenetics; transcription; nuclear architecture
Dynamic activity of signaling pathways, such as Notch, is vital to achieve correct development and homeostasis. However, most studies assess output many hours or days after initiation of signaling, once the outcome has been consolidated. Here we analyze genome-wide changes in transcript levels, binding of the Notch pathway transcription factor, CSL [Suppressor of Hairless, Su(H), in Drosophila], and RNA Polymerase II (Pol II) immediately following a short pulse of Notch stimulation. A total of 154 genes showed significant differential expression (DE) over time, and their expression profiles stratified into 14 clusters based on the timing, magnitude, and direction of DE. E(spl) genes were the most rapidly upregulated, with Su(H), Pol II, and transcript levels increasing within 5–10 minutes. Other genes had a more delayed response, the timing of which was largely unaffected by more prolonged Notch activation. Neither Su(H) binding nor poised Pol II could fully explain the differences between profiles. Instead, our data indicate that regulatory interactions, driven by the early-responding E(spl)bHLH genes, are required. Proposed cross-regulatory relationships were validated in vivo and in cell culture, supporting the view that feed-forward repression by E(spl)bHLH/Hes shapes the response of late-responding genes. Based on these data, we propose a model in which Hes genes are responsible for co-ordinating the Notch response of a wide spectrum of other targets, explaining the critical functions these key regulators play in many developmental and disease contexts.
Signaling via the Notch pathway conveys important information that helps to shape tissues and, when misused, contributes to diseases. Cells respond to the Notch signal by changing which genes are transcribed. Most previous studies have looked at changes in gene activity at a single time point, long after the start of signaling. By looking at carefully timed intervals immediately after Notch pathway activation, we have been able to follow the dynamic changes in transcription of all the genes and have found that they exhibit different patterns of activity. For example, activity of some genes, especially a previously characterised family called the E(spl) genes, starts very early, whereas others show more delayed upregulation. Our investigations into the underlying mechanisms reveal that cross-regulatory interactions driven by the early genes are required to shape the timing of the delayed response. This feed-forward mechanism is important because it explains why the E(spl)/Hes genes can play such a pivotal role in the Notch response, despite the fact that many other genes are regulated by the signal, a finding that will be valuable for understanding the contribution of E(spl)/Hes genes in diseases associated with altered Notch.
The elucidation of breast cancer subgroups and their molecular drivers requires integrated views of the genome and transcriptome from representative numbers of patients. We present an integrated analysis of copy number and gene expression in a discovery and validation set of 997 and 995 primary breast tumours, respectively, with long-term clinical follow-up. Inherited variants (copy number variants and single nucleotide polymorphisms) and acquired somatic copy number aberrations (CNAs) were associated with expression in ~40% of genes, with the landscape dominated by cis- and trans-acting CNAs. By delineating expression outlier genes driven in cis by CNAs, we identified putative cancer genes, including deletions in PPP2R2A, MTAP and MAP2K4. Unsupervised analysis of paired DNA–RNA profiles revealed novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort. These include a high-risk, oestrogen-receptor-positive 11q13/14 cis-acting subgroup and a favourable prognosis subgroup devoid of CNAs. Trans-acting aberration hotspots were found to modulate subgroup-specific gene networks, including a TCR deletion-mediated adaptive immune response in the ‘CNA-devoid’ subgroup and a basal-specific chromosome 5 deletion-associated mitotic network. Our results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome.
Phylogeographic methods have attracted a lot of attention in recent years, stressing the need to provide a solid statistical framework for many existing methodologies so as to draw statistically reliable inferences. Here, we take a flexible fully Bayesian approach by reducing the problem to a clustering framework, whereby the population distribution can be explained by a set of migrations, forming geographically stable population clusters. These clusters are such that they are consistent with a fixed number of migrations on the corresponding (unknown) subdivided coalescent tree. Our methods rely upon a clustered population distribution, and allow for inclusion of various covariates (such as phenotype or climate information) at little additional computational cost. We illustrate our methods with an example from weevil mitochondrial DNA sequences from the Iberian peninsula.
migration; coalescent; subdivided population; island model; Markov chain Monte Carlo; reversible jump
Protein synthesis and autophagic degradation are regulated in an opposite manner by mammalian target of rapamycin (mTOR), whereas under certain conditions it would be beneficial if they occurred in unison to handle rapid protein turnover. We observed a distinct cellular compartment at the trans-side of the Golgi apparatus, the ‘TOR-autophagy spatial coupling compartment’ (TASCC), where (auto)lysosomes and mTOR accumulated during Ras-induced senescence. mTOR recruitment to the TASCC was amino acid- and Rag guanosine triphosphatase (GTPase)-dependent, and disruption of mTOR localization to the TASCC suppressed interleukin-6/8 synthesis. TASCC formation was observed during macrophage differentiation and in glomerular podocytes; both displayed increased protein secretion. The spatial coupling of cells’ catabolic and anabolic machinery could augment their respective functions and facilitate the mass synthesis of secretory proteins.
Sample tracking errors have been and always will be a part of the practical implementation of large experiments. It has recently been proposed that expression quantitative trait loci (eQTLs) and their associated effects could be used to identify sample mix-ups, and this approach has been applied to a number of large population genomics studies to illustrate the prevalence of the problem. We adopted a similar approach, termed ‘BADGER’, in the METABRIC project. METABRIC is a large breast cancer study that may have been the first in which eQTL-based detection of mismatches was used during the study, rather than after the event, to aid quality assurance. We report here on the particular issues associated with large cancer studies performed using historical samples, which complicate the interpretation of such approaches. In particular we identify the complications of using tumour samples, of considering cellularity and RNA quality, of distinct subgroups existing in the study population (including family structures), and of choosing eQTLs to use. We also present some results regarding the design of experiments given consideration of these matters. The eQTL-based approach to identifying sample tracking errors is of value to these studies but requires care in its implementation.
In vivo imaging and quantification of fluorescent reporter molecules is increasingly useful in biomedical research. For example, tracking animal movement in 3D with simultaneous quantification of fluorescent transgenic reporters allows for correlations between behavior, aging and gene expression. However, implementation has been hindered in the past by the complexity of operating the systems.
We report significant technical improvements and user-friendly software (called FluoreScore) that enables tracking of 3D movement and the dynamics of gene expression in adult Drosophila, using two cameras and recorded GFP videos. Expression of a transgenic construct encoding eGFP was induced in free-moving adult flies using the Gene-Switch system and RU486 drug feeding. The time course of induction of eGFP expression was readily quantified from internal tissues including central nervous tissue.
FluoreScore should facilitate a variety of future studies involving quantification of movement behaviors and fluorescent molecules in free-moving animals.
The Piwi proteins of the Argonaute superfamily are required for normal germline development in Drosophila, zebrafish and mice, and associate with 24-30 nucleotide RNAs termed piRNAs. We identify a class of 21 nucleotide RNAs, previously named 21U-RNAs, as the piRNAs of C. elegans. Piwi and piRNA expression is restricted to the male and female germline and is independent of many proteins in other small RNA pathways, including DCR-1. We show that Piwi is specifically required to silence Tc3, but not other Tc/mariner DNA transposons. Tc3 excision rates in the germline are increased at least 100-fold in piwi mutants compared with wild type. We find no evidence for a Ping-Pong model for piRNA amplification in C. elegans. Instead, we demonstrate that Piwi acts upstream of an endogenous siRNA pathway in Tc3 silencing. These data suggest a link between piRNA and siRNA function.
Estimation of divergence times is usually done using either the fossil record or sequence data from modern species. We provide an integrated analysis of palaeontological and molecular data to give estimates of primate divergence times that utilize both sources of information. The number of preserved primate species discovered in the fossil record, along with their geological age distribution, is combined with the number of extant primate species to provide initial estimates of the primate and anthropoid divergence times. This is done by using a stochastic forwards-modeling approach where speciation and fossil preservation and discovery are simulated forward in time. We use the posterior distribution from the fossil analysis as a prior distribution on node ages in a molecular analysis. Sequence data from two genomic regions (CFTR on human chromosome 7 and the CYP7A1 region on chromosome 8) from 15 primate species are used with the birth–death model implemented in mcmctree in PAML to infer the posterior distribution of the ages of 14 nodes in the primate tree. We find that these age estimates are older than previously reported dates for all but one of these nodes. To perform the inference, a new approximate Bayesian computation (ABC) algorithm is introduced, where the structure of the model can be exploited in an ABC-within-Gibbs algorithm to provide a more efficient analysis.
Approximate Bayesian computation; molecular phylogeny; palaeontological data; primate divergence
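To illustrate the general idea behind approximate Bayesian computation, the sketch below implements plain rejection ABC on a toy forward model: draw a parameter from the prior, simulate data forward, and keep the draw if the simulated summary is close to the observed one. This is not the ABC-within-Gibbs algorithm introduced in the study, and the toy model (each of `n` lineages is discovered with probability `theta`), the uniform prior, and the tolerance are all assumptions chosen for brevity.

```python
import random

def simulate(theta, n, rng):
    """Toy forward model: each of n lineages is discovered with probability theta."""
    return sum(rng.random() < theta for _ in range(n))

def abc_rejection(observed, n, n_draws, tolerance, rng):
    """Rejection ABC: keep prior draws whose simulated count lies within
    `tolerance` of the observed count."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # draw from a Uniform(0, 1) prior
        if abs(simulate(theta, n, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted

# Hypothetical data: 30 of 100 lineages observed.
rng = random.Random(0)
posterior = abc_rejection(observed=30, n=100, n_draws=5000, tolerance=2, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted draws approximate the posterior distribution of `theta`; tightening the tolerance improves the approximation at the cost of fewer accepted samples, which is the inefficiency that more structured schemes aim to reduce.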
It is possible to infer the past of populations by comparing genomes between individuals. In general, older populations have more genomic diversity than younger populations. The force of selection can also be inferred from population diversity. If selection is strong and frequently eliminates less fit variants, diversity will be limited because new, initially homogeneous populations constantly emerge.
Methodology and Results
Here we translate a population genetics approach to human somatic cancer cell populations by measuring genomic diversity within and between small colorectal cancer (CRC) glands. Control tissue culture and xenograft experiments demonstrate that the population diversity of certain passenger DNA methylation patterns is reduced after cloning but subsequently increases with time. When measured in CRC gland populations, passenger methylation diversity from different parts of nine CRCs was relatively high and uniform, consistent with older, stable lineages rather than mixtures of younger homogeneous populations arising from frequent cycles of selection. The diversity of six metastases was also high, suggesting dissemination early after transformation. Diversity was lower in DNA mismatch repair deficient CRC glands, possibly suggesting more selection and the elimination of less fit variants when mutation rates are elevated.
The many hitchhiking passenger variants observed in primary and metastatic CRC cell populations are consistent with relatively old populations, suggesting that clonal evolution leading to selective sweeps may be rare after transformation. Selection in human cancers appears to be a weaker force after transformation than presumed, consistent with the observed rarity of driver mutations in cancer genomes. Phenotypic plasticity rather than the stepwise acquisition of new driver mutations may better account for the many different phenotypes within human tumors.
The cancer stem cell (CSC) concept is a highly debated topic in cancer research.
While experimental evidence in favor of the cancer stem cell theory is
apparently abundant, the results are often criticized as being difficult to
interpret. An important reason for this is that most experimental data that
support this model rely on transplantation studies. In this study we use a novel
cellular Potts model to elucidate the dynamics of established malignancies that
are driven by a small subset of CSCs. Our results demonstrate that epigenetic
mutations that occur during mitosis display highly altered dynamics in
CSC-driven malignancies compared to a classical, non-hierarchical model of
growth. In particular, the heterogeneity observed in CSC-driven tumors is
considerably higher. We speculate that this feature could be used in combination
with epigenetic (methylation) sequencing studies of human malignancies to prove
or refute the CSC hypothesis in established tumors without the need for
transplantation. Moreover our tumor growth simulations indicate that CSC-driven
tumors display evolutionary features that can be considered beneficial during
tumor progression. Besides an increased heterogeneity they also exhibit
properties that allow the escape of clones from local fitness peaks. This leads
to more aggressive phenotypes in the long run and makes the neoplasm more
adaptable to stringent selective forces such as cancer treatment. Indeed, when
therapy is applied, the clonal landscape of the regrown tumor is more aggressive
than that of the primary tumor, whereas the classical model shows
similar patterns before and after therapy. Understanding these often
counter-intuitive fundamental properties of (non-)hierarchically organized
malignancies is a crucial step in validating the CSC concept as well as
providing insight into the therapeutic consequences of this model.
Cancer is in essence a genetic disease that leads to uncontrolled cell
proliferation, invasion and metastasis. The cancer stem cell (CSC) hypothesis
states that tumors are not just a mass of uniform malignant cells but they are
hierarchically organized, like normal tissues. At the top of such a hierarchy
are cancer stem cells that fuel tumor growth in the long run, whereas the
majority of other cells are able to divide only a few times. The experiments
that support the CSC hypothesis are often criticized as being difficult to
interpret. A novel approach to test the CSC paradigm is to integrate
mathematical modeling with DNA variation data that carry the phylogenetic
history of cells. We have developed a model that simulates the occurrence of
such changes under both the CSC hypothesis and the classical, purely stochastic
scenario. We found that although a CSC-driven tumor has a smaller number of
tumorigenic cells, it triggers more malignant properties such as invasive
growth, heterogeneity and evolutionary escape from peaks in the fitness
landscape. These properties, which are unique to the CSC model, are enhanced even
further when a treatment is applied to the tumor.
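To make the contrast between the two growth scenarios concrete, the following is a minimal, non-spatial toy sketch. It is not the cellular Potts model used in the study. It assumes a simple branching process in which heritable marks are represented by integer labels, and every name and parameter (`grow_tumor`, `p_self_renew`, `ta_divisions`, `mutation_rate`) is hypothetical. Setting `p_self_renew` to 1.0 gives a non-hierarchical population in which every cell can divide indefinitely; lower values give a hierarchy of stem cells and transit-amplifying cells with a limited number of divisions.

```python
import math
import random
from collections import Counter


def grow_tumor(n_cells, p_self_renew, ta_divisions, mutation_rate, seed=0):
    """Grow a toy tumor; return a list of (is_stem, divisions_left, label)."""
    rng = random.Random(seed)
    next_label = 1
    # Start from a single stem-like cell carrying heritable label 0.
    cells = [(True, 0, 0)]
    while len(cells) < n_cells:
        # Only stem cells and non-exhausted transit-amplifying cells divide.
        candidates = [i for i, (stem, left, _) in enumerate(cells)
                      if stem or left > 0]
        is_stem, left, label = cells.pop(rng.choice(candidates))
        if is_stem:
            # One daughter stays a stem cell; the other self-renews
            # with probability p_self_renew, else it differentiates.
            second_is_stem = rng.random() < p_self_renew
            daughters = [(True, 0),
                         (second_is_stem, 0 if second_is_stem else ta_divisions)]
        else:
            daughters = [(False, left - 1), (False, left - 1)]
        for d_stem, d_left in daughters:
            d_label = label
            # Assumption: each division may replace the inherited label
            # with a brand-new one (an infinite-alleles style mark).
            if rng.random() < mutation_rate:
                d_label = next_label
                next_label += 1
            cells.append((d_stem, d_left, d_label))
    return cells


def shannon_index(cells):
    """Shannon diversity of the heritable labels in the population."""
    counts = Counter(label for _, _, label in cells)
    total = len(cells)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


csc = grow_tumor(300, p_self_renew=0.1, ta_divisions=3, mutation_rate=0.05, seed=1)
classical = grow_tumor(300, p_self_renew=1.0, ta_divisions=3, mutation_rate=0.05, seed=1)
print("CSC-driven diversity:", shannon_index(csc))
print("Classical diversity: ", shannon_index(classical))
```

The diversity values depend on the chosen parameters and random seed, and this sketch ignores space, fitness differences and therapy, so it should not be read as reproducing the reported results.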
Motivation: Identification of genomic regions of interest in ChIP-seq data, commonly referred to as peak-calling, aims to find the locations of transcription factor binding sites, modified histones or nucleosomes. The BayesPeak algorithm was developed to model the data structure using Bayesian statistical techniques and was shown to be a reliable method, but did not have a full-genome implementation.
Results: In this note we present BayesPeak, an R package for genome-wide peak-calling that provides a flexible implementation of the BayesPeak algorithm and is compatible with downstream BioConductor packages. The BayesPeak package introduces a new method for summarizing posterior probability output, along with methods for handling overfitting and support for parallel processing. We briefly compare the package with other common peak-callers.
Availability: Available as part of BioConductor version 2.6. URL: http://bioconductor.org/packages/release/bioc/html/BayesPeak.html
Supplementary information: Supplementary data are available at Bioinformatics online.
The demands of microarray expression technologies for quantities of RNA place a limit on the questions they can address. As a consequence, RNA requirements have fallen over time as technologies have improved. In this paper we investigate the costs of reducing the starting quantity of RNA for the Illumina BeadArray platform. We do this using a dilution data set generated from two reference RNA sources that have become the standard for investigations into microarray and sequencing technologies.
We find that the starting quantity of RNA has an effect on observed intensities despite the fact that the quantity of cRNA being hybridized remains constant. We see a loss of sensitivity when using lower quantities of RNA, but no great rise in the false positive rate. Even with 10 ng of starting RNA, the positive results are reliable although many differentially expressed genes are missed. We see that there is some scope for combining data from samples that have contributed differing quantities of RNA, but note also that sample sizes should increase to compensate for the loss of signal-to-noise when using low quantities of starting RNA.
The BeadArray platform maintains a low false discovery rate even when small amounts of starting RNA are used. In contrast, the sensitivity of the platform drops off noticeably over the same range. Thus, those conducting experiments should not opt for low quantities of starting RNA without consideration of the costs of doing so. The implications for experimental design, and the integration of data from different starting quantities, are complex.
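The distinction drawn above between sensitivity and false discovery rate can be illustrated with the usual definitions. The sketch below assumes a reference set of genes known to be differentially expressed; the function name and the gene identifiers are hypothetical and the call sets are made up purely for illustration, not taken from the study.

```python
def sensitivity_and_fdr(called, truth):
    """Return (sensitivity, false discovery rate) for a set of calls."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    # Sensitivity: fraction of truly changed genes that were called.
    sensitivity = true_positives / len(truth) if truth else float("nan")
    # FDR: fraction of calls that are not in the reference set.
    fdr = (len(called) - true_positives) / len(called) if called else 0.0
    return sensitivity, fdr


# Hypothetical reference set and two hypothetical call sets.
truth = {"g1", "g2", "g3", "g4", "g5"}
calls_high_input = {"g1", "g2", "g3", "g4", "g6"}
calls_low_input = {"g1", "g2"}

print(sensitivity_and_fdr(calls_high_input, truth))
print(sensitivity_and_fdr(calls_low_input, truth))
```

In this made-up example the smaller call set misses more of the reference genes while containing no false calls, which is the kind of pattern the text describes: reliable positives, but reduced sensitivity.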
The amplification of millions of single molecules in parallel can be carried out on microscopic magnetic beads contained in aqueous compartments of an oil-buffer emulsion. These bead-emulsion amplification (BEA) reactions result in beads covered by almost identical copies derived from a single template. The post-PCR analysis is carried out using different fluorophore-labeled probes. First, we identified BEA reaction conditions that efficiently produce longer amplicons of up to 450 base pairs; these conditions include the use of a Titanium Taq amplification system. Second, we explored alternate fluorophores coupled to probes for post-PCR DNA analysis and demonstrate that four different Alexa fluorophores can be used simultaneously with extremely low crosstalk. Finally, we developed a highly efficient and specific allele-specific extension chemistry, based on Alexa dyes, to query individual nucleotides of the amplified material.