
Related Articles

1.  Identifying In-Trans Process Associated Genes in Breast Cancer by Integrated Analysis of Copy Number and Expression Data 
PLoS ONE  2013;8(1):e53014.
Genomic copy number alterations are common in cancer. Finding the genes causally implicated in oncogenesis is challenging because the gain or loss of a chromosomal region may affect a few key driver genes and many passengers. Integrative analyses have opened new vistas for addressing this issue. One approach is to identify genes with frequent copy number alterations and corresponding changes in expression. Several methods also analyse the effects of transcriptional changes on known pathways. Here, we propose a method that analyses in-cis correlated genes for evidence of in-trans association to biological processes, with no bias towards processes of a particular type or function. The method aims to identify cis-regulated genes for which the expression correlation to other genes provides further evidence of a network-perturbing role in cancer. The proposed unsupervised approach involves a sequence of statistical tests to systematically narrow down the list of relevant genes, based on integrative analysis of copy number and gene expression data. A novel adjustment method handles confounding effects of co-occurring copy number aberrations, potentially a large source of false positives in such studies. When the method was applied to whole-genome copy number and expression data from 100 primary breast carcinomas, 6373 genes were identified as commonly aberrant, 578 were highly in-cis correlated, and 56 were in addition associated in-trans to biological processes. Among these in-trans process associated and cis-correlated (iPAC) genes, 28% have previously been reported as breast cancer associated, and 64% as cancer associated. By combining statistical evidence from three separate subanalyses that focus respectively on copy number, gene expression and the combination of the two, the proposed method identifies several known and novel cancer driver candidates. Validation in an independent data set supports the conclusion that the method identifies genes implicated in cancer.
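The first filtering step, keeping genes whose expression tracks their own copy number (in-cis correlation), can be sketched in Python. This is a minimal illustration, not the iPAC implementation: it assumes per-gene copy number and expression vectors over the same ordered samples, uses a plain Pearson correlation, and applies a hypothetical cutoff `min_r` in place of the paper's statistical tests. The gene names are placeholders.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def cis_correlated(copy_number, expression, min_r=0.5):
    """Return {gene: r} for genes whose expression follows their own copy number.

    copy_number, expression: dict gene -> per-sample values (same sample order).
    min_r: hypothetical cutoff chosen for illustration only.
    """
    hits = {}
    for gene, cn in copy_number.items():
        expr = expression.get(gene)
        if expr is None or len(expr) != len(cn):
            continue  # skip genes without a matching expression profile
        r = pearson(cn, expr)
        if r >= min_r:
            hits[gene] = r
    return hits


# Toy data for two placeholder genes across four samples.
cn = {"GENE_A": [0.1, 0.8, -0.5, 1.2], "GENE_B": [0.0, 0.3, 0.2, -0.1]}
expr = {"GENE_A": [5.0, 7.1, 3.9, 8.0], "GENE_B": [6.0, 5.1, 6.2, 5.9]}
print(cis_correlated(cn, expr))  # only GENE_A passes (r is about 0.995)
```

The later in-trans step, relating each retained gene to biological processes through its correlation with other genes, is not attempted here.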
doi:10.1371/journal.pone.0053014
PMCID: PMC3559658  PMID: 23382830
2.  Identifying Causal Genes and Dysregulated Pathways in Complex Diseases 
PLoS Computational Biology  2011;7(3):e1001095.
In complex diseases, various combinations of genomic perturbations often lead to the same phenotype. On a molecular level, combinations of genomic perturbations are assumed to dys-regulate the same cellular pathways. Such a pathway-centric perspective is fundamental to understanding the mechanisms of complex diseases and the identification of potential drug targets. In order to provide an integrated perspective on complex disease mechanisms, we developed a novel computational method to simultaneously identify causal genes and dys-regulated pathways. First, we identified a representative set of genes that are differentially expressed in cancer compared to non-tumor control cases. Assuming that disease-associated gene expression changes are caused by genomic alterations, we determined potential paths from such genomic causes to target genes through a network of molecular interactions. Applying our method to sets of genomic alterations and gene expression profiles of 158 Glioblastoma multiforme (GBM) patients we uncovered candidate causal genes and causal paths that are potentially responsible for the altered expression of disease genes. We discovered a set of putative causal genes that potentially play a role in the disease. Combining an expression Quantitative Trait Loci (eQTL) analysis with pathway information, our approach allowed us not only to identify potential causal genes but also to find intermediate nodes and pathways mediating the information flow between causal and target genes. Our results indicate that different genomic perturbations indeed dys-regulate the same functional pathways, supporting a pathway-centric perspective of cancer. While copy number alterations and gene expression data of glioblastoma patients provided opportunities to test our approach, our method can be applied to any disease system where genetic variations play a fundamental causal role.
Author Summary
It is now being recognized that complex diseases should be studied from the perspective of dys-regulated pathways and processes rather than individual genes. Indeed, various combinations of molecular perturbations might lead to the same disease. In such cases, responses to these perturbations are expected to converge to common pathways. In addition, signals that are associated with each individual perturbation might be weak, rendering studies of complex diseases particularly challenging. Aiming to provide an integrated perspective on complex disease mechanisms we developed a novel computational method to simultaneously identify causal genes and dys-regulated pathways. Starting with an identification of a disease-associated set of genes and their statistical associations with genomic alterations, we utilized graph-theoretical techniques and combinatorial algorithms to determine potential paths from the genomic causes through a network of molecular interactions. We applied our method to sets of genomic alterations and gene expression profiles of Glioblastoma multiforme (GBM) patients, uncovering candidate causal genes and causal paths that are potentially responsible for the altered expression of disease associated target genes. While copy number alterations and gene expression data of GBM patients provided opportunities to test our approach, our method can be applied to any disease system where genetic alterations play a fundamental causal role, and provides an important step toward the understanding of complex diseases.
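One simple way to illustrate connecting genomic causes to target genes through an interaction network is a breadth-first search for shortest paths. The sketch below is a stand-in, not the authors' algorithm: the directed toy network, the node names and the use of unweighted shortest paths are all hypothetical, and counting how often a gene appears inside paths is only a rough proxy for identifying intermediate nodes.

```python
from collections import Counter, deque


def shortest_path(graph, source, target):
    """Breadth-first search for one shortest path in an unweighted directed graph.

    graph: dict node -> iterable of downstream neighbours (hypothetical network).
    Returns the list of nodes from source to target, or None if unreachable.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back through recorded parents
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in graph.get(node, ()):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None


def causal_paths(graph, causal_genes, target_genes):
    """Connect every candidate causal gene to every reachable target gene."""
    paths = {}
    for c in causal_genes:
        for t in target_genes:
            p = shortest_path(graph, c, t)
            if p is not None:
                paths[(c, t)] = p
    return paths


def intermediate_counts(paths):
    """Count how often each gene lies strictly inside a causal path."""
    counts = Counter()
    for p in paths.values():
        counts.update(p[1:-1])
    return counts


graph = {"C1": ["X", "Y"], "C2": ["X"], "X": ["T1"], "Y": ["Z"], "Z": ["T2"]}
paths = causal_paths(graph, ["C1", "C2"], ["T1", "T2"])
print(intermediate_counts(paths).most_common())
```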
doi:10.1371/journal.pcbi.1001095
PMCID: PMC3048384  PMID: 21390271
3.  Network modeling of the transcriptional effects of copy number aberrations in glioblastoma 
DNA copy number aberrations (CNAs) are a characteristic feature of cancer genomes. In this work, Rebecka Jörnsten, Sven Nelander and colleagues combine network modeling and experimental methods to analyze the systems-level effects of CNAs in glioblastoma.
We introduce a modeling approach termed EPoC (Endogenous Perturbation analysis of Cancer), enabling the construction of global, gene-level models that causally connect gene copy number with expression in glioblastoma. On the basis of the resulting model, we predict genes that are likely to be disease-driving and validate selected predictions experimentally. We also demonstrate that further analysis of the network model by sparse singular value decomposition allows stratification of patients with glioblastoma into short-term and long-term survivors, introducing decomposed network models as a useful principle for biomarker discovery. Finally, in systematic comparisons, we demonstrate that EPoC is computationally efficient and yields more consistent results than mRNA-only methods, standard eQTL methods, and two recent multivariate methods for genotype–mRNA coupling.
Gains and losses of chromosomal material (DNA copy number aberrations; CNAs) are a characteristic feature of cancer genomes. At the level of a single locus, it is well known that increased copy number (gene amplification) typically leads to increased gene expression, whereas decreased copy number (gene deletion) leads to decreased gene expression (Pollack et al, 2002; Lee et al, 2008; Nilsson et al, 2008). However, CNAs also affect the expression of genes located outside the amplified/deleted region itself via indirect mechanisms. To fully understand the action of CNAs, it is therefore necessary to analyze their action in a network context. Toward this goal, improved computational approaches will be important, if not essential.
To determine the global effects on transcription of CNAs in the brain tumor glioblastoma, we develop EPoC (Endogenous Perturbation analysis of Cancer), a computational technique capable of inferring sparse, causal network models by combining genome-wide, paired CNA- and mRNA-level data. EPoC aims to detect disease-driving copy number aberrations and their effect on target mRNA expression, and stratify patients into long-term and short-term survivors. Technically, EPoC relates CNA perturbations to mRNA responses by matrix equations, derived from a steady-state approximation of the transcriptional network. Patient prognostic scores are obtained from singular value decompositions of the network matrix. The models are constructed by solving a large-scale, regularized regression problem.
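The abstract does not spell out the solver, so the following is only a generic illustration of a sparse (L1-regularized) regression of one gene's mRNA level on copy-number profiles, solved by cyclic coordinate descent. It is not EPoC's formulation or its MATLAB/R code; the data layout (rows are patients, columns are candidate regulator genes, both centered), the penalty `lam` and the iteration count are assumptions.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the proximal step for the L1 penalty)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X: list of n rows (patients), each with p centered copy-number values.
    y: list of n centered mRNA values for one target gene.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - X beta; beta starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # an all-zero column can never enter the model
            # Correlation of column j with the partial residual that excludes j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy example: the target depends only on the first regulator (true weight 2).
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]
print(lasso_cd(X, y, lam=0.1))  # roughly [1.9, 0.0]: shrunk weight, zero for the unrelated column
```

Repeating the fit for every target gene and stacking the coefficient vectors gives a sparse gene-by-gene matrix; larger `lam` values zero out more regulators.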
We apply EPoC to glioblastoma data from The Cancer Genome Atlas (TCGA) consortium (186 patients). The identified CNA-driven network comprises 10 672 genes and contains a number of copy number-altered genes that control multiple downstream genes. Highly connected hub genes include well-known oncogenes and tumor suppressor genes that are frequently deleted or amplified in glioblastoma, including EGFR, PDGFRA, CDKN2A and CDKN2B, confirming a clear association between these aberrations and transcriptional variability of these brain tumors. In addition, we identify a number of hub genes that have previously not been associated with glioblastoma, including interferon alpha 1 (IFNA1), myeloid/lymphoid or mixed-lineage leukemia translocated to 10 (MLLT10, a well-known leukemia gene), glutamate decarboxylase 2 (GAD2), a postulated glutamate receptor (GPR158) and Necdin (NDN). Furthermore, we demonstrate that the network model contains useful information on downstream target genes (including stem cell regulators) and possible drug targets.
We proceed to explore the validity of a small network region experimentally. Introducing experimental perturbations of NDN and other targets in four glioblastoma cell lines (T98G, U-87MG, U-343MG and U-373MG), we confirm several predicted mechanisms. We also demonstrate that the TCGA glioblastoma patients can be stratified into long-term and short-term survivors, using our proposed prognostic scores derived from a singular value decomposition of the network model. Finally, we compare EPoC to existing methods for mRNA network analysis and expression quantitative trait locus (eQTL) methods, and demonstrate that EPoC produces more consistent models between technically independent glioblastoma data sets, and that the EPoC models exhibit better overlap with known protein–protein interaction networks and pathway maps.
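To make the idea of SVD-derived prognostic scores concrete, here is a hedged sketch: it extracts the leading right singular vector of a network matrix by power iteration and projects each patient's copy-number profile onto it. The matrix layout, the projection rule, the median split and the toy numbers are assumptions for illustration; EPoC's sparse SVD and its actual scoring may differ.

```python
from math import sqrt


def leading_right_singular_vector(A, n_iter=500):
    """Power iteration on A^T A for an m x p matrix A given as a list of rows.

    Assumes the start vector is not orthogonal to the leading singular vector.
    """
    m, p = len(A), len(A[0])
    v = [1.0 / sqrt(p)] * p
    for _ in range(n_iter):
        u = [sum(a * b for a, b in zip(row, v)) for row in A]          # u = A v
        w = [sum(A[i][j] * u[i] for i in range(m)) for j in range(p)]  # w = A^T u
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def prognostic_scores(A, profiles):
    """Score each patient profile (length p) by its projection onto v."""
    v = leading_right_singular_vector(A)
    return [sum(a * b for a, b in zip(v, prof)) for prof in profiles]


A = [[3.0, 0.0], [0.0, 1.0]]  # toy 2 x 2 network matrix
patients = [[1.0, 0.0], [0.0, 1.0], [-2.0, 5.0]]
scores = prognostic_scores(A, patients)
median = sorted(scores)[len(scores) // 2]
# Hypothetical stratification rule: split patients at the median score.
groups = ["high" if s > median else "low" for s in scores]
print(scores, groups)
```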
In summary, we conclude that large-scale integrative modeling reveals mechanistically and prognostically informative networks in human glioblastoma. Our approach operates at the gene level and our data support that individual hub genes can be identified in practice. Very large aberrations, however, cannot be fully resolved by the current modeling strategy.
DNA copy number aberrations (CNAs) are a hallmark of cancer genomes. However, little is known about how such changes affect global gene expression. We develop a modeling framework, EPoC (Endogenous Perturbation analysis of Cancer), to (1) detect disease-driving CNAs and their effect on target mRNA expression, and to (2) stratify cancer patients into long- and short-term survivors. Our method constructs causal network models of gene expression by combining genome-wide DNA- and RNA-level data. Prognostic scores are obtained from a singular value decomposition of the networks. By applying EPoC to glioblastoma data from The Cancer Genome Atlas consortium, we demonstrate that the resulting network models contain known disease-relevant hub genes, reveal interesting candidate hubs, and uncover predictors of patient survival. Targeted validations in four glioblastoma cell lines support selected predictions, and implicate the p53-interacting protein Necdin in suppressing glioblastoma cell growth. We conclude that large-scale network modeling of the effects of CNAs on gene expression may provide insights into the biology of human cancer. Free software in MATLAB and R is provided.
doi:10.1038/msb.2011.17
PMCID: PMC3101951  PMID: 21525872
cancer biology; cancer genomics; glioblastoma
4.  Integration of DNA Copy Number Alterations and Transcriptional Expression Analysis in Human Gastric Cancer 
PLoS ONE  2012;7(4):e29824.
Background
Genomic instability with frequent DNA copy number alterations is one of the key hallmarks of carcinogenesis. The chromosomal regions with frequent DNA copy number gain and loss in human gastric cancer are still poorly defined. It remains unknown how DNA copy number variations contribute to changes in gene expression profiles, especially at the global level.
Principal Findings
We analyzed DNA copy number alterations in 64 human gastric cancer samples and 8 gastric cancer cell lines using bacterial artificial chromosome (BAC) array-based comparative genomic hybridization (aCGH). Statistical analysis was applied to correlate previously published gene expression data obtained from cDNA microarrays with corresponding DNA copy number variation data to identify candidate oncogenes and tumor suppressor genes. We found that gastric cancer samples showed recurrent DNA copy number variations, including gains at 5p, 8q, 20p, 20q, and losses at 4q, 9p, 18q, 21q. The most frequent regions of amplification were 20q12 (7/72), 20q12–20q13.1 (12/72), 20q13.1–20q13.2 (11/72) and 20q13.2–20q13.3 (6/72). The most frequently deleted region was 9p21 (8/72). Correlating gene expression array data with aCGH identified 321 candidate oncogenes, which were overexpressed and showed frequent DNA copy number gains, and 12 candidate tumor suppressor genes, which were down-regulated and showed frequent DNA copy number losses in human gastric cancers. Three networks of significantly expressed genes in gastric cancer samples were identified by Ingenuity Pathway Analysis.
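The integration rule described above, pairing recurrent gains with over-expression and recurrent losses with under-expression, can be sketched as follows. The inputs (per-gene gain and loss frequencies, a tumor-versus-normal log fold change) and the recurrence cutoff `min_freq` are assumptions; the study's own statistical analysis is not reproduced here, and the gene names are placeholders.

```python
def classify_candidates(gain_freq, loss_freq, log_fc, min_freq=0.1):
    """Split genes into candidate oncogenes and tumor suppressor genes.

    gain_freq, loss_freq: dict gene -> fraction of samples with a gain / loss.
    log_fc: dict gene -> log2 expression change, tumor versus normal.
    min_freq: hypothetical threshold for calling an alteration recurrent.
    """
    oncogenes, suppressors = [], []
    for gene, change in log_fc.items():
        if change > 0 and gain_freq.get(gene, 0.0) >= min_freq:
            oncogenes.append(gene)    # gained and over-expressed
        elif change < 0 and loss_freq.get(gene, 0.0) >= min_freq:
            suppressors.append(gene)  # lost and under-expressed
    return sorted(oncogenes), sorted(suppressors)


gains = {"G1": 12 / 72, "G2": 1 / 72}
losses = {"G3": 8 / 72}
fold = {"G1": 1.5, "G2": 2.0, "G3": -1.0, "G4": -0.5}
print(classify_candidates(gains, losses, fold))  # (['G1'], ['G3'])
```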
Conclusions
This study provides insight into DNA copy number variations and their contribution to altered gene expression profiles during human gastric cancer development. It provides novel candidate driver oncogenes or tumor suppressor genes for human gastric cancer, useful pathway maps for the future understanding of the molecular pathogenesis of this malignancy, and a basis for identifying new therapeutic targets.
doi:10.1371/journal.pone.0029824
PMCID: PMC3335165  PMID: 22539939
5.  A Genome-Wide Screen for Promoter Methylation in Lung Cancer Identifies Novel Methylation Markers for Multiple Malignancies  
PLoS Medicine  2006;3(12):e486.
Background
Promoter hypermethylation coupled with loss of heterozygosity at the same locus results in loss of gene function in many tumor cells. The “rules” governing which genes are methylated during the pathogenesis of individual cancers, how specific methylation profiles are initially established, or what determines tumor type-specific methylation are unknown. However, DNA methylation markers that are highly specific and sensitive for common tumors would be useful for the early detection of cancer, and those required for the malignant phenotype would identify pathways important as therapeutic targets.
Methods and Findings
In an effort to identify new cancer-specific methylation markers, we employed a high-throughput global expression profiling approach in lung cancer cells. We identified 132 genes that have 5′ CpG islands, are induced from undetectable levels by 5-aza-2′-deoxycytidine in multiple non-small cell lung cancer cell lines, and are expressed in immortalized human bronchial epithelial cells. As expected, these genes were also expressed in normal lung, but often not in companion primary lung cancers. Methylation analysis of a subset (45/132) of these promoter regions in primary lung cancer (n = 20) and adjacent nonmalignant tissue (n = 20) showed that 31 genes had acquired methylation in the tumors, but did not show methylation in normal lung or peripheral blood cells. We studied the eight most frequently and specifically methylated genes from our lung cancer dataset in breast cancer (n = 37), colon cancer (n = 24), and prostate cancer (n = 24) along with counterpart nonmalignant tissues. We found that seven loci were frequently methylated in both breast and lung cancers, with four showing extensive methylation in all four epithelial tumors.
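The two-stage screen can be expressed as a pair of filters. This is an illustrative sketch only: the record fields, the reading of "multiple" cell lines as at least two (`min_lines`), and the set-based definition of tumor-specific methylation are assumptions, and the gene names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    name: str
    cpg_island_5prime: bool  # 5' CpG island present
    induced_lines: int       # NSCLC lines where 5-aza-2'-deoxycytidine induced expression from undetectable levels
    expressed_in_hbec: bool  # expressed in immortalized human bronchial epithelial cells


def expression_screen(records, min_lines=2):
    """Stage 1: candidates from the expression-profiling screen."""
    return [r.name for r in records
            if r.cpg_island_5prime and r.induced_lines >= min_lines and r.expressed_in_hbec]


def tumor_specific(methylated_in_tumor, methylated_in_normal, methylated_in_blood):
    """Stage 2: keep genes methylated in tumors but in neither normal lung nor blood."""
    return sorted(set(methylated_in_tumor) - set(methylated_in_normal) - set(methylated_in_blood))


records = [
    GeneRecord("GENE1", True, 3, True),
    GeneRecord("GENE2", False, 4, True),  # no 5' CpG island
    GeneRecord("GENE3", True, 1, True),   # induced in only one line
]
stage1 = expression_screen(records)
stage2 = tumor_specific(["GENE1", "GENE4"], ["GENE4"], [])
print(stage1, stage2)  # ['GENE1'] ['GENE1']
```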
Conclusions
By using a systematic biological screen we identified multiple genes that are methylated with high penetrance in primary lung, breast, colon, and prostate cancers. The cross-tumor methylation pattern we observed for these novel markers suggests that we have identified a partial promoter hypermethylation signature for these common malignancies. These data suggest that while tumors in different tissues vary substantially with respect to gene expression, there may be commonalities in their promoter methylation profiles that represent targets for early detection screening or therapeutic intervention.
John Minna and colleagues report that a group of genes are commonly methylated in primary lung, breast, colon, and prostate cancer.
Editors' Summary
Background.
Tumors or cancers contain cells that have lost many of the control mechanisms that normally regulate their behavior. Unlike normal cells, which only divide to repair damaged tissues, cancer cells divide uncontrollably. They also gain the ability to move round the body and start metastases in secondary locations. These changes in behavior result from alterations in their genetic material. For example, mutations (permanent changes in the sequence of nucleotides in the cell's DNA) in genes known as oncogenes stimulate cells to divide constantly. Mutations in another group of genes—tumor suppressor genes—disable their ability to restrain cell growth. Key tumor suppressor genes are often completely lost in cancer cells. But not all the genetic changes in cancer cells are mutations. Some are “epigenetic” changes—chemical modifications of genes that affect the amount of protein made from them. In cancer cells, methyl groups are often added to CG-rich regions—this is called hypermethylation. These “CpG islands” lie near gene promoters—sequences that control the transcription of DNA into RNA, the template for protein production—and their methylation switches off the promoter. Methylation of the promoter of one copy of a tumor suppressor gene, which often coincides with the loss of the other copy of the gene, is thought to be involved in cancer development.
Why Was This Study Done?
The rules that govern which genes are hypermethylated during the development of different cancer types are not known, but it would be useful to identify any DNA methylation events that occur regularly in common cancers for two reasons. First, specific DNA methylation markers might be useful for the early detection of cancer. Second, identifying these epigenetic changes might reveal cellular pathways that are changed during cancer development and so identify new therapeutic targets. In this study, the researchers have used a systematic biological screen to identify genes that are methylated in many lung, breast, colon, and prostate cancers—all cancers that form in “epithelial” tissues.
What Did the Researchers Do and Find?
The researchers used microarray expression profiling to examine gene expression patterns in several lung cancer and normal lung cell lines. In this technique, labeled RNA molecules isolated from cells are applied to a “chip” carrying an array of gene fragments. Here, they stick to the fragment that represents the gene from which they were made, which allows the genes that the cells express to be catalogued. By comparing the expression profiles of lung cancer cells and normal lung cells before and after treatment with a chemical that inhibits DNA methylation, the researchers identified genes that were methylated in the cancer cells—that is, genes that were expressed in normal cells but not in cancer cells unless methylation was inhibited. 132 of these genes contained CpG islands. The researchers examined the promoters of 45 of these genes in lung cancer cells taken straight from patients and found that 31 of the promoters were methylated in tumor tissues but not in adjacent normal tissues. Finally, the researchers looked at promoter methylation of the eight genes most frequently and specifically methylated in the lung cancer samples in breast, colon, and prostate cancers. Seven of the genes were frequently methylated in both lung and breast cancers; four were extensively methylated in all the tumor types.
What Do These Findings Mean?
These results identify several new genes that are often methylated in four types of epithelial tumor. The observation that these genes are methylated in multiple independent tumors strongly suggests, but does not prove, that loss of expression of the proteins that they encode helps to convert normal cells into cancer cells. The frequency and diverse patterning of promoter methylation in different tumor types also indicates that methylation is not a random event, although what controls the patterns of methylation is not yet known. The identification of these genes is a step toward building a promoter hypermethylation profile for the early detection of human cancer. Furthermore, although tumors in different tissues vary greatly with respect to gene expression patterns, the similarities seen in this study in promoter methylation profiles might help to identify new therapeutic targets common to several cancer types.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0030486.
US National Cancer Institute, information for patients on understanding cancer
CancerQuest, information provided by Emory University about how cancer develops
Cancer Research UK, information for patients on cancer biology
Wikipedia pages on epigenetics (note that Wikipedia is a free online encyclopedia that anyone can edit)
The Epigenome Network of Excellence, background information and latest news about epigenetics
doi:10.1371/journal.pmed.0030486
PMCID: PMC1716188  PMID: 17194187
6.  Integrative Genomic Analyses Identify BRF2 as a Novel Lineage-Specific Oncogene in Lung Squamous Cell Carcinoma 
PLoS Medicine  2010;7(7):e1000315.
William Lockwood and colleagues show that the focal amplification of a gene, BRF2, on Chromosome 8p12 plays a key role in squamous cell carcinoma of the lung.
Background
Traditionally, non-small cell lung cancer is treated as a single disease entity in terms of systemic therapy. Emerging evidence suggests the major subtypes—adenocarcinoma (AC) and squamous cell carcinoma (SqCC)—respond differently to therapy. Identification of the molecular differences between these tumor types will have a significant impact in designing novel therapies that can improve the treatment outcome.
Methods and Findings
We used an integrative genomics approach, combining high-resolution comparative genomic hybridization and gene expression microarray profiles, to compare AC and SqCC tumors in order to uncover alterations at the DNA level, with corresponding gene transcription changes, which are selected for during development of lung cancer subtypes. Through the analysis of multiple independent cohorts of clinical tumor samples (>330), normal lung tissues and bronchial epithelial cells obtained by bronchial brushing in smokers without lung cancer, we identified the overexpression of BRF2, a gene on Chromosome 8p12, which is specific for development of SqCC of lung. Genetic activation of BRF2, which encodes an RNA polymerase III (Pol III) transcription initiation factor, was found to be associated with increased expression of small nuclear RNAs (snRNAs) that are involved in processes essential for cell growth, such as RNA splicing. Ectopic expression of BRF2 in human bronchial epithelial cells induced a transformed phenotype and demonstrated downstream oncogenic effects, whereas RNA interference (RNAi)-mediated knockdown suppressed growth and colony formation of SqCC cells overexpressing BRF2, but not AC cells. Frequent activation of BRF2 in >35% of preinvasive bronchial carcinoma in situ, as well as in dysplastic lesions, provides evidence that BRF2 expression is an early event in cancer development of this cell lineage.
Conclusions
This is the first study, to our knowledge, to show that the focal amplification of a gene in Chromosome 8p12 plays a key role in squamous cell lineage specificity of the disease. Our data suggest that genetic activation of BRF2 represents a unique mechanism of SqCC lung tumorigenesis through the increase of Pol III-mediated transcription. It can serve as a marker for lung SqCC and may provide a novel target for therapy.
Please see later in the article for the Editors' Summary
Editors' Summary
Background
Lung cancer is the commonest cause of cancer-related death. Every year, 1.3 million people die from this disease, which is mainly caused by smoking. Most cases of lung cancer are “non-small cell lung cancers” (NSCLCs). Like all cancers, NSCLC starts when cells begin to divide uncontrollably and to move round the body (metastasize) because of changes (mutations) in their genes. These mutations are often in “oncogenes,” genes that, when activated, encourage cell division. Oncogenes can be activated by mutations that alter the properties of the proteins they encode or by mutations that increase the amount of protein made from them, such as gene amplification (an increase in the number of copies of a gene). If NSCLC is diagnosed before it has spread from the lungs (stage I disease), it can be surgically removed and many patients with stage I NSCLC survive for more than 5 years after their diagnosis. Unfortunately, in more than half of patients, NSCLC has metastasized before it is diagnosed. This stage IV NSCLC can be treated with chemotherapy (toxic chemicals that kill fast-growing cancer cells) but only 2% of patients with stage IV lung cancer are alive 5 years after diagnosis.
Why Was This Study Done?
Traditionally, NSCLC has been regarded as a single disease in terms of treatment. However, emerging evidence suggests that the two major subtypes of NSCLC—adenocarcinoma and squamous cell carcinoma (SqCC)—respond differently to chemotherapy. Adenocarcinoma and SqCC start in different types of lung cell and experts think that for each cell type in the body, specific combinations of mutations interact with the cell type's own unique characteristics to provide the growth and survival advantage needed for cancer development. If this is true, then identifying the molecular differences between adenocarcinoma and SqCC could provide targets for more effective therapies for these major subtypes of NSCLC. Amplification of a chromosome region called 8p12 is very common in NSCLC, which suggests that an oncogene that drives lung cancer development is present in this chromosome region. In this study, the researchers investigate this possibility by looking for an amplified gene in the 8p12 chromosome region that makes increased amounts of protein in lung SqCC but not in lung adenocarcinoma.
What Did the Researchers Do and Find?
The researchers used a technique called comparative genomic hybridization to show that focal regions of Chromosome 8p are amplified in about 40% of lung SqCCs, but that DNA loss in this region is the most common alteration in lung adenocarcinomas. Ten genes in the 8p12 chromosome region were expressed at higher levels in the SqCC samples that they examined than in adenocarcinoma samples, they report, and overexpression of five of these genes correlated with amplification of the 8p12 region in the SqCC samples. Only one of the genes—BRF2—was more highly expressed in squamous carcinoma cells than in normal bronchial epithelial cells (the cell type that lines the tubes that take air into the lungs and from which SqCC develops). Artificially induced expression of BRF2 in bronchial epithelial cells made these normal cells behave like tumor cells, whereas reduction of BRF2 expression in squamous carcinoma cells made them behave more like normal bronchial epithelial cells. Finally, BRF2 was frequently activated in two early stages of squamous cell carcinoma—bronchial carcinoma in situ and dysplastic lesions.
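The successive filters that narrowed the 8p12 genes down to BRF2 can be sketched as below. This is a simplified illustration, not the authors' analysis: simple mean comparisons and a Pearson correlation cutoff (`min_r`, hypothetical) stand in for their statistical tests, and the input layout and gene names are assumptions.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def lineage_candidates(genes, min_r=0.5):
    """Apply three successive filters to genes from an amplified region.

    genes: dict gene -> dict with per-sample lists under the keys
      'sqcc_expr', 'ac_expr', 'normal_expr' (expression) and 'sqcc_cn' (copy number).
    """
    hits = []
    for name, d in genes.items():
        sqcc = mean(d["sqcc_expr"])
        if sqcc <= mean(d["ac_expr"]):
            continue  # 1. must be higher in SqCC than in adenocarcinoma
        if pearson(d["sqcc_cn"], d["sqcc_expr"]) < min_r:
            continue  # 2. expression must track copy number within SqCC
        if sqcc <= mean(d["normal_expr"]):
            continue  # 3. must exceed normal bronchial epithelium
        hits.append(name)
    return hits


up = [5, 6, 7, 8]
genes = {
    "GENE_X": {"sqcc_expr": up, "ac_expr": [3, 3, 4, 4], "normal_expr": [4, 4, 4, 4], "sqcc_cn": [0.1, 0.3, 0.5, 0.7]},
    "GENE_Y": {"sqcc_expr": up, "ac_expr": [3, 3, 4, 4], "normal_expr": [9, 9, 9, 9], "sqcc_cn": [0.1, 0.3, 0.5, 0.7]},
    "GENE_Z": {"sqcc_expr": up, "ac_expr": [3, 3, 4, 4], "normal_expr": [4, 4, 4, 4], "sqcc_cn": [0.7, 0.5, 0.3, 0.1]},
}
print(lineage_candidates(genes))  # ['GENE_X']
```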
What Do These Findings Mean?
Together, these findings show that the focal amplification of chromosome region 8p12 plays a role in the development of lung SqCC but not in the development of lung adenocarcinoma, the other major subtype of NSCLC. These findings identify BRF2 (which encodes an RNA polymerase III transcription initiation factor, a protein that is required for the synthesis of RNA molecules that help to control cell growth) as a lung SqCC-specific oncogene and uncover a unique mechanism for lung SqCC development. Most importantly, these findings suggest that genetic activation of BRF2 could be used as a marker for lung SqCC, which might facilitate the early detection of this type of NSCLC, and that BRF2 might provide a new target for therapy.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000315.
The US National Cancer Institute provides detailed information for patients and professionals about all aspects of lung cancer, including information on non-small cell carcinoma (in English and Spanish)
Cancer Research UK also provides information about lung cancer and information on how cancer starts
MedlinePlus has links to other resources about lung cancer (in English and Spanish)
doi:10.1371/journal.pmed.1000315
PMCID: PMC2910599  PMID: 20668658
7.  Identification of Networks of Co-Occurring, Tumor-Related DNA Copy Number Changes Using a Genome-Wide Scoring Approach 
PLoS Computational Biology  2010;6(1):e1000631.
Tumorigenesis is a multi-step process in which normal cells transform into malignant tumors following the accumulation of genetic mutations that enable them to evade the growth control checkpoints that would normally suppress their growth or result in apoptosis. It is therefore important to identify those combinations of mutations that collaborate in cancer development and progression. DNA copy number alterations (CNAs) are one of the ways in which cancer genes are deregulated in tumor cells. We hypothesized that synergistic interactions between cancer genes might be identified by looking for regions of co-occurring gain and/or loss. To this end we developed a scoring framework to separate truly co-occurring aberrations from passenger mutations and dominant single signals present in the data. The resulting regions of high co-occurrence can be investigated for between-region functional interactions. Analysis of high-resolution DNA copy number data from a panel of 95 hematological tumor cell lines correctly identified co-occurring recombinations at the T-cell receptor and immunoglobulin loci in T- and B-cell malignancies, respectively, showing that we can recover truly co-occurring genomic alterations. In addition, our analysis revealed networks of co-occurring genomic losses and gains that are enriched for cancer genes. These networks are also highly enriched for functional relationships between genes. We further examine sub-networks of these networks, termed core networks, which contain many known cancer genes. The core network we find for co-occurring DNA losses seems to be independent of the canonical cancer genes within the network. Our findings suggest that large-scale, low-intensity copy number alterations may be an important feature of cancer development or maintenance by affecting gene dosage of a large interconnected network of functionally related genes.
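A common baseline for asking whether two aberrations co-occur more often than chance is a one-sided hypergeometric (Fisher-type) test on binary calls. The sketch below uses that standard test as a stand-in; it is not the authors' genome-wide scoring framework, which additionally separates true co-occurrence from passenger events and dominant single signals. Locus names and calls are hypothetical.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N samples, K with aberration A, n with aberration B)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def co_occurrence(calls):
    """For every pair of loci, return (joint count, one-sided p-value).

    calls: dict locus -> list of 0/1 aberration calls, one entry per sample.
    """
    results = {}
    for a, b in combinations(sorted(calls), 2):
        xa, xb = calls[a], calls[b]
        k = sum(1 for u, v in zip(xa, xb) if u and v)  # samples with both aberrations
        results[(a, b)] = (k, hypergeom_sf(k, len(xa), sum(xa), sum(xb)))
    return results


calls = {
    "L1": [1, 1, 1, 0, 0, 0],
    "L2": [1, 1, 1, 0, 0, 0],
    "L3": [0, 0, 0, 1, 1, 1],
}
for pair, (k, p) in co_occurrence(calls).items():
    print(pair, k, round(p, 3))
```

With many loci the p-values would also need a multiple-testing correction, which this sketch omits.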
Author Summary
It is generally accepted that a normal cell has to acquire multiple mutations in order to become a malignant tumor cell. Considerable effort has been invested in finding single genes involved in tumor initiation and progression, but relatively little is known about the constellations of cancer genes that effectively collaborate in oncogenesis. In this study we focus on the identification of co-occurring DNA copy number alterations (i.e., gains and losses of pieces of DNA) in a series of tumor samples. We describe an analysis method to identify DNA copy number mutations that specifically occur together by examining every possible pair of positions on the genome. We analyze a dataset of hematopoietic tumor cell lines, in which we define a network of specific DNA copy number mutations. The regions in this network contain several well-studied cancer related genes. Upon further investigation we find that the regions of DNA copy number alteration also contain large networks of functionally related genes that have not previously been linked to cancer formation. This might illuminate a novel role for these recurrent DNA copy number mutations in hematopoietic malignancies.
doi:10.1371/journal.pcbi.1000631
PMCID: PMC2791203  PMID: 20052266
8.  Genomic Hypomethylation in the Human Germline Associates with Selective Structural Mutability in the Human Genome 
PLoS Genetics  2012;8(5):e1002692.
The hotspots of structural polymorphisms and structural mutability in the human genome remain to be explained mechanistically. We examine associations of structural mutability with germline DNA methylation and with non-allelic homologous recombination (NAHR) mediated by low-copy repeats (LCRs). Combined evidence from four human sperm methylome maps, human genome evolution, structural polymorphisms in the human population, and previous genomic and disease studies consistently points to a strong association of germline hypomethylation and genomic instability. Specifically, methylation deserts, the ∼1% fraction of the human genome with the lowest methylation in the germline, show a tenfold enrichment for structural rearrangements that occurred in the human genome since its divergence from chimpanzee, and are highly enriched for fast-evolving loci that regulate tissue-specific gene expression. Analysis of copy number variants (CNVs) from 400 human samples identified using a custom-designed array comparative genomic hybridization (aCGH) chip, combined with publicly available structural variation data, indicates that association of structural mutability with germline hypomethylation is comparable in magnitude to the association of structural mutability with LCR–mediated NAHR. Moreover, rare CNVs occurring in the genomes of individuals diagnosed with schizophrenia, bipolar disorder, and developmental delay and de novo CNVs occurring in those diagnosed with autism are significantly more concentrated within hypomethylated regions. These findings suggest a new connection between the epigenome, selective mutability, evolution, and human disease.
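A "tenfold enrichment" is a ratio of observed to expected counts. A minimal Python sketch, assuming the expected share of rearrangements in a region is simply that region's share of the genome (the study's own expectation model may be more elaborate); the function name and the counts are hypothetical:

```python
def fold_enrichment(events_in_region, total_events, region_bp, genome_bp):
    """Observed share of events in a region divided by the region's share of the genome.

    Assumes events would be spread uniformly along the genome under the null.
    """
    observed_fraction = events_in_region / total_events
    expected_fraction = region_bp / genome_bp
    return observed_fraction / expected_fraction


# Hypothetical: a region covering 1% of a 3 Gb genome holds 100 of 1,000 events.
print(fold_enrichment(100, 1000, 30_000_000, 3_000_000_000))
```

A region covering 1% of the genome that holds 10% of all events therefore has a fold enrichment of 10.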
Author Summary
The human genome contains many loci with high incidence of structural mutations, including insertions and deletions of chromosomal segments. This excessive mutability has accelerated evolution and contributed to human disease but has yet to be explained. Segments of DNA repeated in low-copy numbers (LCRs) have been previously implicated in promoting structural mutability in specific disease-associated loci. Lack of methylation (hypomethylation) of genomic DNA has been previously associated with high structural mutability in gibbons and in human cancer cells, but the association with structural mutability in the human germline has not been explored prior to this study. Our analyses confirm the role of LCRs in promoting structural mutability on the genome scale but also reveal a surprisingly strong association of genomic instability with hypomethylation. Specifically, evolutionary analyses reveal that methylation deserts, the ∼1% fraction of the human genome with the lowest methylation in human sperm, harbor a tenfold higher number of structural mutations than the genome-wide average. Moreover, the structural mutations in individuals diagnosed with schizophrenia, bipolar disorder, developmental delay, and autism are significantly more concentrated within hypomethylated regions. Our findings suggest a new connection between methylation of genomic DNA, selective structural mutability, evolution, and human disease.
doi:10.1371/journal.pgen.1002692
PMCID: PMC3355074  PMID: 22615578
9.  Combined use of expression and CGH arrays pinpoints novel candidate genes in Ewing sarcoma family of tumors 
BMC Cancer  2009;9:17.
Background
Ewing sarcoma family of tumors (ESFT), characterized by t(11;22)(q24;q12), is one of the most common tumors of bone in children and young adults. In addition to EWS/FLI1 gene fusion, copy number changes are known to be significant for the underlying neoplastic development of ESFT and for patient outcome. Our genome-wide high-resolution analysis aimed to pinpoint genomic regions of highest interest and possible target genes in these areas.
Methods
Array comparative genomic hybridization (CGH) and expression arrays were used to screen for copy number alterations and expression changes in ESFT patient samples. A total of 31 ESFT samples were analyzed by aCGH, and for 16 patients DNA- and RNA-level data (the latter from expression arrays) were integrated. Follow-up of these patients ranged from 5 to 192 months. Clinical outcome was evaluated statistically by Kaplan-Meier/log-rank methods, and RT-PCR was applied to 42 patient samples to study the gene of highest interest.
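Clinical outcome here was evaluated with Kaplan-Meier curves and the log-rank test. A minimal, dependency-free Python sketch of the Kaplan-Meier product-limit estimator follows (the log-rank comparison is omitted). It is not the software used in the study, and the follow-up data in the example are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient (e.g. months).
    events: 1 if the event (relapse or death) was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each distinct event time.
    Patients censored at an event time are counted as still at risk at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical cohort: months of follow-up and event indicators.
print(kaplan_meier([5, 8, 8, 12, 20], [1, 0, 1, 1, 0]))
```

Computing one curve per group (for example, tumors with three or fewer copy number changes versus more) gives the kind of comparison reported in the Results; a log-rank test would then assess whether the curves differ.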
Results
Copy number changes were detected in 87% of the cases. The most recurrent copy number changes were gains at 1q, 2, 8, and 12, and losses at 9p and 16q. Cumulative event-free survival (EFS) and overall survival (OS) were significantly better (P < 0.05) for primary tumors with three or fewer copy number changes than for tumors with a higher number of copy number aberrations. In three samples, copy number imbalances were detected in chromosomes 11 and 22 affecting the FLI1 and EWSR1 loci, suggesting that an unbalanced t(11;22) and subsequent duplication of the derivative chromosome harboring the fusion gene is a common event in ESFT. Further, amplifications on chromosomes 20 and 22 seen in one patient sample suggest a novel translocation type between EWSR1 and an unidentified fusion partner at 20q. In total, 20 novel ESFT-associated putative oncogenes and tumor suppressor genes were found in the integration analysis of array CGH and expression data. Quantitative RT-PCR of the most interesting gene, HDGF, confirmed that its expression was higher than in control samples. However, no association between HDGF expression and patient survival was observed.
Conclusion
We conclude that array CGH and integration analysis proved to be effective methods to identify chromosome regions and novel target genes involved in the tumorigenesis of ESFT.
doi:10.1186/1471-2407-9-17
PMCID: PMC2633345  PMID: 19144156
10.  Convergence of Mutation and Epigenetic Alterations Identifies Common Genes in Cancer That Predict for Poor Prognosis  
PLoS Medicine  2008;5(5):e114.
Background
The identification and characterization of tumor suppressor genes has enhanced our understanding of the biology of cancer and enabled the development of new diagnostic and therapeutic modalities. Whereas in past decades, a handful of tumor suppressors have been slowly identified using techniques such as linkage analysis, large-scale sequencing of the cancer genome has enabled the rapid identification of a large number of genes that are mutated in cancer. However, determining which of these many genes play key roles in cancer development has proven challenging. Specifically, recent sequencing of human breast and colon cancers has revealed a large number of somatic gene mutations, but virtually all are heterozygous, occur at low frequency, and are tumor-type specific. We hypothesize that key tumor suppressor genes in cancer may be subject to mutation or hypermethylation.
Methods and Findings
Here, we show that combined genetic and epigenetic analysis of these genes reveals many with a higher putative tumor suppressor status than would otherwise be appreciated. At least 36 of the 189 genes newly recognized to be mutated are targets of promoter CpG island hypermethylation, often in both colon and breast cancer cell lines. Analyses of primary tumors show that 18 of these genes are hypermethylated specifically in primary cancers, often with an incidence that is much higher than for the mutations and that is not restricted to a single tumor type. In the identical breast cancer cell lines in which the mutations were identified, hypermethylation is usually, but not always, mutually exclusive with genetic changes for a given tumor, and there is a high incidence of concomitant loss of expression. Sixteen out of 18 (89%) of these genes map to loci deleted in human cancers. Lastly, and most importantly, the reduced expression of a subset of these genes strongly correlates with poor clinical outcome.
Conclusions
Using an unbiased genome-wide approach, our analysis has enabled the discovery of a number of clinically significant genes targeted by multiple modes of inactivation in breast and colon cancer. Importantly, we demonstrate that a subset of these genes predict strongly for poor clinical outcome. Our data define a set of genes that are targeted by both genetic and epigenetic events, predict for clinical prognosis, and are likely fundamentally important for cancer initiation or progression.
Stephen Baylin and colleagues show that a combined genetic and epigenetic analysis of breast and colon cancers identifies a number of clinically significant genes targeted by multiple modes of inactivation.
Editors' Summary
Background.
Cancer is one of the developed world's biggest killers—over half a million Americans die of cancer each year, for instance. As a result, there is great interest in understanding the genetic and environmental causes of cancer in order to improve cancer prevention, diagnosis, and treatment.
Cancer arises when cells start to multiply out of control. DNA is the sequence of coded instructions (genes) for how to build and maintain the body. Certain “tumor suppressor” genes, for instance, help to prevent cancer by preventing tumors from developing, but changes that alter the DNA code sequence (mutations) can profoundly affect how a gene works. Modern techniques of genetic analysis have identified genes such as tumor suppressors that, when mutated, are linked to the development of certain cancers.
Why Was This Study Done?
However, in recent years, it has become increasingly apparent that mutations are neither necessary nor sufficient to explain every case of cancer. This has led researchers to look at so-called epigenetic factors, which also alter how a gene works without altering its DNA sequence. An example of this is “methylation,” which prevents a gene from being expressed—deactivates it—by a chemical tag. Methylation of genes is part of the normal functioning of DNA, but abnormal methylation has been linked with cancer, aging, and some rare birth abnormalities.
Previous analysis of DNA from breast and colon cancer cells had revealed 189 “candidate cancer genes”—mutated genes that were linked to the development of breast and colon cancer. However, it was not clear how those mutations gave rise to cancer, and individual mutations were present in only 5% to 15% of specific tumors. The authors of this study wanted to know whether epigenetic factors such as methylation contributed to causing the cancers.
What Did the Researchers Do and Find?
The researchers first identified 56 of the 189 candidate cancer genes as likely tumor suppressors and then determined that 36 of these genes were methylated and deactivated, often in both breast and colon (laboratory-grown) cancer cells. In nearly all cases, the methylated genes were not active but could be reactivated by being demethylated. They further showed that, in normal colon and breast tissue samples, 18 of the 36 genes were unmethylated and functioned normally, but in cells taken from breast and colon cancer tumors they were methylated.
In contrast to the genetic mutations, the 18 genes were frequently methylated across a range of tumor types, and eight genes were methylated in both the breast and colon cancers. By reviewing the genetics and epigenetics of those 18 genes in breast and colon cancer, the authors found that they were mutated, methylated, or both. A literature review showed that at least six of the 18 genes were known to have tumor suppressor properties, and the authors determined that 16 were located in parts of DNA known to be missing from cells taken from a range of cancer tumors.
Finally, the researchers analyzed data on cancer cases to show that methylation of these 18 genes was correlated with reduced function of these genes in tumors and with a greater likelihood that a cancer will be terminal or spread to other parts of the body.
What Do These Findings Mean?
The researchers considered only the 189 candidate cancer genes found in one previous study and not other genes identified elsewhere. They also did not consider the biological effects of the individual mutations found in those genes. Despite this, they have demonstrated that methylation of specific genes is likely to play a role in the development of breast and/or colon cancer cells either together with mutations or independently, most likely by turning off their tumor suppression function.
More broadly, however, the study adds to the evidence that future analysis of the role of genes in cancer should include epigenetic as well as genetic factors. In addition, the authors have also shown that a number of these genes may be useful for predicting clinical outcomes for a range of tumor types.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0050114.
A December 2006 PLoS Medicine Perspective article reviews the value of examining methylation as a factor in common cancers and its use for early detection
The Web site of the American Cancer Society has a wealth of information and resources on a variety of cancers, including breast and colon cancer
Breastcancer.org is a nonprofit organization providing information about breast cancer on the Web, including research news
Cancer Research UK provides information on cancer research
The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins publishes background information on the authors' research on methylation, setting out its potential for earlier diagnosis and better treatment of cancer
doi:10.1371/journal.pmed.0050114
PMCID: PMC2429944  PMID: 18507500
11.  High-resolution aCGH and expression profiling identifies a novel genomic subtype of ER negative breast cancer 
Genome Biology  2007;8(10):R215.
High resolution array-CGH and expression profiling identifies a novel genomic subtype of ER negative breast cancer, and provides a genome-wide list of common copy number alterations associated with aberrant expression and poor prognosis.
Background
The characterization of copy number alteration patterns in breast cancer requires high-resolution genome-wide profiling of a large panel of tumor specimens. To date, most genome-wide array comparative genomic hybridization studies have used tumor panels of relatively large tumor size and high Nottingham Prognostic Index (NPI) that are not as representative of breast cancer demographics.
Results
We performed an oligo-array-based high-resolution analysis of copy number alterations in 171 primary breast tumors of relatively small size and low NPI, a panel therefore more representative of breast cancer demographics. Hierarchical clustering over the common regions of alteration identified a novel subtype of high-grade estrogen receptor (ER)-negative breast cancer, characterized by a low genomic instability index. We were able to validate the existence of this genomic subtype in one external breast cancer cohort. Using matched array expression data we also identified the genomic regions showing the strongest coordinate expression changes ('hotspots'). We show that several of these hotspots are located in the phosphatome, kinome and chromatinome, and harbor members of the 122-gene breast cancer CAN-list. Furthermore, we identify frequently amplified hotspots on 8q22.3 (EDD1, WDSOF1), 8q24.11-13 (THRAP6, DCC1, SQLE, SPG8) and 11q14.1 (NDUFC2, ALG8, USP35) associated with significantly worse prognosis. Amplification of any of these regions identified 37 samples with significantly worse overall survival (hazard ratio (HR) = 2.3 (1.3-1.4) p = 0.003) and time to distant metastasis (HR = 2.6 (1.4-5.1) p = 0.004) independently of NPI.
Conclusion
We present strong evidence for the existence of a novel subtype of high-grade ER-negative tumors that is characterized by a low genomic instability index. We also provide a genome-wide list of common copy number alteration regions in breast cancer that show strong coordinate aberrant expression, and further identify novel frequently amplified regions that correlate with poor prognosis. Many of the genes associated with these regions represent likely novel oncogenes or tumor suppressors.
doi:10.1186/gb-2007-8-10-r215
PMCID: PMC2246289  PMID: 17925008
12.  Proteomic Changes Resulting from Gene Copy Number Variations in Cancer Cells 
PLoS Genetics  2010;6(9):e1001090.
During transformation, cells accumulate DNA aberrations, including mutations, translocations, amplifications, and deletions. Despite numerous studies, the overall effect of amplifications and deletions on the end point of gene expression, the level of proteins, is generally unknown. Here we use large-scale and high-resolution proteomics combined with gene copy number analysis to investigate globally to what extent these genomic changes have a proteomic output and therefore the ability to affect cellular transformation. We accurately measure expression levels of 6,735 proteins and directly compare them to the gene copy number. We find that the average effect of these alterations on protein expression is only a few percent. Nevertheless, using a novel algorithm, we find the combined impact that many of these regional chromosomal aberrations have at the protein level. We show that proteins encoded by amplified oncogenes are often overexpressed, while adjacent amplified genes, which presumably do not promote growth and survival, are attenuated. Furthermore, regulation of biological processes and molecular complexes is independent of general copy number changes. By connecting primary genome alterations to their proteomic consequences, this approach helps to interpret the data from large-scale cancer genomics efforts.
Author Summary
In the course of cancer development, cells lose regulation of the cell cycle and quality control of DNA replication. As a result, many genomic alterations accumulate, among them amplifications and deletions of chromosomal regions of varying sizes. Oncogenes that drive transformation often reside in amplified regions, while tumor suppressors are deleted, yet for thousands of genes the effect of altering gene copy number is unknown. Since only genomic alterations that ultimately affect protein levels can have functional importance, a global proteomic approach that directly measures such changes is desirable. Here, we examined the effect of chromosomal alterations on protein levels in a system-wide manner. We analyzed the global protein expression of cancer cells compared to normal cells using mass-spectrometry–based quantitative proteomics and quantified a large part of the expressed proteome. We compared the protein data to genomic data and matched changes in gene copy number to protein expression level changes for each gene. Overall, gene copy number changes explain only a few percent of observed protein expression changes. Knowing when genomic and proteomic changes correlate may improve our understanding of regulatory mechanisms in tumor development.
doi:10.1371/journal.pgen.1001090
PMCID: PMC2932691  PMID: 20824076
13.  MutComFocal: an integrative approach to identifying recurrent and focal genomic alterations in tumor samples 
BMC Systems Biology  2013;7:25.
Background
Most tumors are the result of accumulated genomic alterations in somatic cells. The emerging spectrum of alterations in tumors is complex and the identification of relevant genes and pathways remains a challenge. Furthermore, key cancer genes are usually found amplified or deleted in chromosomal regions containing many other genes. Point mutations, on the other hand, provide exquisite information about amino acid changes that could be implicated in the oncogenic process. Current large-scale genomic projects provide high throughput genomic data in a large number of well-characterized tumor samples.
Methods
We define a Bayesian approach designed to identify candidate cancer genes by integrating copy number and point mutation information. Our method exploits the concept that small and recurrent alterations in tumors are more informative in the search for cancer genes. Thus, the algorithm (Mutations with Common Focal Alterations, or MutComFocal) seeks focal copy number alterations and recurrent point mutations within high throughput data from large panels of tumor samples.
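The intuition that small, recurrent alterations are more informative can be illustrated with a simple weighting in which a gene gains 1/n for each sample where it lies in an altered segment spanning n genes. This is a hypothetical illustration of the focality-plus-recurrence idea only, not MutComFocal's Bayesian model. The function name, segment lists, and gene symbols (G1 to G4) are placeholders.

```python
from collections import defaultdict


def focal_recurrence_scores(samples):
    """Score genes by recurrence across samples, down-weighting broad events.

    samples: list of samples; each sample is a list of altered segments,
             each segment a list of the gene symbols it spans.
    A gene in a segment spanning n genes gains 1/n for that segment, so a
    single-gene (focal) event counts fully while an arm-level event barely counts.
    """
    scores = defaultdict(float)
    for segments in samples:
        for genes in segments:
            if not genes:
                continue  # skip empty segments
            weight = 1.0 / len(genes)
            for gene in genes:
                scores[gene] += weight
    return dict(scores)


# Three hypothetical samples: one focal event, one broad event, one mid-sized event.
samples = [
    [["G1"]],
    [["G1", "G2", "G3", "G4"]],
    [["G2", "G3"]],
]
print(focal_recurrence_scores(samples))
```

Here G1 scores highest because it is hit focally once and again within a broader segment. A full method would also need a null model for how high a score must be before it is considered significant.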
Results
We apply MutComFocal to Diffuse Large B-cell Lymphoma (DLBCL) data from four different high throughput studies, totaling 78 samples assessed for copy number alterations by single nucleotide polymorphism (SNP) array analysis and 65 samples assayed for protein changing point mutations by whole exome/whole transcriptome sequencing. In addition to recapitulating known alterations, MutComFocal identifies ARID1B, ROBO2 and MRS1 as candidate tumor suppressors and KLHL6, IL31 and LRP1 as putative oncogenes in DLBCL.
Conclusions
We present a Bayesian approach for the identification of candidate cancer genes by integrating data collected in large numbers of cancer patients, across different studies. When trained on a well-studied dataset, MutComFocal is able to identify most of the previously characterized alterations. The application of MutComFocal to large-scale cancer data provides the opportunity to pinpoint the key functional genomic alterations in tumors.
doi:10.1186/1752-0509-7-25
PMCID: PMC3637169  PMID: 23531283
Tumorigenic mutations; Driver genes; Data integration
14.  Digital Genotyping of Macrosatellites and Multicopy Genes Reveals Novel Biological Functions Associated with Copy Number Variation of Large Tandem Repeats 
PLoS Genetics  2014;10(6):e1004418.
Tandem repeats are common in eukaryotic genomes, but because they are difficult to assay they remain poorly studied. Here, we demonstrate the utility of Nanostring technology as a targeted approach to perform accurate measurement of tandem repeats even at extremely high copy number, and apply this technology to genotype 165 HapMap samples from three different populations and five species of non-human primates. We observed extreme variability in copy number of tandemly repeated genes, with many loci showing 5–10-fold variation in copy number among humans. Many of these loci show hallmarks of genome assembly errors, and the true copy number of many large tandem repeats is significantly under-represented even in the high-quality ‘finished’ human reference assembly. Importantly, we demonstrate that most large tandem repeat variations are not tagged by nearby SNPs, and are therefore essentially invisible to SNP-based GWAS approaches. Using association analysis we identify many cis correlations of large tandem repeat variants with nearby gene expression and DNA methylation levels, indicating that variations of tandem repeat length are associated with functional effects on the local genomic environment. This includes an example where expansion of a macrosatellite repeat is associated with increased DNA methylation and suppression of nearby gene expression, suggesting a mechanism termed “repeat induced gene silencing”, which has previously been observed only in transgenic organisms. We also observed multiple signatures consistent with altered selective pressures at tandemly repeated loci, suggesting important biological functions. Our studies show that tandemly repeated loci represent a highly variable fraction of the genome that has been systematically ignored by most previous studies and whose copy number variation can exert functionally significant effects. We suggest that future studies of tandem repeat loci will lead to many novel insights into their role in modulating both genomic and phenotypic diversity.
Author Summary
Here we use Nanostring digital assays and show their utility for estimating copy number of 186 multicopy genes and tandem repeats. By analyzing patterns of single nucleotide variation around these variants, we show that copy number variation at the vast majority of tandem repeats is not effectively tagged by nearby SNPs, and thus standard genome-wide association studies that focus on SNPs provide little or no information about such variants. By comparing patterns of tandem repeat copy number with variation in local gene expression and DNA methylation, we also identify extensive effects on local genome function. This includes an example of a non-coding macrosatellite repeat whose expansion exerts a repressive effect on a nearby gene, accompanied by accumulation of local DNA methylation. Finally, comparison of diverse human populations with a number of primate genomes shows that many of these sequences have undergone extreme changes in copy number during recent human and primate evolution, and show signatures that suggest possible selective effects. Overall, we conclude that multicopy genes and macrosatellites represent a highly variable fraction of the genome with important functional effects that has been systematically ignored by previous studies.
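Digital counting assays estimate copy number by comparing the count for a target probe with counts for reference probes. A minimal Python sketch follows, assuming the reference probes target loci present at two copies in a diploid genome and that all probes hybridize with equal efficiency. A real pipeline would likely also need background subtraction and per-probe calibration, which are not shown. The function name and counts are hypothetical.

```python
from statistics import mean


def estimate_copy_number(target_count, reference_counts, reference_copies=2):
    """Estimate copy number from digital counts relative to reference probes.

    Assumes each reference probe targets a locus present at `reference_copies`
    copies and that target and reference probes hybridize equally well, so
    copy number scales linearly with count.
    """
    ref = mean(reference_counts)
    if ref <= 0:
        raise ValueError("reference counts must be positive")
    return reference_copies * target_count / ref


# Hypothetical counts: a tandem repeat probe vs. three single-copy reference probes.
print(estimate_copy_number(1000, [200, 200, 200]))
```

With a target count five times the mean reference count, the estimate is ten copies. The linear-scaling assumption is what lets such assays measure very high copy numbers without saturating a hybridization-intensity signal.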
doi:10.1371/journal.pgen.1004418
PMCID: PMC4063668  PMID: 24945355
15.  Comprehensive copy number profiles of breast cancer cell model genomes 
Breast Cancer Research  2006;8(1):R9.
Introduction
Breast cancer is the most commonly diagnosed cancer in women worldwide and consequently has been extensively investigated in terms of histopathology, immunochemistry and familial history. Advances in genome-wide approaches have contributed to molecular classification with respect to genomic changes and their subsequent effects on gene expression. Cell lines have provided a renewable resource that is readily used as model systems for breast cancer cell biology. A thorough characterization of their genomes to identify regions of segmental DNA loss (potential tumor-suppressor-containing loci) and gain (potential oncogenic loci) would greatly facilitate the interpretation of biological data derived from such cells. In this study we characterized the genomes of seven of the most commonly used breast cancer model cell lines at unprecedented resolution using a newly developed whole-genome tiling path genomic DNA array.
Methods
Breast cancer model cell lines MCF-7, BT-474, MDA-MB-231, T47D, SK-BR-3, UACC-893 and ZR-75-30 were investigated for genomic alterations with the submegabase-resolution tiling array (SMRT) array comparative genomic hybridization (CGH) platform. SMRT array CGH provides tiling coverage of the human genome, permitting break-point detection at a resolution of about 80 kilobases. Two novel discrete alterations identified by array CGH were verified by fluorescence in situ hybridization.
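Break-points in tiling-array data appear as shifts in the mean log2 ratio along ordered clones. As a generic illustration, not the segmentation procedure used in this study, one step of binary segmentation picks the split that most reduces the within-segment sum of squares. The function name and the log2 ratios are hypothetical.

```python
def best_breakpoint(log2_ratios, min_size=2):
    """Return (index, score) of the single split that best separates two mean levels.

    log2_ratios: values for clones ordered along a chromosome.
    The score is the reduction in within-segment sum of squares,
    n_left * n_right / n * (mean_left - mean_right) ** 2; the break-point lies
    between positions index - 1 and index. Returns (None, 0.0) if no split helps.
    """
    n = len(log2_ratios)
    if n < 2 * min_size:
        return None, 0.0
    total = sum(log2_ratios)
    best_idx, best_score = None, 0.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += log2_ratios[i - 1]
        n_left, n_right = i, n - i
        if n_left < min_size or n_right < min_size:
            continue  # enforce a minimum segment length
        mean_left = left_sum / n_left
        mean_right = (total - left_sum) / n_right
        score = n_left * n_right / n * (mean_left - mean_right) ** 2
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score


# Hypothetical profile: four clones at baseline followed by four gained clones.
print(best_breakpoint([0.0, 0.1, -0.1, 0.0, 1.0, 0.9, 1.1, 1.0]))
```

Applying the step recursively to each side yields a full segmentation. Established methods such as circular binary segmentation (implemented in the Bioconductor DNAcopy package) add a significance test before accepting each split.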
Results
Whole-genome tiling path array CGH analysis identified novel high-level alterations and fine-mapped previously reported regions yielding candidate genes. In brief, 75 high-level gains and 48 losses were observed and their respective boundaries were documented. Complex alterations involving multiple levels of change were observed on chromosome arms 1p, 8q, 9p, 11q, 15q, 17q and 20q. Furthermore, alignment of whole-genome profiles enabled simultaneous assessment of copy number status of multiple components of the same biological pathway. Investigation of about 60 loci containing genes associated with the epidermal growth factor family (epidermal growth factor receptor, HER2, HER3 and HER4) revealed that all seven cell lines harbor copy number changes to multiple genes in these pathways.
Conclusion
The intrinsic genetic differences between these cell lines will influence their biologic and pharmacologic response as an experimental model. Knowledge of segmental changes in these genomes deduced from our study will facilitate the interpretation of biological data derived from such cells.
doi:10.1186/bcr1370
PMCID: PMC1413994  PMID: 16417655
16.  Myc Oncogene-Induced Genomic Instability: DNA Palindromes in Bursal Lymphomagenesis 
PLoS Genetics  2008;4(7):e1000132.
Genetic instability plays a key role in the formation of naturally occurring cancer. The formation of long DNA palindromes is a rate-limiting step in gene amplification, a common form of tumor-associated genetic instability. Genome-wide analysis of palindrome formation (GAPF) has detected both extensive palindrome formation and gene amplification, beginning early in tumorigenesis, in an experimental Myc-induced model tumor system in the chicken bursa of Fabricius. We determined that GAPF-detected palindromes are abundant and distributed nonrandomly throughout the genome of bursal lymphoma cells, frequently at preexisting short inverted repeats. By combining GAPF with chromatin immunoprecipitation (ChIP), we found a significant association between occupancy of gene-proximal Myc binding sites and the formation of palindromes. Numbers of palindromic loci correlate with increases in both levels of Myc over-expression and ChIP-detected occupancy of Myc binding sites in bursal cells. However, clonal analysis of chick DF-1 fibroblasts suggests that palindrome formation is a stochastic process occurring in individual cells at a small number of loci relative to much larger numbers of susceptible loci in the cell population and that the induction of palindromes is not involved in Myc-induced acute fibroblast transformation. GAPF detected palindromes at the highly oncogenic bic/miR-155 locus in all of our preneoplastic and neoplastic bursal samples, but not in DNA from normal and other transformed cell types. This finding indicates very strong selection during bursal lymphomagenesis. Therefore, in addition to providing a platform for gene copy number change, palindromes may alter microRNA genes in a fashion that can contribute to cancer development.
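GAPF itself is an experimental assay. The Python sketch below only shows, computationally, what it means for a double-stranded DNA sequence to be a palindrome (it equals its own reverse complement) and how short inverted repeats, the preexisting sites where palindromes frequently formed, can be located in a sequence. The function names, arm length, spacer limit, and example sequences are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq):
    """True if seq reads the same on both strands (equals its reverse complement)."""
    s = seq.upper()
    return s == reverse_complement(s)


def find_inverted_repeats(seq, arm, max_spacer):
    """Find arms of length `arm` followed, after a spacer of 0..max_spacer bases,
    by their reverse complement. Returns (start, spacer) tuples."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * arm + 1):
        left = seq[start:start + arm]
        for spacer in range(max_spacer + 1):
            right_start = start + arm + spacer
            right = seq[right_start:right_start + arm]
            if len(right) < arm:
                break  # ran off the end of the sequence
            if right == reverse_complement(left):
                hits.append((start, spacer))
    return hits


print(is_palindrome("GAATTC"))                                         # EcoRI site
print(find_inverted_repeats("AAACGGGTTT", arm=4, max_spacer=2))
```

In the second call, the arm AAAC at position 0 is followed, after a 2-base spacer, by its reverse complement GTTT. Genome-scale searches would use indexed or suffix-based methods rather than this quadratic scan.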
Author Summary
Genetic instability is a key process in the development of naturally occurring cancer. Gene amplification is one important consequence of underlying oncogenic instability. Long DNA palindrome formation is a rate-limiting early step in gene amplification. The development of a new functional genomic tool called genome-wide analysis of palindrome formation (GAPF) led to the recent discovery of the widespread occurrence of such palindromes in both animal tumor models and human tumor cells. Using a Myc oncogene-induced lymphoma model system, this paper describes clustering of tumor-specific palindromes throughout the genome as well as an association of sites of palindrome formation with both preexisting short inverted DNA repeat sequences and occupied Myc DNA-binding sites. We discovered consistent palindrome formation at a cancer-associated microRNA gene called bic/miR-155, beginning at an early stage of tumor development, and without significant further amplification of the locus. Thus, DNA palindromes themselves may directly influence tumorigenesis.
doi:10.1371/journal.pgen.1000132
PMCID: PMC2444050  PMID: 18636108
17.  Integration of transcript expression, copy number and LOH analysis of infiltrating ductal carcinoma of the breast 
BMC Cancer  2010;10:460.
Background
A major challenge in the interpretation of genomic profiling data generated from breast cancer samples is the identification of driver genes as distinct from bystander genes which do not impact tumorigenesis. One way to assess the relative importance of alterations in the transcriptome profile is to combine it with a parallel analysis of copy number alterations (CNAs). This integrated analysis permits the identification of genes with altered expression that map within chromosomal regions showing copy number alterations, providing a mechanistic approach to identify the 'driver genes'.
Methods
We have performed whole genome analysis of CNAs using the Affymetrix 250K Mapping array on 22 infiltrating ductal carcinoma samples (IDCs). Analysis of transcript expression alterations was performed using the Affymetrix U133 Plus2.0 array on 16 IDC samples. Fourteen IDC samples were analyzed using both platforms and the data integrated. We also incorporated data from loss of heterozygosity (LOH) analysis to identify genes showing altered expression in LOH regions.
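The integration step can be pictured as an interval overlap between genes and CNA segments, followed by a check that expression moves in the same direction as copy number. A minimal Python sketch; the closed-interval coordinate convention, the |log2 fold change| ≥ 1 threshold, the function and class names, and the gene names (GENE_A to GENE_D) are all hypothetical and are not taken from the study's pipeline.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    chrom: str
    start: int
    end: int
    state: str  # "gain" or "loss"


@dataclass
class Gene:
    symbol: str
    chrom: str
    start: int
    end: int
    log2_fold_change: float  # tumor vs. normal expression


def concordant_genes(genes, segments, min_abs_lfc=1.0):
    """Genes lying in a gained (lost) segment whose expression is up (down).

    Coordinates are treated as closed intervals; each gene is reported once,
    for the first concordant segment it overlaps.
    """
    hits = []
    for gene in genes:
        for seg in segments:
            overlaps = (gene.chrom == seg.chrom
                        and gene.start <= seg.end and seg.start <= gene.end)
            if not overlaps:
                continue
            up = gene.log2_fold_change >= min_abs_lfc
            down = gene.log2_fold_change <= -min_abs_lfc
            if (seg.state == "gain" and up) or (seg.state == "loss" and down):
                hits.append((gene.symbol, seg.state))
                break
    return hits


segments = [Segment("chr1", 100, 500, "gain"), Segment("chr8", 1000, 2000, "loss")]
genes = [
    Gene("GENE_A", "chr1", 150, 300, 1.5),   # gained and up: concordant
    Gene("GENE_B", "chr1", 150, 300, -1.2),  # gained but down: not concordant
    Gene("GENE_C", "chr8", 1500, 1600, -2.0),  # lost and down: concordant
    Gene("GENE_D", "chr2", 1, 50, 3.0),      # outside any CNA segment
]
print(concordant_genes(genes, segments))
```

Requiring concordant direction is what separates candidate drivers, whose expression tracks dosage, from genes that merely sit inside an altered region.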
Results
Common chromosome gains and amplifications were identified at 1q21.3, 6p21.3, 7p11.2-p12.1, 8q21.11 and 8q24.3. A novel amplicon was identified at 5p15.33. Frequent losses were found at 1p36.22, 8q23.3, 11p13, 11q23, and 22q13. Over 130 genes were identified with concurrent increases or decreases in expression that mapped to these regions of copy number alteration. LOH analysis revealed three tumors with whole-chromosome or p-arm allelic loss of chromosome 17. Genes were identified that mapped to copy-neutral LOH regions. LOH with accompanying copy loss was detected on Xq24 and Xq25, and genes mapping to these regions with decreased expression were identified. Gene expression data highlighted the PPARα/RXRα Activation Pathway as down-regulated in the tumor samples.
Conclusion
We have demonstrated the utility of integrated analysis using high-resolution CGH and whole-genome transcript analysis for detecting driver genes in IDC. The high-resolution platform allowed a refined demarcation of CNAs, and gene expression profiling provided a mechanism to detect genes directly impacted by the CNAs. This is the first report of LOH integrated with gene expression in IDC using a high-resolution platform.
doi:10.1186/1471-2407-10-460
PMCID: PMC2939551  PMID: 20799942
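The integration step described in this abstract keeps genes with altered expression that lie inside regions of copy number alteration. The sketch below shows one way to express that rule. It assumes segments are already called as "gain" or "loss" and that each gene carries a tumour-versus-normal log2 fold change. The class names, gene names and the `min_abs_fc` cut-off are hypothetical. This is not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    chrom: str
    start: int
    end: int
    state: str  # "gain" or "loss" (assumed pre-called CNA segments)


@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int
    log2_fc: float  # tumour vs normal expression change (assumed input)


def concordant_genes(genes, segments, min_abs_fc=1.0):
    """Return (gene, state) for genes overlapping a CNA segment whose expression
    moves in the same direction: up in gains, down in losses."""
    hits = []
    for g in genes:
        for s in segments:
            overlaps = g.chrom == s.chrom and g.start <= s.end and s.start <= g.end
            if not overlaps:
                continue
            up_in_gain = s.state == "gain" and g.log2_fc >= min_abs_fc
            down_in_loss = s.state == "loss" and g.log2_fc <= -min_abs_fc
            if up_in_gain or down_in_loss:
                hits.append((g.name, s.state))
                break  # report each gene once
    return hits


if __name__ == "__main__":
    genes = [Gene("GENE_A", "chr1", 100, 200, 1.5), Gene("GENE_B", "chr1", 300, 400, -2.0)]
    segments = [Segment("chr1", 50, 250, "gain"), Segment("chr1", 350, 500, "loss")]
    print(concordant_genes(genes, segments))
```

Requiring the direction of expression change to match the direction of copy change is what separates candidate drivers from genes that merely sit inside an aberrant region.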
18.  Endogenous RNA interference is driven by copy number 
eLife  2014;3:e01581.
A plethora of non-protein coding RNAs are produced throughout eukaryotic genomes, many of which are transcribed antisense to protein-coding genes and could potentially instigate RNA interference (RNAi) responses. Here we have used a synthetic RNAi system to show that gene copy number is a key factor controlling RNAi for transcripts from endogenous loci, since transcripts from multi-copy loci form double stranded RNA more efficiently than transcripts from equivalently expressed single-copy loci. Selectivity towards transcripts from high-copy DNA is therefore an emergent property of a minimal RNAi system. The ability of RNAi to selectively degrade transcripts from high-copy loci would allow suppression of newly emerging transposable elements, but such a surveillance system requires transcription. We show that low-level genome-wide pervasive transcription is sufficient to instigate RNAi, and propose that pervasive transcription is part of a defense mechanism capable of directing a sequence-independent RNAi response against transposable elements amplifying within the genome.
DOI: http://dx.doi.org/10.7554/eLife.01581.001
eLife digest
Genes contain the codes that are needed to make the proteins used by cells. This code is transcribed to make a messenger RNA molecule that is then translated to make a protein. However, other types of RNA called non-coding RNA molecules can disrupt this process by binding to messenger RNA molecules, with matching sequences, before translation begins. This phenomenon, which is known as RNA interference, involves enzymes called Dicer and Argonaute.
Many cells contain large numbers of non-coding RNA molecules—so called because they are not translated to produce proteins—and many of these are capable of starting the process of RNA interference. However, most do not, and the reasons for this are not understood. Now, work by Cruz and Houseley has provided new insight into this phenomenon by showing that it is related to the number of copies of the gene encoding such RNAs in the genome.
Yeast cells normally do not have the genes for RNA interference, but Cruz and Houseley used genetically engineered yeast cells containing Dicer and Argonaute. Although most of the messenger RNA molecules in these cells showed no change, the expression of some genes with high ‘copy numbers’ was reduced. Further experiments that involved adding more and more copies of other genes showed that RNA interference could selectively target messenger RNA molecules produced from genes with an increased copy number—particularly if the copies of the genes were clustered in one location in the genome.
RNA interference is also used to defend against DNA sequences that invade and multiply within a genome, such as viruses and other ‘genetic parasites’. As such, the effect observed by Cruz and Houseley could explain why entire genomes are often continuously copied to RNA at low levels. This activity would allow the monitoring of the genome for the invasion of any genetic parasites that had multiplied to high numbers. Following on from this work, the next challenge will be to understand how gene copy number and location are balanced to achieve a selective RNA interference system.
DOI: http://dx.doi.org/10.7554/eLife.01581.002
doi:10.7554/eLife.01581
PMCID: PMC3918874  PMID: 24520161
RNA interference; non-coding RNA; pervasive transcription; copy number; S. cerevisiae
19.  Estimation of Parent Specific DNA Copy Number in Tumors using High-Density Genotyping Arrays 
PLoS Computational Biology  2011;7(1):e1001060.
Chromosomal gains and losses comprise an important type of genetic change in tumors, and can now be assayed using microarray hybridization-based experiments. Most current statistical models for DNA copy number estimate only total copy number and therefore do not distinguish between the copy numbers of the two inherited chromosomes. This latter information, sometimes called parent-specific copy number, is important for identifying allele-specific amplifications and deletions, for quantifying normal cell contamination, and for giving a more complete molecular portrait of the tumor. We propose a stochastic segmentation model for parent-specific DNA copy number in tumor samples, and give an estimation procedure that is computationally efficient and can be applied to data from current high-density genotyping platforms. The proposed method does not require matched normal samples, and can estimate the unknown genotypes simultaneously with the parent-specific copy number. The new method is used to analyze 223 glioblastoma samples from the Cancer Genome Atlas (TCGA) project, giving a more comprehensive summary of the copy number events in these samples. Detailed case studies on these samples reveal the additional insights that can be gained from an allele-specific copy number analysis, such as the quantification of fractional gains and losses, the identification of copy-neutral loss of heterozygosity, and the characterization of regions of simultaneous changes of both inherited chromosomes.
Author Summary
Many genetic diseases are related to copy number aberrations in some regions of the genome. Each chromosome is normally present in two copies. However, under some circumstances, the copy number of one or both of these chromosomes changes in some regions. Genotyping microarray data provide the copy number of each of the two alleles at polymorphic sites along the chromosomes, which makes inference of chromosomal copy number aberrations feasible. One difficulty is that genotyping microarray data cannot provide the haplotypes of the two copies of a chromosome. In this paper, we model the copy number along the chromosome as a two-dimensional Markov chain. Using the observed copy number of both alleles at all sites, we can determine the parent-specific copy number along the chromosome and infer the haplotypes of the two inherited chromosomes in regions of allelic imbalance. Simulation results show high sensitivity and specificity of the method. Applying this method to glioblastoma samples from the Cancer Genome Atlas illustrates the insights gained from allele-specific copy number analysis.
doi:10.1371/journal.pcbi.1001060
PMCID: PMC3029233  PMID: 21298078
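The author summary describes copy number along the chromosome as a two-dimensional Markov chain over the copy numbers of the two inherited chromosomes, with genotypes unknown. The sketch below is a simplified hidden Markov model of that idea, decoded with Viterbi. It is not the authors' stochastic segmentation model or estimation procedure. Several choices are assumptions made for illustration:

- isotropic Gaussian noise with SD `sigma`;
- equal weights for the four genotypes AA, AB, BA and BB;
- a "sticky" chain with uniform jumps (`p_switch`);
- states written as (major, minor), so the output does not say which parent carries which copy.

```python
import math


def make_states(max_total=4):
    """(major, minor) copy-number pairs; requiring major >= minor removes the
    symmetry between the two unlabelled parental chromosomes."""
    return [(a, b) for a in range(max_total + 1) for b in range(a + 1) if a + b <= max_total]


def log_emission(obs, state, sigma=0.2):
    """Log-likelihood of one SNP's allele-specific signals obs = (x_A, x_B),
    summed over the unknown genotype (AA, AB, BA, BB) with equal weights (assumed)."""
    a, b = state
    means = [(a + b, 0), (a, b), (b, a), (0, a + b)]
    terms = [
        math.log(0.25)
        - ((obs[0] - mx) ** 2 + (obs[1] - my) ** 2) / (2 * sigma ** 2)
        - math.log(2 * math.pi * sigma ** 2)
        for mx, my in means
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def viterbi(observations, states, p_switch=0.01, sigma=0.2):
    """Most likely state path: stay with prob 1 - p_switch, otherwise jump
    uniformly to one of the other states."""
    k = len(states)
    log_stay = math.log(1 - p_switch)
    log_move = math.log(p_switch / (k - 1))
    score = [-math.log(k) + log_emission(observations[0], s, sigma) for s in states]
    back = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(states):
            cand = [score[i] + (log_stay if i == j else log_move) for i in range(k)]
            best_i = max(range(k), key=cand.__getitem__)
            new_score.append(cand[best_i] + log_emission(obs, s, sigma))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    j = max(range(k), key=score.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return [states[i] for i in reversed(path)]
```

Summing over genotypes inside the emission is what lets the chain run without matched normals or known genotypes. Homozygous SNPs are compatible with many states, while heterozygous SNPs pin down the allelic ratio.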
20.  Integrative Analysis Reveals Relationships of Genetic and Epigenetic Alterations in Osteosarcoma 
PLoS ONE  2012;7(11):e48262.
Background
Osteosarcomas are the most common non-haematological primary malignant tumours of bone, and all conventional osteosarcomas are high-grade tumours showing complex genomic aberrations. We have integrated genome-wide genetic and epigenetic profiles from the EuroBoNeT panel of 19 human osteosarcoma cell lines based on microarray technologies.
Principal Findings
The cell lines showed complex patterns of DNA copy number changes, in which genomic copy number gains were significantly associated with gene-rich regions and losses with gene-poor regions. By integrating the datasets, 350 genes were identified as having two types of aberrations (gain/over-expression, hypo-methylation/over-expression, loss/under-expression or hyper-methylation/under-expression) using a recurrence threshold of 6/19 (>30%) cell lines. In general, these genes were altered either in DNA copy number or in DNA methylation, both within individual samples and across the sample panel. These 350 genes are involved in embryonic skeletal system development and morphogenesis, as well as remodelling of the extracellular matrix. The aberrations of three selected genes, CXCL5, DLX5 and RUNX2, were validated in five cell lines and five tumour samples using PCR techniques. Several genes were hyper-methylated and under-expressed compared to normal osteoblasts, and for the four genes tested (AKAP12, CXCL5, EFEMP1 and IL11RA) expression could be reactivated by demethylation with 5-Aza-2′-deoxycytidine treatment. Globally, as expected, there was a significant positive association between gain and over-expression, between loss and under-expression, and between hyper-methylation and under-expression. However, gain was also associated with hyper-methylation and under-expression, suggesting that hyper-methylation may oppose the effects of increased copy number for detrimental genes.
Conclusions
Integrative analysis of genome-wide genetic and epigenetic alterations identified dependencies and relationships between DNA copy number, DNA methylation and mRNA expression in osteosarcomas, contributing to better understanding of osteosarcoma biology.
doi:10.1371/journal.pone.0048262
PMCID: PMC3492335  PMID: 23144859
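The integration described in this abstract counts, for each gene, the cell lines carrying one of four paired aberrations. It keeps pairs that recur in at least 6 of 19 lines. The sketch below assumes per-line calls are already discretised into labels such as "gain", "hyper" or "over". These label strings and the dictionary layout are hypothetical, chosen only to make the counting rule concrete.

```python
# Each pair: (which DNA-level call to inspect, required DNA state, required expression state).
PAIRS = {
    "gain/over-expression": ("cn", "gain", "over"),
    "loss/under-expression": ("cn", "loss", "under"),
    "hypo-methylation/over-expression": ("meth", "hypo", "over"),
    "hyper-methylation/under-expression": ("meth", "hyper", "under"),
}


def recurrent_pairs(calls, min_lines=6):
    """calls maps gene -> list of per-cell-line dicts with keys 'cn', 'meth', 'expr'
    (hypothetical discretised labels). Returns gene -> aberration pairs seen in
    at least min_lines cell lines (6 of 19 in the abstract)."""
    result = {}
    for gene, lines in calls.items():
        hits = []
        for label, (field, dna_state, expr_state) in PAIRS.items():
            count = sum(1 for c in lines if c[field] == dna_state and c["expr"] == expr_state)
            if count >= min_lines:
                hits.append(label)
        if hits:
            result[gene] = hits
    return result
```

Counting the DNA-level and expression-level calls jointly per cell line, rather than separately, is what ties each gene's recurrence to a concordant pair of aberrations.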
21.  Gene Copy-Number Polymorphism Caused by Retrotransposition in Humans 
PLoS Genetics  2013;9(1):e1003242.
The era of whole-genome sequencing has revealed that gene copy-number changes caused by duplication and deletion events have important evolutionary, functional, and phenotypic consequences. Recent studies have therefore focused on revealing the extent of variation in copy-number within natural populations of humans and other species. These studies have found a large number of copy-number variants (CNVs) in humans, many of which have been shown to have clinical or evolutionary importance. For the most part, these studies have failed to detect an important class of gene copy-number polymorphism: gene duplications caused by retrotransposition, which result in a new intron-less copy of the parental gene being inserted into a random location in the genome. Here we describe a computational approach leveraging next-generation sequence data to detect gene copy-number variants caused by retrotransposition (retroCNVs), and we report the first genome-wide analysis of these variants in humans. We find that retroCNVs account for a substantial fraction of gene copy-number differences between any two individuals. Moreover, we show that these variants may often result in expressed chimeric transcripts, underscoring their potential for the evolution of novel gene functions. By locating the insertion sites of these duplicates, we are able to show that retroCNVs have had an important role in recent human adaptation, and we also uncover evidence that positive selection may currently be driving multiple retroCNVs toward fixation. Together these findings imply that retroCNVs are an especially important class of polymorphism, and that future studies of copy-number variation should search for these variants in order to illuminate their potential evolutionary and functional relevance.
Author Summary
Recent studies of human genetic variation have revealed that, in addition to differing at single nucleotide polymorphisms, individuals differ in copy-number at many regions of the genome. These copy-number variants (CNVs) are caused by duplication or deletion events and often affect functional sequences such as genes. Efforts to reveal the functional impact of CNVs have identified many variants increasing the risk of various disorders, and some that are adaptive. However, these studies mostly fail to detect gene duplications caused by retrotransposition, in which an mRNA transcript is reverse-transcribed and reinserted into the genome, yielding a new intron-less gene copy. Here we describe a method leveraging next-generation sequence data to accurately detect gene copy-number variants caused by retrotransposition, or retroCNVs, and apply this method to hundreds of whole-genome sequences from three different human subpopulations. We find that these variants account for a substantial number of gene copy-number differences between individuals, and that gene retrotransposition may often result in both deleterious and beneficial mutations. Indeed, we present evidence that two of these new gene duplications may be adaptive. These results imply that retroCNVs are an especially important class of CNV and should be included in future studies of human copy-number variation.
doi:10.1371/journal.pgen.1003242
PMCID: PMC3554589  PMID: 23359205
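A retrocopy is an intron-less copy of its parental gene. Genomic reads that span an exon-exon junction therefore indicate a processed copy, because the parental locus still has an intron at that position. The sketch below counts such reads. It illustrates that signature only, under the assumption that exon sequences are known. It is not the detection pipeline described in the paper, and `flank` is a hypothetical parameter.

```python
_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def junction_probes(exons, flank=15):
    """Sequences spanning each exon-exon junction of the spliced transcript:
    the last `flank` bases of exon i joined to the first `flank` bases of exon i+1."""
    return [exons[i][-flank:] + exons[i + 1][:flank] for i in range(len(exons) - 1)]


def count_junction_reads(reads, exons, flank=15):
    """Count genomic reads containing each junction probe on either strand.
    Non-zero counts suggest an intron-less (retrotransposed) copy."""
    probes = junction_probes(exons, flank)
    counts = [0] * len(probes)
    for read in reads:
        for k, probe in enumerate(probes):
            if probe in read or revcomp(probe) in read:
                counts[k] += 1
    return counts
```

Exact substring matching keeps the idea visible. Real reads would need alignment that tolerates sequencing errors.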
22.  Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics 
PLoS Genetics  2014;10(5):e1004383.
Genetic association studies, in particular the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, notably cardiovascular diseases and lipid biomarkers. The next challenge consists of understanding the molecular basis of these associations. The integration of multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. An application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. We demonstrate the value of the approach by re-analysing a gene expression dataset in 966 liver samples with a published meta-analysis of lipid traits including >100,000 individuals of European ancestry. Combining all lipid biomarkers, our re-analysis supported 26 out of 38 reported colocalisation results with eQTLs and identified 14 new colocalisation results, hence highlighting the value of a formal statistical test. In three cases of reported eQTL-lipid pairs (SYPL2, IFT172, TBKBP1) for which our analysis suggests that the eQTL pattern is not consistent with the lipid association, we identify alternative colocalisation results with SORT1, GCKR, and KPNB1, indicating that these genes are more likely to be causal in these genomic intervals. A key feature of the method is the ability to derive the output statistics from single SNP summary statistics, hence making it possible to perform systematic meta-analysis type comparisons across multiple GWAS datasets (implemented online at http://coloc.cs.ucl.ac.uk/coloc/).
Our methodology provides information about candidate causal genes in associated intervals and has direct implications for the understanding of complex diseases as well as the design of drugs to target disease pathways.
Author Summary
Genome-wide association studies (GWAS) have found a large number of genetic regions (“loci”) affecting clinical end-points and phenotypes, many outside coding intervals. One approach to understanding the biological basis of these associations has been to explore whether GWAS signals from intermediate cellular phenotypes, in particular gene expression, are located in the same loci (“colocalise”) and potentially mediate the disease signals. However, it is not clear how to assess whether the same variants are responsible for the two GWAS signals or whether distinct causal variants lie close to each other. In this paper, we describe a statistical method that uses only single-variant summary statistics to test for colocalisation of GWAS signals. We describe one application of our method to a meta-analysis of blood lipids and liver expression, although any two datasets resulting from association studies can be used. Our method is able to detect the subset of GWAS signals explained by regulatory effects and to identify candidate genes affected by the same GWAS variants. As summary GWAS data become increasingly available, applications of colocalisation methods to integrate the findings will be essential for functional follow-up, and will also be particularly useful for identifying tissue-specific signals in eQTL datasets.
doi:10.1371/journal.pgen.1004383
PMCID: PMC4022491  PMID: 24830394
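The colocalisation test in this abstract works from single-SNP summary statistics. The sketch below follows the widely used approximate-Bayes-factor formulation. Each SNP's effect estimate and standard error give a Wakefield approximate Bayes factor. These are then combined into posterior probabilities for five hypotheses:

- H0: no association;
- H1: association with trait 1 only;
- H2: association with trait 2 only;
- H3: two distinct causal variants;
- H4: one shared causal variant.

The prior SD of 0.15 and the priors p1 = p2 = 1e-4 and p12 = 1e-5 are assumed values for illustration. The function names are hypothetical and do not mirror the API of the online tool.

```python
import math


def log_abf(beta, se, prior_sd=0.15):
    """Wakefield approximate Bayes factor (log scale) for one SNP; prior_sd is assumed."""
    v, w = se ** 2, prior_sd ** 2
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z * z)


def logsumexp(xs):
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logdiffexp(a, b):
    """log(exp(a) - exp(b)) for a > b; -inf when the difference vanishes."""
    if b >= a:
        return -math.inf
    return a + math.log(-math.expm1(b - a))


def coloc_posteriors(beta1, se1, beta2, se2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0..H4] for one region; SNPs aligned across traits."""
    l1 = [log_abf(b, s) for b, s in zip(beta1, se1)]
    l2 = [log_abf(b, s) for b, s in zip(beta2, se2)]
    s1, s2 = logsumexp(l1), logsumexp(l2)
    s12 = logsumexp([a + b for a, b in zip(l1, l2)])  # same SNP causal for both
    lh = [
        0.0,                                                    # H0: no association
        math.log(p1) + s1,                                      # H1: trait 1 only
        math.log(p2) + s2,                                      # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12), # H3: distinct variants
        math.log(p12) + s12,                                    # H4: one shared variant
    ]
    total = logsumexp(lh)
    return [math.exp(x - total) for x in lh]
```

The H3 term sums over all pairs of different SNPs. It does this as (sum of Bayes factors for trait 1) × (sum for trait 2) minus the same-SNP sum, which is why only per-SNP summary statistics are needed.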
23.  Regulatory Hotspots in the Malaria Parasite Genome Dictate Transcriptional Variation 
PLoS Biology  2008;6(9):e238.
The determinants of transcriptional regulation in malaria parasites remain elusive. The presence of a well-characterized gene expression cascade shared by different Plasmodium falciparum strains could imply that transcriptional regulation and its natural variation do not contribute significantly to the evolution of parasite drug resistance. To clarify the role of transcriptional variation as a source of strain-specific diversity in the most deadly malaria species and to find genetic loci that dictate variations in gene expression, we examined genome-wide expression level polymorphisms (ELPs) in a genetic cross between phenotypically distinct parasite clones. Significant variation in gene expression is observed through direct co-hybridizations of RNA from different P. falciparum clones. Nearly 18% of genes were regulated by a significant expression quantitative trait locus. The genetic determinants of most of these ELPs resided in hotspots that are physically distant from their targets. The most prominent regulatory locus, influencing 269 transcripts, coincided with a Chromosome 5 amplification event carrying the drug resistance gene, pfmdr1, and 13 other genes. Drug selection pressure in the Dd2 parental clone lineage led not only to a copy number change in the pfmdr1 gene but also to an increased copy number of putative neighboring regulatory factors that, in turn, broadly influence the transcriptional network. Previously unrecognized transcriptional variation, controlled by polymorphic regulatory genes and possibly master regulators within large copy number variants, contributes to sweeping phenotypic evolution in drug-resistant malaria parasites.
Author Summary
Development of the malaria parasite, Plasmodium falciparum, in the blood is driven by a number of different genes expressed at different times and at different levels. Exactly what influences such transcriptional changes remains elusive, particularly in regard to important phenotypes like drug resistance. Using cDNA microarray hybridizations from the progeny of a Plasmodium genetic cross, we mapped gene expression quantitative trait loci (eQTLs) in an experimental population of malaria parasites. Each gene's transcript level was used as a segregating phenotype to identify regions of the Plasmodium genome dictating transcriptional variation. Several regulatory hotspots controlled the majority of gene expression variation, mostly via trans-acting mechanisms. One, influencing the largest number of transcripts, coincided with an amplified region of the genome traditionally associated with multiple drug resistance (MDR). Overall, integration of two functional genomic tools (gene mapping and transcript quantitation) has revealed a system-wide rewiring of the parasite transcription network: pleiotropic phenotypic variation driven by drug selection on genome structure, which may be attributable in large part to adaptive copy number polymorphisms in the parasite. These structural variants alter the expression of genes within the amplicon as well as of many genes scattered across the Plasmodium genome.
Heritable levels of transcriptional variation, predominantly regulated by genomic copy number variants via trans mechanisms, are surprisingly abundant in drug-resistant malaria parasites.
doi:10.1371/journal.pbio.0060238
PMCID: PMC2553844  PMID: 18828674
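The author summary treats each gene's transcript level as a segregating phenotype and looks for regulatory hotspots, meaning markers that explain many genes' expression. The sketch below is a minimal single-marker scan under stated assumptions:

- blood-stage progeny are haploid, so each marker is coded 0 or 1 for the two parents;
- association is measured with a Welch t statistic;
- each gene is assigned to its best marker if that marker passes a hypothetical threshold `t_threshold`.

Real eQTL mapping would use LOD scores and permutation-based significance. This is not the authors' analysis.

```python
import math
import statistics


def t_stat(x, y):
    """Welch two-sample t statistic."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    if se == 0:
        return math.inf if mx != my else 0.0
    return (mx - my) / se


def hotspot_counts(genotypes, expression, t_threshold=4.0):
    """genotypes: marker -> 0/1 parental allele per progeny clone.
    expression: gene -> transcript levels for the same clones.
    Returns marker -> number of genes whose strongest linkage (|t| >= threshold)
    falls on that marker; markers with high counts are candidate hotspots."""
    counts = {m: 0 for m in genotypes}
    for levels in expression.values():
        best_marker, best_t = None, 0.0
        for marker, geno in genotypes.items():
            a = [e for e, g in zip(levels, geno) if g == 0]
            b = [e for e, g in zip(levels, geno) if g == 1]
            if len(a) < 2 or len(b) < 2:
                continue  # need two clones per allele for a variance
            t = abs(t_stat(a, b))
            if t > best_t:
                best_marker, best_t = marker, t
        if best_marker is not None and best_t >= t_threshold:
            counts[best_marker] += 1
    return counts
```

Because the scan assigns each gene to its most strongly linked marker regardless of where the gene lies, trans-acting hotspots show up as markers with large counts.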
24.  Hard Selective Sweep and Ectopic Gene Conversion in a Gene Cluster Affording Environmental Adaptation 
PLoS Genetics  2013;9(8):e1003707.
Among the rare colonizers of heavy-metal rich toxic soils, Arabidopsis halleri is a compelling model extremophile, physiologically distinct from its sister species A. lyrata and from A. thaliana. Naturally selected metal hypertolerance and extraordinarily high leaf metal accumulation in A. halleri both require Heavy Metal ATPase4 (HMA4) encoding a PIB-type ATPase that pumps Zn2+ and Cd2+ out of specific cell types. Strongly enhanced HMA4 expression results from a combination of gene copy number expansion and cis-regulatory modifications, when compared to A. thaliana. These findings were based on a single accession of A. halleri. Few studies have addressed nucleotide sequence polymorphism at loci known to govern adaptations. We thus sequenced 13 DNA segments across the HMA4 genomic region of multiple A. halleri individuals from diverse habitats. Compared to control loci flanking the three tandem HMA4 gene copies, a gradual depletion of nucleotide sequence diversity and an excess of low-frequency polymorphisms are hallmarks of positive selection in HMA4 promoter regions, culminating at HMA4-3. The accompanying hard selective sweep is segmentally eclipsed as a consequence of recurrent ectopic gene conversion among HMA4 protein-coding sequences, resulting in their concerted evolution. Thus, HMA4 coding sequences exhibit a network-like genealogy and locally enhanced nucleotide sequence diversity within each copy, accompanied by lowered sequence divergence between paralogs in any given individual. Quantitative PCR corroborated that, across A. halleri, three genomic HMA4 copies generate overall 20- to 130-fold higher transcript levels than in A. thaliana. Together, our observations constitute an unexpectedly complex profile of polymorphism resulting from natural selection for increased gene product dosage. We propose that these findings are paradigmatic of a category of multi-copy genes from a broad range of organisms.
Our results emphasize that enhanced gene product dosage, in addition to neo- and sub-functionalization, can account for the genomic maintenance of gene duplicates underlying environmental adaptation.
Author Summary
Existing genetic diversity reflects evolutionary history, but it has rarely been possible to probe for footprints of selection at loci known to functionally govern adaptive traits. Both naturally selected metal hypertolerance and extraordinary leaf metal accumulation of the extremophile Arabidopsis halleri require strongly enhanced transcript levels of Heavy Metal ATPase4 (HMA4) encoding a PIB-type ATPase that pumps Zn2+ and Cd2+ out of specific cells. By comparison to the metal-sensitive A. thaliana, highly elevated HMA4 expression results from a combination of gene copy number expansion and cis-regulatory modifications. But how do these findings, which were based on a single accession, relate to species-wide HMA4 sequence diversity in A. halleri? Addressing this question, we detect positive selection in the promoter regions of three tandem A. halleri HMA4 paralogs, which are uniformly cis-activated. The accompanying hard selective sweep, however, is segmentally eclipsed as a consequence of recurrent ectopic gene conversion among HMA4 protein-coding sequences, which undergo concerted evolution. Together, this constitutes an unexpectedly complex profile of polymorphism as a result of natural selection. Our observations can serve as a blueprint for future analyses of duplicated genes that have undergone selection for more of the same gene product.
doi:10.1371/journal.pgen.1003707
PMCID: PMC3749932  PMID: 23990800
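The abstract reads a depletion of nucleotide diversity and an excess of low-frequency polymorphisms as hallmarks of positive selection. Tajima's D is a standard statistic that captures both signals. It compares mean pairwise differences (π) with the number of segregating sites (S), and it turns negative when rare variants are in excess. The sketch below computes it for aligned, gap-free sequences. It is offered as a generic illustration, and it is not stated to be the test the authors applied.

```python
import math
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns with more than one base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Nucleotide diversity pi (total, not per site): mean differences per pair."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs)


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length, gap-free sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D variance is zero for fewer than 4 sequences")
    s = segregating_sites(seqs)
    if s == 0:
        return math.nan  # undefined without polymorphism
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (mean_pairwise_differences(seqs) - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

A singleton polymorphism pulls π below S/a1, so D is negative. An intermediate-frequency polymorphism pushes D positive.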
25.  An integrative characterization of recurrent molecular aberrations in glioblastoma genomes 
Nucleic Acids Research  2013;41(19):8803-8821.
Glioblastoma multiforme (GBM) is the most common and malignant primary brain tumor in adults. Decades of investigations and the recent effort of the Cancer Genome Atlas (TCGA) project have mapped many molecular alterations in GBM cells. Alterations in DNA may dysregulate gene expression and drive tumor malignancy. It is thus important to uncover causal and statistical dependencies between ‘effector’ molecular aberrations and ‘target’ gene expression in GBMs. A rich collection of prior studies attempted to combine copy number variation (CNV) and mRNA expression data. However, systematic methods to integrate multiple types of cancer genomic data (gene mutations, single nucleotide polymorphisms, CNVs, DNA methylation, mRNA and microRNA expression, and clinical information) are relatively scarce. We proposed an algorithm to build ‘association modules’ linking effector molecular aberrations and target gene expression and applied the module-finding algorithm to the integrated TCGA GBM data sets. The inferred association modules were validated by six tests using external information and datasets of central nervous system tumors: (i) indication of prognostic effects among patients; (ii) coherence of target gene expression; (iii) retention of effector–target associations in external data sets; (iv) recurrence of effector molecular aberrations in GBM; (v) functional enrichment of target genes; and (vi) co-citations between effectors and targets. Modules associated with well-known molecular aberrations of GBM, such as chromosome 7 amplifications, chromosome 10 deletions, and EGFR and NF1 mutations, passed the majority of the validation tests. Furthermore, several modules associated with less well-reported molecular aberrations, such as chromosome 11 CNVs, CD40, PLXNB1 and GSTM1 methylation, and miR-21 expression, were also validated by external information.
In particular, modules constituting trans-acting effects with chromosome 11 CNVs and cis-acting effects with chromosome 10 CNVs manifested strong negative and positive associations with survival times in brain tumors. By aligning the information of association modules with the established GBM subclasses based on transcription or methylation levels, we found each subclass possessed multiple concurrent molecular aberrations. Furthermore, the joint molecular characteristics derived from 16 association modules had prognostic power not explained away by the strong biomarker of CpG island methylator phenotypes. Functional and survival analyses indicated that immune/inflammatory responses and epithelial-mesenchymal transitions were among the most important determining processes of prognosis. Finally, we demonstrated that certain molecular aberrations uniquely recurred in GBM but were relatively rare in non-GBM glioma cells. These results justify the utility of an integrative analysis on cancer genomes and provide testable characterizations of driver aberration events in GBM.
doi:10.1093/nar/gkt656
PMCID: PMC3799430  PMID: 23907387
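The abstract describes ‘association modules’ that link an effector aberration to a set of target genes whose expression tracks it. The sketch below groups targets by a simple correlation threshold. It is a swapped-in, deliberately simple technique and not the authors' module-finding algorithm. The effector and gene names, `min_abs_r` and `min_size` are hypothetical. Pearson correlation with a 0/1 effector equals the point-biserial correlation.

```python
import statistics


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def association_modules(effectors, targets, min_abs_r=0.8, min_size=2):
    """effectors: name -> per-sample effector values (e.g. 0/1 aberration calls).
    targets: gene -> per-sample expression for the same samples.
    Returns effector -> sorted target genes with |r| >= min_abs_r, keeping
    only modules with at least min_size targets."""
    modules = {}
    for eff, values in effectors.items():
        members = sorted(g for g, expr in targets.items() if abs(pearson(values, expr)) >= min_abs_r)
        if len(members) >= min_size:
            modules[eff] = members
    return modules
```

Using the absolute correlation lets one module hold both up- and down-regulated targets of the same effector.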
