Purpose of review
The completion of the Human Genome Project has enabled several new technologies for studying cancer genetics and cancer genomes. However, genomic instability and heterogeneity of human tumors impede a straightforward cataloging of cancer genes and possible therapeutic targets. Strategies enabling the distinction of causal genetic alterations from bystander genomic noise are needed and should significantly speed up the process of cancer-gene discovery.
A series of recent papers described the development of integrative oncogenomic approaches based on innovative cancer mouse models and how these can be used to speed up the discovery of new cancer genes. In the presented studies, spontaneously acquired genetic alterations in mouse tumors of defined genetic origin are used to filter/prioritize relevant lesions from complex human cancer genomes. As will be discussed in this review, a great advantage of this approach is that pinpointed candidate genes can be functionally validated in the right genetic context in vivo, which significantly increases confidence for later therapeutic development efforts.
The discussed approaches hold great promise to speed up the process of cancer-gene discovery and should be considered to complement time-consuming and costly endeavors like the Cancer Genome Project.
array comparative genomic hybridization; cancer mouse models; comparative oncogenomics
Advances in high-throughput, genome-wide profiling technologies have allowed for an unprecedented view of the cancer genome landscape. Specifically, high-density microarrays and sequencing-based strategies have been widely utilized to identify genetic (such as gene dosage, allelic status, and mutations in gene sequence) and epigenetic (such as DNA methylation, histone modification, and micro-RNA) aberrations in cancer. Although the application of these profiling technologies in unidimensional analyses has been instrumental in cancer gene discovery, genes affected by low-frequency events are often overlooked. The integrative approach of analyzing parallel dimensions has enabled the identification of (a) genes that are often disrupted by multiple mechanisms but at low frequencies by any one mechanism and (b) pathways that are often disrupted at multiple components but at low frequencies at individual components. These benefits of using an integrative approach illustrate the concept that the whole is greater than the sum of its parts. As efforts have now turned toward parallel and integrative multidimensional approaches for studying the cancer genome landscape in hopes of obtaining a more insightful understanding of the key genes and pathways driving cancer cells, this review describes key findings disseminating from such high-throughput, integrative analyses, including contributions to our understanding of causative genetic events in cancer cell biology.
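The benefit described in point (a) can be illustrated with a minimal sketch. It assumes alteration calls are available as (sample, gene, mechanism) tuples; the function name, gene names, and toy calls are hypothetical and are not taken from any study reviewed here. A gene altered by several mechanisms may look unremarkable in each single dimension while its combined frequency is clearly higher.

```python
from collections import defaultdict


def alteration_frequencies(calls, n_samples):
    """Per-gene alteration frequency, by mechanism and combined.

    calls: iterable of (sample, gene, mechanism) tuples (hypothetical format).
    n_samples: total number of tumors profiled.
    """
    per_mechanism = defaultdict(lambda: defaultdict(set))
    combined = defaultdict(set)
    for sample, gene, mechanism in calls:
        per_mechanism[gene][mechanism].add(sample)
        combined[gene].add(sample)  # a sample counts once, however it is hit

    result = {}
    for gene, samples in combined.items():
        result[gene] = {
            "combined": len(samples) / n_samples,
            "by_mechanism": {
                mech: len(hit) / n_samples
                for mech, hit in per_mechanism[gene].items()
            },
        }
    return result


# Toy data: GENE_A is hit by three mechanisms, each at low frequency.
toy_calls = [
    ("T1", "GENE_A", "deletion"),
    ("T2", "GENE_A", "methylation"),
    ("T3", "GENE_A", "mutation"),
    ("T3", "GENE_A", "deletion"),
    ("T1", "GENE_B", "mutation"),
]
freqs = alteration_frequencies(toy_calls, n_samples=10)
```

In this toy cohort no single mechanism affects GENE_A in more than two of ten tumors, whereas three of ten tumors carry at least one alteration of the gene.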
Integrative analysis; Cancer genome; Sequencing; Microarray
High throughput microarray technologies have afforded the investigation of genomes, epigenomes, and transcriptomes at unprecedented resolution. However, software packages to handle, analyze, and visualize data from these multiple 'omics disciplines have not been adequately developed.
Here, we present SIGMA2, a system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes. Multi-dimensional datasets can be simultaneously visualized and analyzed with respect to each dimension, allowing combinatorial integration of the different assays belonging to the different 'omics.
The identification of genes altered at multiple levels such as copy number, loss of heterozygosity (LOH), DNA methylation and the detection of consequential changes in gene expression can be concertedly performed, establishing SIGMA2 as a novel tool to facilitate the high throughput systems biology analysis of cancer.
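The kind of multi-level query described above can be sketched as follows. This is not SIGMA2 code and does not reflect its data model; it is a minimal illustration that assumes per-gene calls for a single sample are stored in plain dictionaries, and the function name, thresholds, and toy values are hypothetical.

```python
def concordant_genes(copy_number, methylation, loh, expression,
                     min_levels=2, expr_cutoff=-1.0):
    """Genes hit at >= min_levels DNA-level dimensions AND under-expressed.

    copy_number: gene -> integer call (negative = loss)
    methylation: gene -> True if promoter hypermethylated
    loh:         gene -> True if loss of heterozygosity
    expression:  gene -> log2 ratio, tumor vs. normal
    All inputs are hypothetical, pre-called, per-gene summaries.
    """
    hits = {}
    for gene, log_ratio in expression.items():
        levels = []
        if copy_number.get(gene, 0) < 0:
            levels.append("copy_loss")
        if methylation.get(gene, False):
            levels.append("hypermethylation")
        if loh.get(gene, False):
            levels.append("LOH")
        # Require multiple DNA-level hits and a consequential expression change.
        if len(levels) >= min_levels and log_ratio <= expr_cutoff:
            hits[gene] = levels
    return hits


copy_number = {"GENE_A": -1, "GENE_B": 0, "GENE_C": -1}
methylation = {"GENE_A": True, "GENE_B": True}
loh = {"GENE_C": True}
expression = {"GENE_A": -2.1, "GENE_B": -1.8, "GENE_C": 0.2}

two_hit = concordant_genes(copy_number, methylation, loh, expression)
```

Only GENE_A passes in this toy example: GENE_B is altered at a single level, and GENE_C is altered at two levels without a matching drop in expression.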
Genetic and epigenetic changes contribute to deregulation of gene expression and development of human cancer. Changes in DNA methylation are key epigenetic factors regulating gene expression and genomic stability. Recent progress in microarray technologies has resulted in the development of high-resolution platforms for profiling genetic, epigenetic and gene expression changes. Osteosarcoma (OS) is a pediatric bone tumor with a characteristically high level of numerical and structural chromosomal changes. Furthermore, little is known about DNA methylation changes in OS. Our objective was to develop an integrative approach for analysis of high-resolution epigenomic, genomic, and gene expression profiles in order to identify functional epi/genomic differences between OS cell lines and normal human osteoblasts. A combination of Affymetrix Promoter Tiling Arrays for DNA methylation, Agilent array-CGH platform for genomic imbalance and Affymetrix Gene 1.0 platform for gene expression analysis was used. As a result, an integrative high-resolution approach for interrogation of genome-wide tumour-specific changes in DNA methylation was developed. This approach was used to provide the first genomic DNA methylation maps, and to identify and validate genes with aberrant DNA methylation in OS cell lines. This first integrative analysis of global cancer-related changes in DNA methylation, genomic imbalance, and gene expression has provided comprehensive evidence of the cumulative roles of epigenetic and genetic mechanisms in deregulation of gene expression networks.
Cancer is thought to be caused by a sequence of multiple genetic and epigenetic alterations which occur in one or more of the genes controlling cell cycle progression and signal transduction. The complexity of carcinogenic mechanisms leads to heterogeneity in molecular phenotype, pathology, and prognosis of cancers.
Genome-wide mutational analysis of cancer genes in individual tumors is the most direct way to elucidate the complex process of disease progression, although such high-throughput sequencing technologies are not yet fully developed. As a surrogate marker for pathway activation analysis, expression profiling using microarrays has been successfully applied for the classification of tumor types, stages of tumor progression, or in some cases, prediction of clinical outcomes. However, the biological implication of those gene expression signatures is often unclear.
Systems biological approaches leverage the signature genes as a representation of changes in signaling pathways, instead of interpreting the relevance between each gene and phenotype. This approach, which can be achieved by comparing the gene set or the expression profile with those of reference experiments in which a defined pathway is modulated, will improve our understanding of cancer classification, clinical outcome, and carcinogenesis. In this review, we will discuss recent studies on the development of expression signatures to monitor signaling pathway activities and how these signatures can be used to improve the identification of responders to anticancer drugs.
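Comparing an expression profile with a reference experiment in which a defined pathway is modulated can be sketched in its simplest form as a signature score. The scoring rule below (mean expression of genes induced in the reference experiment minus mean expression of genes repressed) is one common simplification and is an assumption here, not the method of any particular study; the gene names and values are hypothetical.

```python
from statistics import mean


def signature_score(profile, up_genes, down_genes):
    """Pathway activity score for one sample.

    profile:    gene -> normalized log expression in the sample
    up_genes:   genes induced when the pathway was activated in a
                reference experiment (hypothetical signature)
    down_genes: genes repressed in that reference experiment
    """
    up = [profile[g] for g in up_genes if g in profile]
    down = [profile[g] for g in down_genes if g in profile]
    if not up or not down:
        raise ValueError("signature genes missing from profile")
    return mean(up) - mean(down)


tumor_profile = {"G1": 2.0, "G2": 1.0, "G3": -1.0, "G4": -2.0}
score = signature_score(tumor_profile, ["G1", "G2"], ["G3", "G4"])
```

A high score suggests that the sample resembles the pathway-activated reference state; a threshold for calling a tumor a likely responder would have to be calibrated on independent data.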
Expression signature; signaling pathway; drug discovery; cancer therapy; systems biology.
Cancer is a multifaceted disease that results from dysregulated normal cellular signaling networks caused by genetic, genomic and epigenetic alterations at cell or tissue levels. Uncovering the underlying protein signaling network changes, including cell cycle gene networks in cancer, aids in understanding the molecular mechanism of carcinogenesis and identifies the characteristic signaling network signatures unique for different cancers and specific cancer subtypes. The identified signatures can be used for cancer diagnosis, prognosis, and personalized treatment. During the past several decades, the available technology to study signaling networks has significantly evolved to include such platforms as genomic microarray (expression array, SNP array, CGH array, etc.) and proteomic analysis, which globally assess genetic, epigenetic, and proteomic alterations in cancer. In this review, we compare Pathway Array analysis with other proteomic approaches for analyzing the protein networks involved in cancer, and we consider its utility as a source of cancer biomarkers for diagnosis, prognosis and therapeutic target identification. With the advent of bioinformatics, constructing high-complexity signaling networks is possible. As the use of signaling network-based cancer diagnosis, prognosis and treatment is anticipated in the near future, medical and scientific communities should be prepared to apply these techniques to further enhance personalized medicine.
Cancer begins with multiple cumulative epigenetic and genetic alterations that sequentially transform a cell, or a group of cells in a particular organ. The early genetic events might lead to clonal expansion of pre-neoplastic daughter cells in a particular tumor field. Subsequent genomic changes in some of these cells drive them towards the malignant phenotype. These transformed cells are diagnosed histopathologically as cancers owing to changes in cell morphology. Conceivably, a population of daughter cells with early genetic changes (without histopathology) remains in the organ, demonstrating the concept of field cancerization. With present technological advancement, including laser capture microdissection and high-throughput genomic technologies, carefully designed studies using appropriate control tissue will enable identification of important molecular signatures in these genetically transformed but histologically normal cells. Such tumor-specific biomarkers should have excellent clinical utility. This review examines the concept of field cancerization in several cancers and its possible utility in four areas of oncology: risk assessment, early cancer detection, monitoring of tumor progression and definition of tumor margins.
We have developed a transcriptome-wide approach to identify genes affected by promoter CpG island DNA hypermethylation and transcriptional silencing in colorectal cancer. By screening cell lines and validating tumor-specific hypermethylation in a panel of primary human colorectal cancer samples, we estimate that nearly 5% or more of all known genes may be promoter methylated in an individual tumor. When directly compared to gene mutations, we find larger numbers of genes hypermethylated in individual tumors, and a higher frequency of hypermethylation within individual genes harboring either genetic or epigenetic changes. Thus, to enumerate the full spectrum of alterations in the human cancer genome, and to facilitate the most efficacious grouping of tumors to identify cancer biomarkers and tailor therapeutic approaches, both genetic and epigenetic screens should be undertaken.
Loss of gene expression in association with aberrant accumulation of 5-methylcytosine in gene promoter CpG islands is a common feature of human cancer. Here, we describe a method to discover these genes that permits identification of hundreds of novel candidate cancer genes in any cancer cell line. We now estimate that as much as 5% of colon cancer genes may harbor aberrant gene hypermethylation and we term these the cancer “promoter CpG island DNA hypermethylome.” Multiple mutated genes recently identified via cancer resequencing efforts are shown to be within this hypermethylome and to be more likely to undergo epigenetic inactivation than genetic alteration. Our approach allows derivation of new potential tumor biomarkers and potential pathways for therapeutic intervention. Importantly, our findings illustrate that efforts aimed at complete identification of the human cancer genome should include analyses of epigenetic, as well as genetic, changes.
Cancer is characterized by aberrant patterns of expression of multiple genes. These major shifts in gene expression are believed to be due to not only genetic but also epigenetic changes. The epigenetic changes are communicated through chemical modifications, including histone modifications. However, it is unclear whether the binding of histone-modifying proteins to genomic regions and the placing of histone modifications efficiently discriminates corresponding genes from the rest of the genes in the human genome. We performed gene expression analysis of histone demethylases (HDMs) and histone methyltransferases (HMTs), their target genes and genes with relevant histone modifications in normal and tumor tissues. Surprisingly, this analysis revealed the existence of correlations in the expression levels of different HDMs and HMTs. The observed HDM/HMT gene expression signature was specific to particular normal and cancer cell types and highly correlated with target gene expression and the expression of genes with histone modifications. Notably, we observed that trimethylation at lysine 4 and lysine 27 separated preferentially expressed and underexpressed genes, which was strikingly different in cancer cells compared to normal cells. We conclude that changes in coordinated regulation of enzymes executing histone modifications may underlie global epigenetic changes occurring in cancer.
Deregulation of gene expression, a hallmark of cancer, is caused by both genetic and epigenetic mechanisms. The rapid accumulation of epigenome maps of various cancers suggests a new avenue of research, namely integrating epigenomic data with other types of omic data for cancer diagnosis, prognosis, and biomarker discovery. We introduce the MAPIT algorithm (Multi Analyte Pathway Inference Tool), to enable principled integration of epigenomic, transcriptomic, and protein interactome data. As a proof-of-principle, we apply MAPIT to glioblastoma multiforme (GBM), the most common and aggressive form of brain tumor. Few predictive markers were reported for the prognosis of GBM patients. By integrating mRNA transcriptome, promoter DNA methylome and protein-protein physical interactome, we find ten expression- and three methylation-based network markers, involving 118 genes. When tested on additional GBM patient samples, the prognostic accuracy of the multi-analyte network markers (73.5%) is 9.7% and 8.6% higher than previous prognostic signatures built on gene expression or DNA methylation alone. Our results highlight the critical role of two novel pathways in the prognosis of GBM patients, small GTPase-mediated protein trafficking and ubiquitination-dependent protein degradation. A better understanding of these two pathways could lead to personalized therapies for subgroups of GBM patients. Our study demonstrates that integrating epigenomic, transcriptomic, and interactomic data can improve the accuracy of network-based prognosis markers and lead to novel mechanistic understanding of cancer.
Cancer evolves dynamically as clonal expansions supersede one another driven by shifting selective pressures, mutational processes, and disrupted cancer genes. These processes mark the genome, such that a cancer's life history is encrypted in the somatic mutations present. We developed algorithms to decipher this narrative and applied them to 21 breast cancers. Mutational processes evolve across a cancer's lifespan, with many emerging late but contributing extensive genetic variation. Subclonal diversification is prominent, and most mutations are found in just a fraction of tumor cells. Every tumor has a dominant subclonal lineage, representing more than 50% of tumor cells. Minimal expansion of these subclones occurs until many hundreds to thousands of mutations have accumulated, implying the existence of long-lived, quiescent cell lineages capable of substantial proliferation upon acquisition of enabling genomic changes. Expansion of the dominant subclone to an appreciable mass may therefore represent the final rate-limiting step in a breast cancer's development, triggering diagnosis.
► Genome-wide analyses of mutations emerging through time in 21 breast cancers
► Minimal expansion of subclones occurs until thousands of mutations have accumulated
► Cancer-specific signatures of point mutations and genomic instability emerge late
► ERBB2 amplification begins early but continues to evolve over long molecular time
Newly developed algorithms allow the reconstruction of the genomic history of different breast cancers, tracing the temporal evolution of each tumor and the emergence of the dominant subclones that will eventually trigger diagnosis.
The application of next-generation sequencing technology has produced a transformation in cancer genomics, generating large data sets that can be analyzed in different ways to answer a multitude of questions about the genomic alterations associated with the disease. Analytical approaches can discover focused mutations such as substitutions and small insertions/deletions, large structural alterations and copy number events. As our capacity to produce such data for multiple cancers of the same type improves, so does the demand to analyze multiple tumor genomes simultaneously. For example, pathway-based analyses that provide the full mutational impact on cellular protein networks and correlation analyses aimed at revealing causal relationships between genomic alterations and clinical presentations are both enabled. As the repertoire of data grows to include mRNA-seq, non-coding RNA-seq and methylation for multiple genomes, our challenge will be to intelligently integrate data types and genomes to produce a coherent picture of the genetic basis of cancer.
Cancer is a genetic disease that results from a variety of genomic alterations. Identification of some of these causal genetic events has enabled the development of targeted therapeutics and spurred efforts to discover the key genes that drive cancer formation. Rapidly improving sequencing and genotyping technology continues to generate increasingly large datasets that require analytical methods to identify functional alterations that deserve additional investigation. This review examines statistical and computational approaches for the identification of functional changes among sets of single-nucleotide substitutions. Frequency-based methods identify the most highly mutated genes in large-scale cancer sequencing efforts while bioinformatics approaches are effective for independent evaluation of both non-synonymous mutations and polymorphisms. We also review current knowledge and tools that can be utilized for analysis of alterations in non-protein-coding genomic sequence.
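A frequency-based test of the kind mentioned above can be sketched with a simple binomial model. The sketch assumes a uniform background mutation rate per base and treats tumors as independent; both are strong simplifications, and the rate, gene length, and counts used here are hypothetical illustration values rather than figures from any sequencing effort.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def gene_p_value(mutated_tumors, n_tumors, coding_length, background_rate):
    """Chance of seeing >= mutated_tumors tumors hit in a gene by background alone.

    background_rate: assumed per-base somatic mutation probability per tumor.
    coding_length:   number of coding bases in the gene.
    """
    # Probability that a single tumor carries at least one background mutation.
    p_hit = 1 - (1 - background_rate) ** coding_length
    return binomial_tail(mutated_tumors, n_tumors, p_hit)


# Hypothetical example: 5 of 100 tumors mutated in a 1,500-base coding region.
p_value = gene_p_value(mutated_tumors=5, n_tumors=100,
                       coding_length=1500, background_rate=1e-6)
```

In practice the background rate varies with sequence context, gene expression, and replication timing, so published methods use considerably richer background models than this one.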
Clear cell renal carcinoma (RCC) is the most common and invasive adult renal cancer. For the purpose of identifying RCC biomarkers, we investigated chromosomal regions and individual genes modulated in RCC pathology. We applied the dual strategy of assessing and integrating genomic and transcriptomic data, today considered the most effective approach for understanding genetic mechanisms of cancer and the most sensitive for identifying cancer-related genes.
We performed the first integrated analysis of DNA and RNA profiles of RCC samples using Affymetrix technology. Using 100K SNP mapping arrays, we assembled a genome-wide map of DNA copy number alterations and LOH areas. We thus confirmed the typical genetic signature of RCC but also identified other amplified regions (e.g. on chr. 4, 11, 12), deleted regions (chr. 1, 9, 22) and LOH areas (chr. 1, 2, 9, 13). Simultaneously, using HG-U133 Plus 2.0 arrays, we identified differentially expressed genes (DEGs) in tumor vs. normal samples. Combining genomic and transcriptomic data, we identified 71 DEGs in aberrant chromosomal regions and observed, in amplified regions, a predominance of up-regulated genes (27 of 37 DEGs) and a trend to clustering. Functional annotation of these genes revealed some already implicated in RCC pathology and other cancers, as well as others that may be novel tumor biomarkers.
By combining genomic and transcriptomic profiles from a collection of RCC samples, we identified specific genomic regions with concordant alterations in DNA and RNA profiles and focused on regions with increased DNA copy number. Since the transcriptional modulation of up-regulated genes in amplified regions may be attributed to the genomic alterations characteristic of RCC, these genes may encode novel RCC biomarkers actively involved in tumor initiation and progression and useful in clinical applications.
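The combination step described above, locating differentially expressed genes inside aberrant chromosomal regions, amounts to an interval overlap. The following is a minimal sketch assuming that amplified regions and gene coordinates are already available; the coordinates, gene names, and fold changes are hypothetical and are not data from this study.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if the closed intervals [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end


def degs_in_regions(degs, regions):
    """Return the DEGs whose coordinates overlap any region on the same chromosome.

    degs:    list of dicts with keys name, chrom, start, end, log2fc
    regions: list of (chrom, start, end) tuples, e.g. amplified segments
    """
    hits = []
    for gene in degs:
        for chrom, start, end in regions:
            if gene["chrom"] == chrom and overlaps(gene["start"], gene["end"], start, end):
                hits.append(gene)
                break  # count each gene once
    return hits


amplified = [("chr5", 1_000_000, 5_000_000)]  # hypothetical segment
degs = [
    {"name": "GENE_A", "chrom": "chr5", "start": 1_200_000, "end": 1_250_000, "log2fc": 1.8},
    {"name": "GENE_B", "chrom": "chr5", "start": 6_000_000, "end": 6_040_000, "log2fc": 2.0},
    {"name": "GENE_C", "chrom": "chr5", "start": 4_900_000, "end": 5_100_000, "log2fc": -1.2},
    {"name": "GENE_D", "chrom": "chr3", "start": 1_200_000, "end": 1_250_000, "log2fc": 1.5},
]

in_amplified = degs_in_regions(degs, amplified)
names = [g["name"] for g in in_amplified]
n_up = sum(1 for g in in_amplified if g["log2fc"] > 0)
```

The count of up-regulated genes among those falling in amplified regions corresponds to the kind of tally reported above (27 of 37 DEGs).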
Cancer is commonly associated with widespread disruption of DNA methylation, chromatin modification and miRNA expression. In this study, we established a robust discovery pipeline to identify epigenetically deregulated miRNAs in cancer.
Using an integrative approach that combines primary transcription, genome-wide DNA methylation and H3K9Ac marks with microRNA (miRNA) expression, we identified miRNA genes that were epigenetically modified in cancer. We find miR-205, miR-21, and miR-196b to be epigenetically repressed, and miR-615 epigenetically activated in prostate cancer cells.
We show that detecting changes in primary miRNA transcription levels is a valuable method for detection of local epigenetic modifications that are associated with changes in mature miRNA expression.
Characterization of the functional components in mammalian genomes depends on our ability to completely elucidate the genetic and epigenetic regulatory networks of chromatin states and nuclear architecture. Such endeavors demand the availability of robust and effective approaches to characterizing protein-DNA associations in their native chromatin environments. Considerable progress has been made through the application of chromatin immunoprecipitation (ChIP) to study chromatin biology in cells. Coupled with genome-wide analyses, ChIP-based assays enable us to take a global, unbiased and comprehensive view of transcriptional control, epigenetic regulation and chromatin structures, with high precision and versatility. The integrated knowledge derived from these studies is used to decipher gene regulatory networks and define genome organization. In this review, we discuss this powerful approach and its current advances. We also explore the possible future developments of ChIP-based approaches to interrogating long-range chromatin interactions and their impact on the mechanisms regulating gene expression.
Cancer cells harbor a large number of molecular alterations such as mutations, amplifications and deletions on DNA sequences and epigenetic changes on DNA methylations. These aberrations may dysregulate gene expressions, which in turn drive the malignancy of tumors. Deciphering the causal and statistical relations of molecular aberrations and gene expressions is critical for understanding the molecular mechanisms of clinical phenotypes.
In this work, we proposed a computational method to reconstruct association modules containing driver aberrations, passenger mRNA or microRNA expressions, and putative regulators that mediate the effects from drivers to passengers. By applying the module-finding algorithm to the integrated datasets of NCI-60 cancer cell lines, we found that gene expressions were driven by diverse molecular aberrations including chromosomal segments' copy number variations, gene mutations and DNA methylations, microRNA expressions, and the expressions of transcription factors. In-silico validation indicated that passenger genes were enriched with the regulator binding motifs, functional categories or pathways where the drivers were involved, and co-citations with the driver/regulator genes. Moreover, 6 of 11 predicted MYB targets were down-regulated in an MYB-siRNA treated leukemia cell line. In addition, microRNA expressions were driven by distinct mechanisms from mRNA expressions.
The results provide rich mechanistic information regarding molecular aberrations and gene expressions in cancer genomes. This kind of integrative analysis will become an important tool for the diagnosis and treatment of cancer in the era of personalized medicine.
Genetic somatic alterations are fundamental hallmarks of cancer. In addition to point and other small mutations targeting cancer genes, solid tumors often exhibit aneuploidy as well as multiple chromosomal rearrangements of large fragments of the genome. Whether somatic chromosomal alterations and aneuploidy are a driving force or a mere consequence of tumorigenesis remains controversial. Recently it became apparent that not only genetic but also epigenetic alterations play a major role in carcinogenesis. Epigenetic regulation mechanisms underlie the maintenance of cell identity crucial for development and differentiation. These epigenetic regulatory mechanisms have been found substantially altered during cancer development and progression. In this review, we discuss approaches designed to analyze genetic and epigenetic alterations in colorectal cancer, especially DNA fingerprinting approaches to detect changes in DNA copy number and methylation. DNA fingerprinting techniques, despite their modest throughput, played a pivotal role in significant discoveries in the molecular basis of colorectal cancer. The aim of this review is to revisit the fingerprinting technologies employed and the oncogenic processes that they unveiled.
Despite the involvement of genetic alterations in neoplastic cell transformation, it is increasingly evident that abnormal epigenetic patterns, such as those affecting DNA methylation and histone posttranslational modifications (PTMs), play an essential role in the early stages of tumor development. This finding, together with the evidence that epigenetic changes are reversible, enabled the development of new antineoplastic therapeutic approaches known as epigenetic therapies. Epigenetic modifications are involved in the control of gene expression, and their aberrant distribution is thought to participate in neoplastic transformation by causing the deregulation of crucial cellular pathways. Epigenetic drugs are able to revert the defective gene expression profile of cancer cells and, consequently, reestablish normal molecular pathways. Considering the emerging interest in epigenetic therapeutics, this review focuses on the approaches affecting DNA methylation, evaluates novel strategies and those already approved for clinical use, and compares their therapeutic potential.
DNA demethylating drugs; 5-azacytidine; 5-aza-2′-deoxycytidine; DNMT inhibitors; epigenetic therapy
Epigenetics is the study of heritable changes in gene expression that occur without a change in DNA sequence. Cancer is a multistep process derived from combinational crosstalk between genetic alterations and epigenetic influences through various environmental factors. The observation that epigenetic changes are reversible makes them an attractive target for cancer prevention. Until recently, there have been difficulties studying epigenetic mechanisms in interactions between dietary factors and environmental toxicants. The development of the field of cancer epigenetics during the past decade has been advanced rapidly by genome-wide technologies – which initially employed microarrays but increasingly are using high-throughput sequencing – which helped to improve the quality of the analysis, increase the capacity of sample throughput, and reduce the cost of assays. It is particularly true for applications of cancer epigenetics in epidemiologic studies that examine the relationship among diet, epigenetics, and cancer because of the issues of tissue heterogeneity, the often limiting amount of DNA samples, and the significant cost of the analyses. This review offers an overview of the state of the science in nutrition, environmental toxicants, epigenetics, and cancer to stimulate further exploration of this important and developing area of science. Additional epidemiologic research is needed to clarify the relationship between these complex epigenetic mechanisms and cancer.
cancer; epigenetics; diet; nutrient; toxicants
Computational methods to identify functional genomic elements using genetic information have been very successful in determining gene structure and in identifying a handful of cis-regulatory elements. But the vast majority of regulatory elements have yet to be discovered, and it has become increasingly apparent that their discovery will not come from using genetic information alone. Recently, high-throughput technologies have enabled the creation of information-rich epigenetic maps, most notably for histone modifications. However, tools that search for functional elements using this epigenetic information have been lacking. Here, we describe an unsupervised learning method called ChromaSig to find, in an unbiased fashion, commonly occurring chromatin signatures in both tiling microarray and sequencing data. Applying this algorithm to nine chromatin marks across a 1% sampling of the human genome in HeLa cells, we recover eight clusters of distinct chromatin signatures, five of which correspond to known patterns associated with transcriptional promoters and enhancers. Interestingly, we observe that the distinct chromatin signatures found at enhancers mark distinct functional classes of enhancers in terms of transcription factor and coactivator binding. In addition, we identify three clusters of novel chromatin signatures that contain evolutionarily conserved sequences and potential cis-regulatory elements. Applying ChromaSig to a panel of 21 chromatin marks mapped genomewide by ChIP-Seq reveals 16 classes of genomic elements marked by distinct chromatin signatures. Interestingly, four classes containing enrichment for repressive histone modifications appear to be locally heterochromatic sites and are enriched in quickly evolving regions of the genome. The utility of this approach in uncovering novel, functionally significant genomic elements will aid future efforts of genome annotation via chromatin modifications.
The DNA in eukaryotes is packaged by histones. Interestingly, histones can be marked by a variety of posttranslational modifications, and it has been hypothesized that distinct combinations of histone modifications mark distinct functional regions of the genome. The study of histone modifications has been aided by the development of high-throughput techniques to map a wide assortment of histone modifications on a global scale. However, because much of our current understanding of the human genome is concentrated on promoters, most studies have only examined histone modifications at these well-defined sites, ignoring the vast majority of the genome. To aid in the discovery of functional elements outside of these well-annotated loci, we develop an unbiased method that searches for commonly occurring histone modification patterns on a global scale without using any annotation information. This method recovers known patterns associated with transcriptional enhancers and promoters. Supporting the histone code hypothesis, we discover that the different functional activities of enhancers are closely associated with the presence of different histone modification patterns. We also discover several novel patterns that likely contain other potential regulatory elements. As the availability of large-scale histone modification data increases, the ability of methods such as the one presented here to concisely describe commonly occurring chromatin signatures, thereby abstracting away irrelevant or redundant data, will become increasingly critical.
The explosive development of genomics technologies including microarrays and next generation sequencing (NGS) has provided comprehensive maps of cancer genomes, including the expression of mRNAs and microRNAs, DNA copy numbers, sequence variations, and epigenetic changes. These genome-wide profiles of the genetic aberrations could reveal the candidates for diagnostic and/or prognostic biomarkers as well as mechanistic insights into tumor development and progression. Recent efforts to establish the huge cancer genome compendium and integrative omics analyses, so-called "integromics", have extended our understanding on the cancer genome, showing its daunting complexity and heterogeneity. However, the challenges of the structured integration, sharing, and interpretation of the big omics data still remain to be resolved. Here, we review several issues raised in cancer omics data analysis, including NGS, focusing particularly on the study design and analysis strategies. This might be helpful to understand the current trends and strategies of the rapidly evolving cancer genomics research.
cancer genomics; integromics; next generation sequencing; research design
Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. For instance, disruptive variations in an HNF4A transcription factor binding site upstream of the Factor IX gene contribute causally to hemophilia B Leyden. Although clinical genome sequence analysis currently focuses on the identification of protein-altering variation, the impact of cis-regulatory mutations can be similarly strong. New technologies are now enabling genome sequencing beyond exomes, revealing variation across the non-coding 98% of the genome responsible for developmental and physiological patterns of gene activity. The capacity to identify causal regulatory mutations is improving, but predicting functional changes in regulatory DNA sequences remains a great challenge. Here we explore the existing methods and software for prediction of functional variation situated in the cis-regulatory sequences governing gene transcription and RNA processing.
Cancer has remarkable complexity at the molecular level, with multiple genes, proteins, pathways and regulatory interconnections being affected. We introduce a systems biology approach to study cancer that formally integrates the available genetic, transcriptomic, epigenetic and molecular knowledge on cancer biology and, as a proof of concept, we apply it to colorectal cancer.
We first classified all the genes in the human genome into cancer-associated and non-cancer-associated genes based on extensive literature mining. We then selected a set of functional attributes proven to be highly relevant to cancer biology, including protein kinases, secreted proteins, transcription factors, post-translational modifications of proteins, DNA methylation and tissue specificity. The cancer-associated genes were used to extract 'common cancer fingerprints' through these molecular attributes, and Boolean logic was implemented so that both the expression data and the functional attributes could be rationally integrated, yielding a guilt-by-association algorithm to identify novel cancer-associated genes. Finally, these candidate genes were interlaced with the known cancer-related genes in a network analysis aimed at identifying highly conserved gene interactions that impact cancer outcome. We demonstrate the effectiveness of this approach using colorectal cancer as a test case and identify several novel candidate genes that are classified according to their functional attributes. These genes include the following: 1) secreted proteins as potential biomarkers for the early detection of colorectal cancer (FXYD1, GUCA2B, REG3A); 2) kinases as potential drug candidates to prevent tumor growth (CDC42BPB, EPHB3, TRPM6); and 3) potential oncogenic transcription factors (CDK8, MEF2C, ZIC2).
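The Boolean guilt-by-association step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the attribute labels, the placeholder gene names (GENE_A to GENE_F), the helper functions and the exact matching rule are all assumptions made for illustration. Here a gene is flagged as a candidate only if it is not already known to be cancer-associated AND is differentially expressed AND carries an attribute combination observed among the known cancer-associated genes.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical toy data: placeholder gene names and attribute labels,
# loosely following the attribute categories named in the text.
ATTRIBUTES: Dict[str, FrozenSet[str]] = {
    "GENE_A": frozenset({"kinase"}),
    "GENE_B": frozenset({"secreted", "tissue_specific"}),
    "GENE_C": frozenset({"kinase"}),
    "GENE_D": frozenset({"secreted", "tissue_specific"}),
    "GENE_E": frozenset({"transcription_factor"}),
    "GENE_F": frozenset(),
}
KNOWN_CANCER_GENES: Set[str] = {"GENE_A", "GENE_B"}
DIFFERENTIALLY_EXPRESSED: Set[str] = {"GENE_C", "GENE_E", "GENE_F"}


def extract_fingerprints(
    attributes: Dict[str, FrozenSet[str]], known: Set[str]
) -> Set[FrozenSet[str]]:
    """Collect the non-empty attribute combinations seen in known cancer genes."""
    return {attributes[g] for g in known if g in attributes and attributes[g]}


def find_candidates(
    attributes: Dict[str, FrozenSet[str]],
    known: Set[str],
    diff_expressed: Set[str],
) -> List[str]:
    """Boolean AND of three conditions: not already known,
    differentially expressed, and matching a known fingerprint."""
    fingerprints = extract_fingerprints(attributes, known)
    return sorted(
        gene
        for gene, attrs in attributes.items()
        if gene not in known
        and gene in diff_expressed
        and attrs in fingerprints
    )


candidates = find_candidates(ATTRIBUTES, KNOWN_CANCER_GENES, DIFFERENTIALLY_EXPRESSED)
print(candidates)
```

In this toy example only GENE_C satisfies all three conditions: GENE_D shares a fingerprint but is not differentially expressed, GENE_E is differentially expressed but its attribute combination is absent from the known genes, and GENE_F has no attributes. A real analysis would presumably use graded expression evidence and a richer matching rule than exact set equality.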
We argue that this is a holistic approach that faithfully mimics cancer characteristics, efficiently predicts novel cancer-associated genes and has universal applicability to the study and advancement of cancer research.
Gene expression profiling using microarray technologies provides a powerful approach to understanding complex biological systems and the pathogenesis of diseases. In the field of liver cancer research, a number of genome-wide profiling studies have been published. These studies have provided gene sets, that is, signatures, that can classify tumors and predict clinical outcomes such as survival, recurrence, and metastasis. More recently, the application of genomic profiling has been extended to identify molecular targets, pathways, and the cellular origins of tumors. Systematic and integrative analyses of multiple data sets, together with emerging new technologies, are also accelerating the progress of cancer genomic studies. Here, we review the genomic signatures identified from genomic profiling studies of hepatocellular carcinoma (HCC), and categorize and characterize them into prediction, phenotype, function, and molecular target signatures according to their utilities and properties. Our classification of the signatures should help in understanding and designing studies that extend the application of genomic profiles.
signature; microarray; integrative analysis; hepatocellular carcinoma