Further advances in the prevention, diagnosis and treatment of cancer require a more complete knowledge of the molecular mechanisms that program the malignant state. Until recently, identifying and validating genetic alterations in tumors that contribute to cancer involved painstaking efforts focused primarily on single mutations. However, the application of whole genome approaches to the study of cancer now makes it possible to contemplate systematic characterization of the structural basis of cancer by identifying mutations associated with each cancer type. In parallel, recent technological advances also make it possible to methodically characterize the function of putative oncogenes and tumor suppressor genes. The integration of these approaches now provides the means not only to derive a complete molecular description of cancer but also to provide well-validated targets for the development of therapeutic agents.
Cancer develops from normal tissues through a stepwise accumulation of genetic mutations. These mutations induce changes in both incipient tumor cells and surrounding cells that program initial neoplastic growth and full malignant transformation. Work in many laboratories over the past four decades has revealed and validated the role of some of the key genetic alterations that are responsible for oncogenesis in particular tissues. Indeed, the identification of oncogenic mutations such as BCR-ABL and activating mutations of EGFR has led to the development of molecularly targeted therapies now employed in the clinic. Despite these advances, it is clear that we continue to lack a full understanding of the mutations that drive cancer development for the majority of human cancer types.
In part, this situation exists because most human tumors, particularly those derived from epithelial cancers, exhibit global genomic alterations that make it difficult to identify mutations critical for cell transformation and to define the consequences of specific cancer-associated mutations. Moreover, since multiple mutations are required to program the transformed phenotype, dissecting how these mutations interact remains a critical step in not only understanding the molecular basis of cancer but also in applying this knowledge to develop effective anti-neoplastic therapeutics.
However, recent advances in sequencing technologies and comprehensive methods to map cancer-associated amplicons and deletions now make it possible to envision enumerating all of the genetic alterations harbored by a particular tumor. For example, advances in DNA sequencing technology have dramatically decreased the cost of sequencing while increasing throughput, and array-based technologies enable the systematic identification of recurrent regions of chromosomal amplification and deletion. Widespread implementation of these and related technologies has already begun to clarify the number and types of genetic alterations that occur in cancer genomes .
However, despite these advances in annotating structural alterations in cancer genomes, identifying the genes targeted by specific mutational, amplification or deletion events and deciphering the function of targeted gene mutations remain major challenges. Thus, the parallel development of efficient methods to annotate the function of cancer-associated genes is necessary to distill validated cancer targets from this structural description of cancer genomes. The discovery that RNA interference (RNAi) operates in mammalian cells and the assembly and annotation of large collections of human cDNAs [3, 4] now provide a complementary systematic approach to interrogate the function of genes involved in cancer.
In this review we focus on recent advances in the application of genomics to the study of cancer. The integration of these approaches promises not only to provide a comprehensive understanding of the molecular events that lead to cancer but also to provide the means to design novel therapeutic strategies.
The RAS proto-oncogene was one of the first human oncogenes discovered. Although changes in gene copy number or protein expression are found in some tumors, the early application of DNA sequencing revealed that a single nucleotide change in the proto-oncogene led to an amino acid substitution at a position that rendered the mutant protein constitutively active. Subsequent studies have shown that a number of proto-oncogenes are converted into oncogenes through the acquisition of somatic mutations.
Based on these observations, several groups have applied capillary-based DNA sequencing technologies to analyze putative oncogenes that had been implicated in signaling pathways important for tumor cell proliferation and survival. For example, complete sequencing of RAS and the known RAS effector BRAF in 530 cancer cell lines and 378 tumor samples confirmed that RAS mutations occur in 15% of melanomas and 35% of colorectal cancers but also identified a high incidence of BRAF mutations in melanomas (66%). Similarly, directed sequencing of genes that make up the phosphatidylinositol 3-kinase family (PI3K) of lipid kinases identified activating mutations in PIK3CA in colorectal, breast, brain and gastric cancers.
Directed sequencing of EGFR and kinases in lung cancer also led to the identification of recurrent mutations in EGFR in a significant subset of lung tumors [11-13]. Such tumors were more likely to have been derived from non-smoking women of Asian ancestry. Indeed, prior clinical trials had suggested that this subgroup of patients was more likely to respond to treatment with inhibitors of EGFR. Similarly, sequencing of kinases in neuroblastoma revealed activating mutations in ALK in 8% of primary tumors, and tumors harboring such mutations show increased sensitivity to ALK kinase inhibitors.
Based on these experiences, several groups have initiated efforts to sequence larger panels of genes in a range of cancer types. These efforts have confirmed the frequency of many of the known oncogenic mutations in kinases and other classes of tumor suppressors and oncogenes [16-20]. In addition, deep sequencing of genes implicated in cancer development in specific cancer types such as glioblastomas has begun to reveal that even if mutations of a particular gene are present in only a fraction of tumors, mutations in other members of the same signaling pathway are often found in other cancers, suggesting that perturbation of certain pathways such as the retinoblastoma pathway is essential for cancer development. Advances in sequencing technologies now make it possible to contemplate the complete sequencing of entire cancer genomes; indeed, the complete sequencing of the genome of tumor and skin cells derived from a patient with M1 acute myeloid leukemia identified 10 mutations, 8 of which were in genes not previously recognized as cancer genes.
However, these initial studies also identified a large number of mutations for which it remains unclear whether such mutations contribute to cancer development either because they occur at low frequency or have yet to be functionally characterized. Thus these initial studies confirm that a full characterization of cancer genomes will require a substantial increase in both the number of genes and tumors analyzed to achieve statistically meaningful results. The development of highly parallel approaches for DNA sequencing [22, 23] will certainly facilitate such studies but access to well-annotated cancer specimens remains a significant challenge.
Clinically, the observation that mutations in specific oncogenes such as EGFR or ALK predict response to targeted agents has led to the planning of clinical studies to test whether screening for such mutations will permit the selection of patients who are likely to respond to therapy. At the same time, recent results suggest that the presence of particular mutations, such as oncogenic mutations of KRAS, identifies tumors that show little or no response to targeted agents. The recent demonstration that mass spectrometric methods to identify sequence variations can be adapted for use in high throughput identification of known oncogenic mutations will facilitate the identification of such predictive mutations until full genome sequencing of patient samples becomes feasible.
In addition to sequence alterations, it has long been recognized that cancer genomes harbor numerous regions of copy number gain and loss. Indeed, many known oncogenes such as MYC and HER2 exhibit copy number gain more frequently than mutations, and tumor suppressor genes are classically defined by loss of heterozygosity. Several whole genome techniques, including comparative genomic hybridization (CGH), high-density single nucleotide polymorphism (SNP) arrays and representational oligonucleotide microarray analysis (ROMA), permit the detection of chromosomal regions exhibiting decreased or increased copy number at high resolution [26-28]. These technologies have recently been used to identify genes and regions commonly altered in a wide range of cancers such as lung cancer [29, 30], breast cancer and melanoma. These studies have confirmed that some well defined regions of amplification or deletion target known oncogenes and tumor suppressor genes. In addition, the development of analytical tools together with the analysis of large numbers of samples has led to the identification of new oncogenes such as NKX2.1 and MITF in lung cancer and melanoma, respectively.
In addition, beyond the identification of specific oncogenes and tumor suppressor genes, several investigators have used these approaches to determine whether the pattern of recurrent alterations predicts the response to treatment. Indeed both whole genome copy number analyses and transcriptional profiles have been used to predict the response of cancer cell lines to specific chemotherapeutic agents [33, 34]. Moreover, recent work has shown that these methods can be used to demonstrate correlation between the presence of recurrent alterations and patient prognosis for breast cancer [35-38].
Although current analytical methods now permit identification of statistically significant regions of recurrent amplification and deletion, many of the regions identified by these approaches harbor dozens of candidate genes. Thus, the application of these methods to larger sample sets will provide a better definition of driver genes within these regions of copy number change. In addition, recent work in which the information obtained using these approaches has been combined with other types of analyses indicates that integrated methods will facilitate the identification and validation of oncogenes and tumor suppressor genes. These approaches will be discussed in Section 4.
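The idea of a recurrent region of copy number change can be illustrated with a minimal sketch. It is not one of the statistical methods referred to above: it simply assumes that each tumor is represented as a list of log2 copy number ratios over the same genomic bins and flags bins that are gained in at least a chosen fraction of samples. The function name, the thresholds and the toy profiles are all hypothetical.

```python
def recurrent_gains(profiles, gain_threshold=0.3, min_fraction=0.5):
    """Return indices of bins gained in at least min_fraction of samples.

    profiles: list of per-sample lists of log2 copy number ratios,
    all covering the same genomic bins (an assumed input format).
    """
    n_samples = len(profiles)
    n_bins = len(profiles[0])
    recurrent = []
    for i in range(n_bins):
        # Count the samples in which this bin exceeds the gain threshold.
        gained = sum(1 for profile in profiles if profile[i] > gain_threshold)
        if gained / n_samples >= min_fraction:
            recurrent.append(i)
    return recurrent


# Toy data (hypothetical): four tumors, five genomic bins.
profiles = [
    [0.0, 0.1, 0.8, 0.9, 0.0],
    [0.1, 0.0, 0.7, 0.1, -0.5],
    [0.0, 0.6, 0.9, 0.0, -0.6],
    [-0.1, 0.0, 0.5, 0.0, 0.0],
]
recurrent_bins = recurrent_gains(profiles)
```

In this toy example only the third bin is gained in every tumor, whereas the sporadic gains in neighboring bins fall below the recurrence cutoff. A real analysis would also need to assess statistical significance and handle deletions, which this sketch omits.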
In many hematopoietic and pediatric malignancies, detailed analyses of chromosome structure have facilitated the identification of genetic alterations due to the translocation of sequences from one chromosome to another. Work from many laboratories has identified and characterized these chromosomal breakpoints. In some cases, such as in chronic myelogenous leukemia (CML), these translocations result in the creation of a fusion protein (BCR-ABL) that is constitutively active. In many other cancers, these translocations involve transcription factors, such as AML-ETO and E2A-PBX1 in subsets of leukemia. Although the pathways perturbed by these fusion proteins are known in some cases, identifying the functional consequences of these translocations continues to be an area of active research.
Until recently, due to the bewildering complexity of chromosomal aberrations found in epithelial cancers, most investigators have focused on the analysis of chromosomal translocations found in hematopoietic and pediatric cancers. However, recent work has identified similar types of translocations in lung and prostate cancers. Specifically, Soda et al. identified a cDNA from a lung cancer patient that encoded an EML4–ALK fusion protein. When expressed in the BaF3 experimental model, this fusion gene, which was found in approximately 6% of Japanese patients with lung cancer, conferred cytokine independence as well as sensitivity to ALK inhibitors.
Although successful, the identification of fusion proteins from expression libraries is technically difficult, in part due to the low levels of expression of such fusion genes. Recent work involving the application of analytical tools to study gene expression profiles derived from tumors has facilitated the identification of translocations in prostate and lung cancer. Specifically, Chinnaiyan and colleagues reasoned that evaluating variance in a data set using the median instead of the mean would maintain the peaks of outliers in expression profiling datasets. Using this analytical method, called cancer outlier profile analysis (COPA), they identified recurrent rearrangements involving the 5′ regulatory sequences of the androgen regulated gene TMPRSS2 and members of the ETS family of transcription factors (ERG or ETV1). Subsequent work showed that these rearrangements are found in a large fraction of human prostate cancers. Taken together, these observations suggest that other such translocations also occur in other epithelial cancers, and that the application of COPA and paired-end high throughput sequencing will make it possible to discover them.
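The reasoning behind a median-based transformation can be made concrete with a short sketch. It assumes, beyond what is stated here, that each gene is centered on its median, scaled by its median absolute deviation, and then scored by a high percentile of the transformed values. The function names, the 90th-percentile cutoff and the toy expression values are illustrative assumptions rather than the published implementation.

```python
from statistics import median


def median_mad_transform(values):
    """Center on the median and scale by the median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    return [(v - med) / mad for v in values]


def nearest_rank_percentile(values, q):
    """Nearest-rank percentile for an integer q between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # integer ceiling of q*n/100
    return ordered[rank - 1]


def outlier_score(values, q=90):
    """Score a gene by the q-th percentile of its transformed profile."""
    return nearest_rank_percentile(median_mad_transform(values), q)


# Toy data (hypothetical): one gene is very high in 2 of 10 samples,
# the other varies modestly across all samples.
expression = {
    "gene_outlier": [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 9.0, 8.5],
    "gene_flat": [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.2],
}
ranked = sorted(expression, key=lambda g: outlier_score(expression[g]), reverse=True)
```

Because the median and the median absolute deviation are barely affected by the two extreme samples, the transformed values for those samples stay very large and the gene ranks first. A mean and standard deviation computed on the same data would be inflated by the outliers and would shrink the very peaks of interest.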
As described in the previous sections, technological advances will enable the construction of comprehensive views of genetic alterations in cancer genomes, and will, in many cases, identify specific genes that drive cancer development or progression. However, since most tumors exhibit hundreds of genetic alterations, deciphering the specific alterations necessary to program malignant transformation remains a significant challenge. In recent years, several laboratories have developed complementary high throughput approaches to manipulate gene expression in experimental models. The basic tenet of these functional genomics studies is that by perturbing the activity of a gene, one can gain insight into its biological functions by assessment in phenotypic assays.
One approach to investigate gene function is to overexpress genes and determine the consequences in specific assays. Although cDNA libraries have been used for many years [45-47], early efforts relied on DNA transfection and thus required strategies that employed many rounds of selection to enrich for the gene of interest. Over the past several years, several groups have created expression libraries in retroviral vectors, which permit higher efficiency gene transduction .
In particular, such retroviral libraries have been used to identify genes that when overexpressed bypass the proliferative arrest induced by various stimuli. For example, this approach has been used to identify TBX2 as a gene amplified in breast cancer that permits cells to proliferate in Bmi-deficient fibroblasts , DRIL1 as a gene that bypasses RAS induced senescence , and BCL6 as a gene that permits proliferation in the presence of active p19ARF/p53 signaling . Using a different library, Huang et al. identified a network of genes that regulate p53 function . In each of these cases, the investigators employed a positive selection strategy. Interestingly, further mechanistic studies link all of these genes to the regulation of the retinoblastoma and p53 pathways, corroborating the several lines of evidence that implicate these pathways as central regulators of proliferation.
More recently, several groups have used similar approaches to identify genes involved in other aspects of cell transformation beyond proliferation. For example, Brugge and her colleagues used a cDNA expression library derived from MCF7 breast cancer cells to identify the prostate derived Ets factor PDEF as a gene that permits immortalized mammary epithelial cells to invade and migrate. This gene, which is overexpressed in breast and prostate cancers, cooperates with receptor tyrosine kinases such as HER2 and CSF-1 to induce cell transformation. Similarly, by introducing an expression library derived from a cancer cell line capable of metastasis into non-metastatic cells, Martin et al. identified BCL-XL as a gene that permits metastatic growth.
In each of these examples, the cDNA libraries used by these investigators were derived from cell lines by reverse transcription of mRNA. Although this approach has been used successfully, two limitations of this methodology are that each gene is not represented at equal frequency in the library and that longer cDNAs are underrepresented. With the development of large collections of open reading frames (ORFs) [3, 4], it is now possible to create expression libraries in which there is equal representation. For example, Boehm et al. used a relatively small cDNA library targeting 353 kinases in an assay to identify IKBKE as a breast cancer oncogene that substitutes for AKT to permit cell transformation.
Another type of gain of function approach utilizes microRNA (miRNA) expression libraries to screen using phenotypic assays. MiRNAs are endogenous small RNAs that function by downregulating expression of their target genes, either through induction of transcript degradation or translational inhibition. Nearly 500 annotated human miRNAs have been described, most of which do not have identified targets or functions. miRNAs implicated in cancer include let-7, a negative regulator of RAS found upregulated in lung cancers, the miR-17-92 cluster, which is upregulated in lymphomas and can promote lymphomagenesis, and miR-15 and miR-16, negative regulators of BCL2 that are downregulated in chronic lymphocytic leukemia. While it is clear that these miRNAs play a key role in the pathogenesis of these tumors, recent work suggests that miRNAs may play a more general role in cancer development, as mice lacking Dicer, the endoribonuclease that is required for miRNA processing, show an increased susceptibility to cancer. Indeed, using a retroviral expression library of miRNAs, Voorhoeve et al. identified miR-372 and miR-373 in a Ras-induced senescence bypass screen. In aggregate, these observations provide strong evidence that gain of function approaches, particularly as new ORF and miRNA libraries become available, will continue to permit the identification of genes involved in cancer.
Determining the consequences of gene loss-of-function is a classic means of elucidating gene function. The finding that RNAi operates in mammalian cells now permits researchers to generate loss of function phenotypes in mammalian cells that previously were possible only in model organisms. Similar to the cDNA or ORF libraries used for gain-of-function approaches, RNAi libraries can be introduced into cells either stably or transiently. In mammalian cells, RNAi-mediated gene suppression can be induced by the introduction of chemically synthesized siRNAs, or plasmids expressing RNA hairpins, known as shRNAs, which are processed to siRNAs by Dicer. In either case, the siRNA becomes incorporated into the RNA-induced silencing complex (RISC) and directs sequence-specific degradation or translational suppression of the target mRNA, resulting in decreased protein expression. Although siRNAs are easily synthesized and highly effective in inducing gene knockdown, such oligonucleotide reagents are relatively expensive and can only be used for transient loss of function experiments. Vector based systems provide stable expression of the RNAi construct, are a renewable resource through propagation in E. coli, and can be used to create retroviruses that expand the range and type of cells into which such constructs can be introduced.
Both siRNA and shRNA libraries have been used successfully in transfection-based arrayed screens looking at phenotypes that develop shortly after gene suppression, such as apoptosis, cell signaling events or cell cycle distribution [65-68]. For many other cancer-related phenotypic assays, such as anchorage independent colony formation, bypass of senescence or tumor xenografts, long-term gene suppression is essential, requiring stable integration and expression of the RNAi vector. Recent work from several laboratories has shown that these approaches are tractable in human cells. For example, PITX1 was found to be a negative regulator of RAS signaling, REST has been identified as a negative regulator of PI3K signaling, and CDK8 has been identified as a regulator of β-catenin signaling in colon cancer. Although such arrayed format screens require assays that are amenable to well-based miniaturization, this experimental design permits the use of high content imaging to identify subtle or complex phenotypes [72, 73].
In addition, it is possible to use these vector-based shRNA libraries in pooled formats. The advantages of this approach are that such pooled screens permit the study of a larger number of genes with decreased cost and provide the possibility of using loss of function genetics in assays that cannot be performed in vitro. Several large-scale screens using pooled libraries have been performed [74-77], demonstrating that both positive and negative selection screens are possible using these formats. To facilitate the deconvolution of genes targeted by shRNAs in these screens, each of these groups has developed strategies to quantify the abundance of each shRNA at the beginning and end of each screen by using the sequence of the shRNA or another unique sequence in the shRNA vector.
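The comparison of shRNA abundance at the beginning and end of a pooled screen can be sketched in a few lines. This is a minimal illustration and not the procedure used by any of the cited groups: it assumes that abundance has already been reduced to a count per shRNA at each time point, normalizes each count to the total for that time point, and reports a log2 fold change. The function name, the pseudocount and the shRNA labels are hypothetical.

```python
import math


def log2_fold_changes(start_counts, end_counts, pseudocount=1.0):
    """Log2 ratio of end to start abundance for each shRNA.

    Counts are normalized to the total at each time point so that
    differences in overall signal do not masquerade as selection.
    The pseudocount (an assumption) avoids taking the log of zero.
    """
    total_start = sum(start_counts.values())
    total_end = sum(end_counts.values())
    changes = {}
    for shrna, start in start_counts.items():
        end = end_counts.get(shrna, 0)
        start_fraction = (start + pseudocount) / total_start
        end_fraction = (end + pseudocount) / total_end
        changes[shrna] = math.log2(end_fraction / start_fraction)
    return changes


# Toy data (hypothetical): two shRNAs against gene A drop out of the pool.
start = {"shA_1": 1000, "shA_2": 1000, "shB_1": 1000, "shC_1": 1000}
end = {"shA_1": 100, "shA_2": 150, "shB_1": 1900, "shC_1": 1850}

lfc = log2_fold_changes(start, end)
depleted = sorted(s for s, change in lfc.items() if change < -1)
```

In a negative selection screen the depleted shRNAs are the ones of interest, while in a positive selection screen one would instead look for shRNAs with large positive fold changes. Requiring that several independent shRNAs against the same gene behave alike, as the two shRNAs against gene A do here, is one common way to guard against off-target effects.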
Taken together, these observations indicate that both gain of function and loss of function approaches provide a complementary path to discovering and validating genes involved in cancer. Although the tools used for these studies continue to evolve, further studies using these methods will help identify both oncogenes and tumor suppressor genes.
The development of technologies that permit the unbiased investigation of structural alterations in cancer genomes as well as those genes essential for cancer phenotypes now provides the means to generate comprehensive molecular views of cancer. Although each of the approaches described in the prior sections is powerful, the integration of these approaches will provide a much more efficient and effective means to identify and validate novel cancer genes.
Indeed, for several of the genes identified by the gain of function and loss of function approaches described above, a key element that supported a role for the gene in cancer was the finding that the expression of the specific genes was altered in particular types of cancer. For example, suppression of PITX1 not only replaced RAS in in vitro transformation assays, but PITX1 was also found to be downregulated in prostate, bladder and colon cancers. Similarly, REST emerged from an RNAi screen for suppressors of epithelial cell transformation and was found to lie within a chromosomal region commonly deleted in colon cancers, suggesting tumor suppressor function for this gene as well. Moreover, the two miRNAs identified in the screen to bypass senescence, miR-372 and miR-373, were also found to be overexpressed in testicular germ cell tumors, with concomitant loss of expression of the putative miRNA target LATS2, supporting a positive role for these miRNAs in cell transformation. Similarly, functional studies focused on NKX2.1, ALK and MITF provided strong evidence that genes mutated in specific types of cancer contribute directly to the transformed phenotype [14, 30, 32, 42].
In these examples, functional or expression studies were performed to validate initial findings. The increasing availability of large-scale datasets now provides the opportunity to combine these approaches. For example, the breast cancer oncogene IKBKE was identified by combining two functional screens with comprehensive analyses of copy number alterations in breast cancer cell lines and primary tumors. Specifically, IKBKE scored in a gain of function screen for the ability to promote transformation in vitro and a loss of function screen for genes essential for viability of cancer cells, two hallmark characteristics of typical oncogenes, and IKBKE resides within a region commonly amplified in primary breast tumors and breast cancer cell lines. Similarly, the CDK8 oncogene was found by combining two RNAi screens, one to identify genes that regulate β-catenin/TCF transcriptional activity and a second to uncover genes essential for colon cancer cell line proliferation, with a whole genome analysis of copy number alterations in colon cancer. More recently, Zender et al. identified regions recurrently deleted in hepatocellular carcinomas by ROMA and then performed a loss of function screen in an experimental model of murine hepatocellular cancer to identify several putative tumor suppressor genes among the genes harbored by these deleted regions, including XPO4, a protein that regulates nuclear export.
Taken together, these experiments provide strong proof of principle evidence that the integration of both structural and functional genomic approaches will accelerate the discovery and validation of new oncogenes and tumor suppressor genes. One clear advantage of these combined approaches is that the limitations of any specific methodology are offset by information derived from a complementary but distinct technology. Moreover, the evidence supporting a role in cancer for genes that emerge from several independent approaches is correspondingly stronger.
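At its simplest, this kind of integration amounts to asking which genes are supported by several independent lines of evidence. The sketch below is a hypothetical illustration only: it assumes that each screen or structural analysis has been reduced to a set of gene names, and the function name, evidence labels and placeholder genes are invented for the example.

```python
def integrate_evidence(evidence_sets, min_sources=2):
    """Map each gene to the sorted list of evidence sources that support it,
    keeping only genes supported by at least min_sources sources."""
    support = {}
    for source, genes in evidence_sets.items():
        for gene in genes:
            support.setdefault(gene, []).append(source)
    return {
        gene: sorted(sources)
        for gene, sources in support.items()
        if len(sources) >= min_sources
    }


# Placeholder gene names (hypothetical), not results from any study.
evidence_sets = {
    "gain_of_function_screen": {"GENE_A", "GENE_B", "GENE_C"},
    "loss_of_function_screen": {"GENE_A", "GENE_D"},
    "recurrently_amplified": {"GENE_A", "GENE_C", "GENE_E"},
}

candidates = integrate_evidence(evidence_sets)
top = [gene for gene, sources in candidates.items() if len(sources) == 3]
```

Here only GENE_A scores in both functional screens and lies in an amplified region, while GENE_C is supported by two sources and the remaining genes by one. Real analyses would weight each source by its reliability rather than simply counting them.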
Although cancer genomes exhibit a staggering complexity, recent advances in genome technologies now provide multiple ways to identify genes involved in cancer initiation or progression. Integrating these different approaches will enable rapid and efficient triangulation of bona fide cancer genes and help validate new targets that show promise for therapeutic targeting.
However, it is also clear that further developments in technology are necessary to truly apply these approaches in a comprehensive manner. It is already clear that new generation sequencing technologies will provide the means to perform more detailed analyses of genetic alterations and may make it possible to sequence entire cancer genomes for specific tumors. Moreover, the prospective implementation of mass spectrometric methods and, eventually, sequencing in patient samples will not only facilitate the discovery of new mutations but will also provide the means to select treatments tailored to the mutations harbored by particular tumors.
In parallel, several groups continue to expand and to improve the coverage and efficiency of reagent libraries that are necessary for gain and loss of function approaches. The use of these tools in both cell and animal studies in a range of assays will provide comprehensive views of genes and pathways involved in specific cancer phenotypes. In addition, initial efforts to provide tools that allow investigators to compare data derived from different sources (e.g. http://www.broad.mit.edu/igv/) now provide the means to perform such integrated analyses. However, the rapid expansion of the types and size of datasets derived from both structural and functional interrogation of cancer genomes will require a new generation of analytical tools to distinguish genes causally involved in cancer development from the large number of random mutations that occur during cancer development. Indeed, without the further development of statistically rigorous methods, it will not be possible to extract useful information from these high throughput approaches.
Moreover, further advances in our understanding of cancer will require new approaches beyond those described here. For example, new technologies that permit the investigation of epigenetic alterations in cancer genomes will provide complementary datasets to identify other genes involved in cancer, while new assays that interrogate cancer phenotypes such as invasion and metastasis will likely open new avenues in cancer biology. In addition, a concerted effort dedicated to collection and annotation of patient-derived primary and metastatic tumors is necessary in order to maximize the generality of these approaches. Although significant challenges remain, the pace and scale of progress in applying genomics to the study of cancer should continue to accelerate and provide the means to elucidate a comprehensive view of the key cooperative interactions that drive cancer.
We thank the members of the Hahn laboratory, DFCI Center for Cancer Genome Discovery and the Broad Institute Cancer Program for support and encouragement. This work was supported in part by grants from the NIH (R33 CA128625, U54 CA112962, P01 CA095616), the DoD (W81XWH-07-1-0408), the Prostate Cancer Foundation and the Starr Cancer Consortium.
This work was supported in part by the NIH (R33 CA128625), the Starr Cancer Consortium (I1-A11) and the Prostate Cancer Foundation.
William C. Hahn, M.D., Ph.D.
Associate Professor of Medicine
Harvard Medical School
Department of Medical Oncology, Dana-Farber Cancer Institute
Senior Associate Member, Broad Institute of MIT and Harvard
Dr. William C. Hahn is a medical oncologist and Associate Professor in the Department of Medical Oncology at the Dana-Farber Cancer Institute and a Senior Associate Member of the Broad Institute of MIT and Harvard. He directs the Center for Cancer Genome Discovery at the Dana-Farber Cancer Institute.
Dr. Hahn and his colleagues helped demonstrate that activation of the reverse transcriptase telomerase plays an essential role in malignant transformation. His current work focuses on understanding the cooperative genetic interactions that lead to malignant transformation and the creation of novel experimental model systems for the study of normal and malignant epithelial biology. In addition, he is a founding member of The RNAi Consortium, a Broad Institute-based effort to develop genome scale RNA interference reagents and the technologies for their use. His laboratory has pioneered the use of integrated functional genomic approaches to identify and validate cancer targets. Clinically, he is a member of the Lank Center for Genitourinary Oncology and is devoted to the development of new therapeutic strategies for the treatment of prostate and other cancers.
Dr. Hahn has been the recipient of many honors and awards including a Harvard National Scholarship, a Damon Runyon-Walter Winchell Cancer Research Fund Fellowship, Howard Hughes Medical Institute Pre- and Postdoctoral Fellowships, a Doris Duke Charitable Foundation Clinical Scientist Development Award, the 2000 Wilson S. Stone Award from M.D. Anderson Cancer Center for outstanding research in cancer, a Kimmel Scholar Award, and the Howard Temin Award from the National Cancer Institute. In 2005, Dr. Hahn was elected to the American Society for Clinical Investigation.
Dr. Hahn received his A.B. from Harvard University in Biochemical Sciences summa cum laude in 1987 and his M.D. and Ph.D. from Harvard Medical School in 1994. He then completed clinical training in Internal Medicine at the Massachusetts General Hospital and Medical Oncology at the Dana-Farber Cancer Institute. He conducted his postdoctoral studies with Dr. Robert Weinberg at the Whitehead Institute for Biomedical Research and joined the faculty of DFCI and Harvard Medical School in 2001.