The genome of epithelial tumors is characterized by numerous chromosomal aberrations, DNA base sequence changes, and epigenetic abnormalities. The epigenome of cancer cells has been most commonly studied at the level of DNA CpG methylation. In squamous cell carcinomas of the lung, CpG methylation patterns undergo substantial changes relative to normal lung epithelium. Using a genome-scale mapping technique for CpG methylation (MIRA-chip), we characterized CpG island methylation and methylation patterns of entire chromosome arms at a level of resolution of ~100 bp. In individual stage I lung carcinomas, several hundred and probably up to a thousand CpG islands become methylated. Interestingly, a large fraction (almost 80%) of the tumor-specifically methylated sequences are targets of the Polycomb complex in embryonic stem cells. Homeobox genes are particularly overrepresented and all four HOX gene loci on chromosomes 2, 7, 12, and 17 are hotspots for tumor-associated methylation because of the presence of multiple methylated CpG islands within these loci. DNA hypomethylation at CpGs in squamous cell tumors preferentially affects repetitive sequence classes including SINEs, LINEs, subtelomeric repeats, and segmental duplications. Since these epigenetic changes are found in early stage tumors, their contribution to tumor etiology should be considered as well as their potential usefulness as diagnostic or prognostic biomarkers of the disease.
The only known enzymatic modification of DNA bases in mammalian cells is the post-replicative addition of a methyl group to position 5 of cytosines. Methylated cytosines are formed almost exclusively at the 5′-CpG dinucleotide sequence. CpG methylation is catalyzed by DNA methyltransferase proteins (DNMTs). DNA methyltransferase 1 (DNMT1) is responsible for faithful copying of the preexisting cellular DNA methylation patterns following DNA replication. DNMT3A and DNMT3B are primarily thought of as de novo DNA methyltransferases responsible for methylation of previously unmethylated CpG sites, although all DNMTs are generally important for the maintenance of methylation patterns. The DNMT2 protein was initially characterized as a DNA methyltransferase but more recently has been shown to mediate tRNA methylation. Removal of methyl groups from DNA cytosines can be accomplished by a passive ‘dilution’ process involving DNA replication in the absence of DNMT proteins. Alternatively, the methyl group or the entire methylated base may be removed in an active enzymatic pathway. However, the exact nature of the putative mammalian DNA demethylase has remained obscure and controversial. The distribution of CpG sequences along mammalian chromosomes is not uniform. Sequences near transcription start sites, often including the first exon and intron of a gene, have a much higher frequency of CpG dinucleotides than the rest of the genome. These sequences are called CpG islands. However, only about half or fewer of all CpG islands are associated with protein-coding genes, leading to the assumption that CpG islands may have other regulatory roles. CpG islands are thought to remain completely unmethylated in the germ line, thus avoiding mutational erosion and CpG loss due to methylation-associated mutagenic mechanisms. Methylation of CpG islands near promoters leads to gene inactivation by several known mechanisms.
The binding of certain transcription factors is directly prevented by DNA CpG methylation. Methylated DNA sequences are bound by specialized proteins that have a high affinity for methylated DNA, for example MeCP2, MBD1, and MBD2. These methyl-CpG binding proteins can recruit histone deacetylase complexes upon binding to mCpG DNA. CpG-methylated DNA is often associated with inactive chromatin marks, including deacetylated histones H3 and H4, histone H3 lysine 9 (H3K9) methylation, and histone H3 lysine 27 (H3K27) methylation; these chromatin configurations reinforce the inactive gene expression state.
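CpG islands, introduced above as regions with a much higher frequency of CpG dinucleotides than the rest of the genome, are commonly identified from sequence composition alone. The following is a minimal Python sketch of such a test. The thresholds (length of at least 200 bp, GC content of at least 50%, observed/expected CpG ratio of at least 0.6) are the widely used Gardiner-Garden and Frommer criteria; they are an assumption here and are not stated in this article. The function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")            # observed CpG dinucleotides
    gc_content = (c + g) / n
    expected = c * g / n             # CpGs expected from base composition alone
    obs_exp = cpg / expected if expected else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, GC content, and obs/exp thresholds."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content >= min_gc and obs_exp >= min_obs_exp


if __name__ == "__main__":
    print(looks_like_cpg_island("CG" * 100))   # CpG-dense toy sequence
    print(looks_like_cpg_island("AT" * 100))   # no C or G at all
```

The toy sequences only illustrate the arithmetic; island annotations used on real arrays are scanned over genomic windows and may apply different criteria.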
Changes in DNA methylation patterns are one of the most frequent events that occur in human tumors, and altered CpG methylation patterns discriminate tumor tissue from its nonmalignant counterpart tissue or normal adjacent tissue. Two types of methylation changes are most commonly observed: hypermethylation of CpG islands and a more global hypomethylation of DNA in tumors. The literature now contains thousands of reports that have documented methylation of CpG islands associated with hundreds of different genes in almost every type of human solid tumor or hematological malignancy. It is unlikely that all of these methylation changes play a causative role in tumorigenesis, and it remains a challenge to pinpoint those crucial genes that are susceptible to methylation-associated gene silencing and are functionally important in preventing tumorigenesis. Tumor-specific methylation may provide a means for detection and early diagnosis of cancer. Identification of methylated CpG islands in easily accessible biological materials such as serum, sputum or urine has the potential to be useful for the early diagnosis of lung cancer and other malignancies [11-13]. If methylation of CpG islands were a critical parameter in tumor maintenance or progression, it would be desirable to reverse DNA hypermethylation. This can be accomplished, at least transiently and in vitro, by treatment of cells with inhibitors of DNA methylation. The prototype of such inhibitors is 5-azacytidine. The potential clinical use of 5-azacytidine and other more recently developed DNA methylation inhibitors as anti-cancer drugs is now being explored by many investigators.
Repetitive DNA elements, such as short and long interspersed nuclear elements (SINEs and LINEs), and other repeat sequences are often hypomethylated in tumors [15-23]. While it seems plausible that methylation-induced silencing of a critical tumor suppressor gene can be an important event in tumorigenesis, the biological significance of tumor-associated DNA hypomethylation is less clear [17,22]. Mouse models of DNA hypomethylation have suggested that loss of methylation can lead to tumor formation. Mice carrying a hypomorphic allele of DNMT1 are susceptible to the development of aggressive T cell lymphomas. One hypothesis that links hypomethylation mechanistically to tumorigenesis is the induction of genomic instability by hypomethylation, either by reactivation of transposable elements or by chromosome rearrangement events directly associated with hypomethylation [25-28].
The mechanisms of CpG island hypermethylation in cancer are mostly unknown. Specific DNA sequences within CpG islands may be associated with the methylation process [29,30]. Whether these sequences are associated with DNA binding proteins in vivo that somehow attract methylation is not known. Others have proposed that gene inactivity imposed by changes in chromatin structure or histone modification predisposes to DNA methylation [31-34]. Specific chromatin configurations may either protect from methylation or may promote DNA methylation at CpG islands. Trimethylation of histone H3 lysine 4 (H3K4me3) is associated with active or potentially active genes and unmethylated CpG islands. This modification interferes with binding of the de novo DNA methyltransferase DNMT3L/DNMT3A complex [36,37]. Trimethylation of histone H3 lysine 27 (H3K27me3), the histone modification mark established by Polycomb repressive complexes, is often associated with repression of developmentally regulated genes. Presence of this mark in stem cells has been associated with DNA methylation of the same sequences in human cancers [39-42] (see also below). Although this connection is mostly based on indirect comparisons between different cell types, the striking coincidence of H3K27me3 and DNA CpG methylation suggests a mechanistic connection that could explain methylation patterns in tumor cells. Our lack of understanding of these mechanisms is at least in part related to the lack of a comprehensive picture of genome-wide chromatin structure and DNA hypermethylation events in tumors, somatic stem cells and tumor progenitor cells.
Lung cancer is the leading cause of cancer death in the United States and most other countries. Its causation by cigarette smoking is unquestionable. Lung cancer accounts for about 30% of all deaths from cancer and at least 1.5 million annual deaths from lung cancer are projected worldwide by 2010. The high (>80%) mortality rate associated with lung cancer is at least in part related to suboptimal therapeutic strategies and the lack of an efficient screening approach for early detection. In comparison, breast, colon or prostate cancers, for which early detection approaches exist, have much higher survival rates. Lung cancers are divided into small cell (SCLC) and non-small cell lung carcinomas (NSCLC) depending on histology and cellular origin. Non-small cell lung cancers are further classified on the basis of histological parameters into three subtypes: squamous cell carcinoma (SCC), adenocarcinoma (ADC) and large cell carcinoma (LCC). Squamous cell carcinomas often affect the central airways while adenocarcinomas arise in the peripheral areas of the lung.
During tumorigenesis, both alleles of a tumor suppressor gene need to be inactivated, for example by chromosomal deletions or loss-of-function mutations in the coding region of a gene. As an alternative mechanism, hypermethylation of CpG islands spanning the promoter regions of tumor suppressor genes (for example, RB, p16, VHL, APC, MLH1, RASSF1A and BRCA1) is a common and important mechanism in carcinogenesis [45-50]. Since hypermethylation generally leads to permanent inactivation of gene expression, this epigenetic alteration is considered to be a key pathway for long-term silencing of tumor suppressor genes.
The importance of CpG island methylation in functional inactivation of lung cancer suppressor genes is becoming increasingly recognized. From initial analysis of a subset of genes, it has been estimated that between 0.5% and 3% of all genes carrying CpG-rich promoter sequences may be silenced by DNA methylation in advanced stage lung cancer [46,51]. We recently reported that several hundred (~200-800) CpG islands are methylated in individual stage I squamous cell carcinomas of the lung. Some of the hypermethylated genes may be bona fide tumor suppressor genes, but in other cases the methylation event may be a consequence of preexisting tissue-specific gene silencing or may somehow be associated with tumor formation rather than being a cause of tumorigenesis. Several specific CpG-island-associated genes are methylated in lung cancer including, for example, p16, RASSF1A, RARbeta, MGMT, GSTP1, CDH13, APC, DAPK, TIMP3, and many others [52-58]. The methylation frequency (i.e., the percentage of tumors analyzed that carry methylated alleles) ranges from only a few percent to more than 80% for these genes. These methylation frequency numbers often differ substantially depending on the study population, tumor histology, and/or methodology used to assess CpG island methylation. In our laboratory, the tumor suppressor gene RASSF1A has been identified and characterized. RASSF1A, which is localized at 3p21.3 in an area of common deletion or heterozygous loss in lung cancer, is inactivated by promoter methylation in about 30-40% of non-small cell lung cancers (37% of squamous cell carcinomas) and in close to 80% of small cell lung cancers [47,59-61].
The field of DNA methylation analysis is moving fast towards genome-wide characterization rather than studying methylation of individual genes in tumors. Diverse technical approaches for large-scale methylation analysis have been developed. The first group of techniques is based on methylation-sensitive restriction endonuclease cleavage of the target sequences (e.g., HpaII, NotI) [63,46,64]. These techniques are useful but are limited by the occurrence of the respective recognition sequences within a CpG island. One other variation of current methylation microarray approaches includes the use of the methylation-dependent restriction enzyme McrBC, which cleaves methylated DNA but not unmethylated DNA [65,66]. Another group of techniques employs bisulfite-induced cytosine to uracil conversion [67,68]. After bisulfite treatment of genomic DNA, the unmethylated cytosines are converted to uracils by deamination, while methylated cytosine residues are largely resistant to this reaction and remain intact. Bisulfite-based sequencing approaches are not yet cost-effectively applicable to genome-wide DNA methylation screening in mammalian systems, but specific chromosomal areas have been sequenced. After bisulfite conversion of cytosines, genome complexity is effectively reduced to three DNA bases (G, A, U or T) in addition to the rare base 5-methylcytosine, which makes it difficult to design specific probes for microarrays or to map unique sequence tags to the genome.
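The bisulfite conversion principle described above can be illustrated with a short simulation. This is a minimal Python sketch under simplifying assumptions: conversion is treated as complete and error-free, only one strand is considered, and the function names and the toy sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Convert unmethylated C to U; methylated C (by 0-based index) stays C."""
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("U")    # deamination of unmethylated cytosine
        else:
            converted.append(base)   # 5-methylcytosine and other bases are kept
    return "".join(converted)


def after_pcr(converted):
    """Uracil is read as thymine once the converted DNA is amplified."""
    return converted.replace("U", "T")


def call_methylation(reference, read):
    """Positions where a reference C is still read as C are called methylated."""
    return [i for i, (r, b) in enumerate(zip(reference.upper(), read.upper()))
            if r == "C" and b == "C"]


if __name__ == "__main__":
    reference = "ACGTCCGA"
    converted = bisulfite_convert(reference, methylated_positions={1})
    read = after_pcr(converted)
    print(converted, read, call_methylation(reference, read))
```

In the converted read, almost all cytosines have become thymines, which illustrates the loss of sequence complexity that complicates probe design and read mapping.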
An indirect way to find methylated genes is to use gene expression arrays to identify genes reactivated by treatment with the DNA methylation inhibitor 5-aza-deoxycytidine [70-73]. However, this approach can only be used with cell lines and some CpG-methylated genes may be refractory to 5-aza-dC-induced reactivation.
An antibody against 5-methylcytosine has been used in immunoprecipitation experiments combined with microarrays [74,30], and this method, termed methylated DNA immunoprecipitation or MeDIP, is now in common use. However, this antibody is a resource that is difficult to standardize. The antibody also requires single-stranded DNA for recognition, which is often not easily achieved with GC-rich DNA.
We have developed a sensitive methylation detection method, the methylated-CpG island recovery assay (MIRA), which is based on the high affinity of the MBD2/MBD3L1 complex for double-stranded CpG-methylated DNA (Figure 1). This method can be used to analyze the DNA methylation status of large numbers of genes simultaneously using microarray-based approaches or high-throughput sequencing. Methyl-CpG binding domain (MBD) proteins, such as MBD2, have the capacity to bind specifically to methylated DNA sequences but have little affinity for unmethylated DNA. MBD2b, the shorter isoform translated from the MBD2 mRNA, forms a heterodimer with a related protein, MBD3L1, which was identified and characterized in our laboratory [75,76]. MBD3L1 strongly increases the affinity of recombinant MBD2 for methylated DNA [77,78]. In the MIRA procedure, sonicated or restriction-cut genomic DNA isolated from normal or malignant tissue is incubated with GST-tagged MBD2b and His-MBD3L1. The bound methylated DNA is eluted from a glutathione affinity matrix and used for hybridization to microarrays (MIRA-chip). The MIRA procedure has a high specificity for enriching methylated DNA, and unmethylated DNA molecules stay in the supernatant (Rauch et al., 2005). MIRA requires two mCpG sites for efficient pulldown, and the signal depends on the number of methylated CpGs. In collaboration with Dr. Huidong Shi (Univ. of Missouri) and working with a 454 sequencer, we obtained ~500,000 sequencing reads of MIRA pulldown fragments from a lung cancer cell line. The MIRA-enriched fragments underwent bisulfite conversion before sequencing. As expected, >98% of the sequenced molecules were derived from CpG-rich methylated DNA including CpG islands and repetitive DNA (unpublished data). Thus, the efficiency of the MIRA approach depends on mCpG density and the approach seems to be ideally suited for examining CpG methylation patterns on a genome-wide scale.
The MIRA method is several times more sensitive than MeDIP, allowing analysis with less DNA (T.A. Rauch and G.P. Pfeifer, unpublished results). MIRA is compatible with several types of microarrays including those manufactured by Agilent, Affymetrix and NimbleGen. The MIRA-enriched DNA can also be analyzed by high-throughput massively parallel sequencing, as determined with a Roche 454 sequencer (H. Shi, T.A. Rauch, and G.P. Pfeifer, unpublished data) and a Solexa/Illumina G1 sequencer (H. Gao, T.A. Rauch, and G.P. Pfeifer, unpublished data). This variation of the MIRA technique, termed MIRA-seq, has the potential to provide complete genome coverage and may supplement or replace the array-based MIRA-chip technique.
Initially, we used spotted CpG island arrays to analyze methylation patterns of the lung cancer cell line A549. Using the data obtained from such arrays, a list of genes was compiled that show hypermethylation in A549 lung cancer cells relative to normal human bronchial epithelial (NHBE) cells. Cancer cell line-specific methylation and lack of methylation in NHBE cells were confirmed by bisulfite-based analysis for several of the targets identified by the microarrays. Among 25 targets chosen at random from the top 50 methylation targets, we did not identify any false positives. This indicates that the false positive discovery rate of MIRA microarrays is low (<4%). For analysis of primary lung cancers, we have been using CpG island microarrays from Agilent Technologies. These arrays contain 237,000 probes covering 27,800 CpG islands. We used Agilent CpG island arrays to characterize the methylated CpG islands in lung tumors in a comprehensive manner [40,23]. In primary stage I lung squamous cell carcinomas, we found that the number of methylated CpG islands ranged from ~200 to more than 800 per individual tumor. This is likely an underestimate, since some sequences may not amplify efficiently due to a long distance between MseI sites and other CpG-rich sequences may not be represented on the arrays. A large number of the genes methylated in lung tumors, in both adenocarcinomas and squamous cell carcinomas, were homeobox genes. We found that all four HOX gene clusters on chromosomes 2, 7, 12, and 17 are preferential targets for DNA methylation in cancer cell lines and also in early stage lung tumors. CpG islands associated with many other homeobox genes, such as SIX, LHX, PAX, DLX and ENGRAILED, were highly methylated as well. Together, more than half (104 of 192) of all CpG island-associated homeobox genes in the lung cancer cell line A549 were methylated.
The finding of widespread methylation of homeobox genes in lung cancers lends support to the hypothesis that a substantial fraction of genes methylated in cancer are targets of the Polycomb complex [78,39-42,34]. We analyzed the relationship between Polycomb marking in human ES cells and DNA CpG methylation in lung tumors for five lung squamous cell carcinomas. We considered a CpG island a methylation target when it was methylated in 2 of the 5 tumors. This resulted in a list of 364 methylated CpG islands. For 211 of these, data on H3K27me3 occupancy in human ES cells were available. Of these 211 methylated CpG islands, 167 (= 79%) were Polycomb targets as determined from published data on Polycomb marks in human embryonic stem cells (Figure 2). This is an astonishingly high percentage, and the correlation supports the Polycomb-DNA methylation connection. Of the remaining 197 methylated CpG islands (those not identified as Polycomb targets), 153 were located outside of promoters; for these, Polycomb data are not available since a promoter array was used in the previous study. Thus, it is quite possible that non-promoter CpG islands are also Polycomb targets and are therefore susceptible to methylation.
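The counts reported above can be checked with a few lines of arithmetic. The following Python sketch uses only the numbers given in the text. The helper `is_methylation_target` is hypothetical, and reading "2 of the 5 tumors" as "at least 2" is an assumption made for illustration.

```python
def is_methylation_target(tumor_calls, min_tumors=2):
    """tumor_calls: one boolean per tumor (True = CpG island methylated).

    Assumption: an island counts as a target when at least `min_tumors`
    tumors are methylated.
    """
    return sum(tumor_calls) >= min_tumors


# Counts taken from the text
methylated_islands = 364      # CpG islands called methylation targets
with_polycomb_data = 211      # islands with ES cell Polycomb data available
polycomb_targets = 167        # islands marked by Polycomb in ES cells

fraction_polycomb = polycomb_targets / with_polycomb_data
without_data = methylated_islands - with_polycomb_data        # islands lacking data
not_shown_polycomb = methylated_islands - polycomb_targets    # the "remaining" islands

if __name__ == "__main__":
    print(f"Polycomb targets among islands with data: {fraction_polycomb:.1%}")
    print(f"Islands without Polycomb data: {without_data}")
    print(f"Islands not identified as Polycomb targets: {not_shown_polycomb}")
    print(is_methylation_target([True, False, True, False, False]))
```

The arithmetic shows that the 153 non-promoter islands are exactly the islands for which no Polycomb data were available, so the 79% figure applies only to the 211 islands covered by the promoter array.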
In a recent study, we have used the MIRA method in combination with CpG island and genomic tiling arrays to characterize at high resolution the DNA methylation changes that occur in the genome of early stage lung squamous cell carcinomas. A number of new tumor-specific CpG island DNA methylation markers were identified as well as a specific defect in methylation of repetitive DNA elements. Interestingly, normal tissues from different individuals showed overall very similar DNA methylation patterns at the level of resolution of these tiling arrays (~100 bp). Each tumor contained several hundred CpG islands that were hypermethylated relative to the corresponding normal lung tissue. We identified and confirmed 12 CpG islands that were methylated in 70 to 100% of the 20 SCC tumors tested and these hold promise as effective biomarkers for early detection of lung cancer. These markers included the CpG islands of the OTX1, BARHL2, MEIS1, OC2, PAX6, IRX2, TFAP2A, and EVX2 genes. These markers were highly specific for tumor-associated methylation, i.e., these CpG islands showed very little CpG methylation in blood DNA and in lung tissue of lung cancer-free individuals.
To analyze tumor-associated DNA methylation changes outside of CpG islands, we used chromosome tiling arrays from NimbleGen (Figure 3). On these tiling arrays, it was apparent that sequences near the centromeres and telomeres of individual chromosomes are more highly methylated than other parts of the chromosome. When zooming in to much higher resolution, we could identify several hypermethylated CpG islands on each chromosome arm. These methylated islands were entirely consistent with those obtained from the Agilent CpG island arrays. When searching for cancer-associated DNA hypomethylation on chromosomes 7 and 8 and parts of chromosome 6, we found that extensive DNA hypomethylation in lung tumors occurred specifically at repetitive sequences, including SINE, LINE, and LTR elements, segmental duplications, and subtelomeric regions, but single copy sequences rarely became demethylated. The results are consistent with a specific defect in methylation of repetitive DNA sequences in human cancer.
The mechanism of tumor-associated DNA hypomethylation of repetitive DNA remains unknown. One possibility is that repetitive DNA is actively demethylated in cancer cells, perhaps through reactivation of a DNA demethylase gene that normally is expressed only at the zygote stage or in embryonic germ cells. However, the nature of the mammalian DNA demethylase has remained obscure. Alternatively, CpG methylation of repetitive DNA may be lost during rapid DNA replication in cancer cells. Although the maintenance DNA methyltransferase DNMT1 is primarily responsible for copying CpG methylation patterns, there is no evidence for a diminished DNMT1 function in cancer tissue. DNMT3A and DNMT3B are de novo DNA methyltransferases, and a related protein, DNMT3L, which is devoid of methyltransferase activity by itself, is capable of stimulating the activity of DNMT3A and DNMT3B. DNMT3L regulates DNA methylation at imprinted sequences and at repeat sequences in mouse germ cells, but the role of these proteins in maintaining methylation of repeat sequences in somatic cells is not clear. Instead of invoking a defect in DNA methyltransferases, another possibility is that the accessibility of repetitive DNA to DNMTs is impeded during tumorigenesis. Further, small-RNA-based mechanisms that guide methylation to repetitive DNA through heterochromatin formation are also a formal possibility that has not been investigated, and a defect in this pathway may be associated with cancer.
From recent large-scale sequencing of human tumor DNA, it has become clear that recurrent changes in the DNA sequence, such as point mutations, insertions, or deletions within specific genes, are quite uncommon in human tumors [82,83]. Most mutations seem to be rather stochastic and are rarely selected in specific genes (exceptions include TP53, the RAS genes, and a few others). Chromosomal aberrations involving loss of genetic material, e.g., loss of heterozygosity, chromosomal deletions, copy number changes, or translocations, are common in tumors, but recurrent clonal chromosomal aberrations are generally rare with the exception of leukemias. In contrast, epigenetic changes are frequent events in all human tumors including lung cancers. These changes affect several hundred genes in individual tumors, and some of them occur frequently and recurrently in a patient population. Future studies will be aimed at determining the causative significance of these epigenetic alterations in tumorigenesis.
The work of the authors has been supported by grants from the National Cancer Institute to G.P.P. (CA084469 and CA128495).
Conflict of interest statement: The authors declare that there are no conflicts of interest.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.