Related Articles

1.  Genome wide analysis of DNA copy number neutral loss of heterozygosity (CNNLOH) and its relation to gene expression in esophageal squamous cell carcinoma 
BMC Genomics  2010;11:576.
Genomic instability plays an important role in human cancers. We previously characterized genomic instability in esophageal squamous cell carcinomas (ESCC) in terms of loss of heterozygosity (LOH) and copy number (CN) changes in tumors using the Affymetrix GeneChip Human Mapping 500K array in 30 cases from a high-risk region of China. In the current study we focused on copy number neutral (CN = 2) LOH (CNNLOH) and its relation to gene expression in ESCC.
Overall we found that 70% of all LOH observed was CNNLOH. Ninety percent of ESCCs showed CNNLOH (median frequency in cases = 60%) and this was the most common type of LOH in two-thirds of cases. CNNLOH occurred on all 39 autosomal chromosome arms, with highest frequencies on 19p (100%), 5p (96%), 2p (95%), and 20q (95%). In contrast, LOH with CN loss represented 19% of all LOH, occurred in just half of ESCCs (median frequency in cases = 0%), and was most frequent on 3p (56%), 5q (47%), and 21q (41%). LOH with CN gain was 11% of all LOH, occurred in 93% of ESCCs (median frequency in cases = 13%), and was most common on 20p (82%), 8q (74%), and 3q (42%). To examine the effect of genomic instability on gene expression, we evaluated RNA profiles from 17 pairs of matched normal and tumor samples (a subset of the 30 ESCCs) using Affymetrix U133A 2.0 arrays. In CN neutral regions, expression of 168 genes (containing 1976 SNPs) differed significantly in tumors with LOH versus tumors without LOH, including 101 genes that were up-regulated and 67 that were down-regulated.
Our results indicate that CNNLOH has a profound impact on gene expression in ESCC, which in turn may affect tumor development.
PMCID: PMC3091724  PMID: 20955586
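The three LOH categories used in this abstract (CN neutral, CN loss, CN gain) can be illustrated with a short sketch. It assumes each LOH segment already carries an integer total copy number. The function names and the toy input are hypothetical and are not taken from the study.

```python
from collections import Counter


def classify_loh(copy_number: int) -> str:
    """Label an LOH segment by its total copy number (CN)."""
    if copy_number == 2:
        return "CNNLOH"  # copy number neutral LOH
    if copy_number < 2:
        return "LOH with CN loss"
    return "LOH with CN gain"


def loh_type_fractions(segment_copy_numbers):
    """Return the share of each LOH category among the given segments."""
    counts = Counter(classify_loh(cn) for cn in segment_copy_numbers)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Hypothetical toy input: total copy numbers of ten LOH segments.
example = [2, 2, 2, 2, 2, 2, 2, 1, 1, 3]
fractions = loh_type_fractions(example)
```

In the toy input, seven of the ten segments have CN = 2, so CNNLOH is the dominant category. The numbers are illustrative only.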
2.  Intra-Tumor Genetic Heterogeneity in Wilms Tumor: Clonal Evolution and Clinical Implications 
EBioMedicine  2016;9:120-129.
The evolution of pediatric solid tumors is poorly understood. There is conflicting evidence of intra-tumor genetic homogeneity vs. heterogeneity (ITGH) in a small number of studies in pediatric solid tumors. A number of copy number aberrations (CNA) are proposed as prognostic biomarkers to stratify patients, for example 1q + in Wilms tumor (WT); current clinical trials use only one sample per tumor to profile this genetic biomarker. We multisampled 20 WT cases and assessed genome-wide allele-specific CNA and loss of heterozygosity, and inferred tumor evolution, using Illumina CytoSNP12v2.1 arrays, a custom analysis pipeline, and the MEDICC algorithm. We found remarkable diversity of ITGH and evolutionary trajectories in WT. 1q + is heterogeneous in the majority of tumors with this change, with variable evolutionary timing. We estimate that at least three samples per tumor are needed to detect > 95% of cases with 1q +. In contrast, somatic 11p15 LOH is uniformly an early event in WT development. We find evidence of two separate tumor origins in unilateral disease with divergent histology, and in bilateral WT. We also show subclonal changes related to differential response to chemotherapy. Rational trial design to include biomarkers in risk stratification requires tumor multisampling and reliable delineation of ITGH and tumor evolution.
• There is remarkable diversity of intratumor genetic heterogeneity and evolutionary trajectories.
• Gain of 1q is frequently heterogeneous and shows variable evolutionary timing.
• 11p15 CNNLOH is consistently an early event in Wilms tumorigenesis.
• Rational biomarker-based treatment stratification in Wilms tumors requires multisampling.
We have shown that Wilms tumor (WT), the commonest pediatric kidney cancer, shows a range of evolutionary trajectories. We also found that gain of the long arm of chromosome 1 (1q +) may occur either early or late in the evolution of WT. 1q + is associated with a poorer outcome and is proposed for use in choosing patients for more intensive treatment, but we show that in order to detect it reliably, future trials require testing for this biomarker in multiple samples per tumor.
PMCID: PMC4972528  PMID: 27333041
Intra-tumor genetic heterogeneity; Pediatric solid tumors; Wilms tumor; Tumor evolution; Tumor multisampling; Molecular biomarkers; Copy number aberrations
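The multisampling argument can be made concrete with a simplified calculation. This is not the authors' estimation procedure. It assumes that each sample from a tumor carrying a subclonal alteration detects it independently with the same probability, and the value 0.65 below is a hypothetical input chosen only for illustration.

```python
import math


def detection_probability(p_single: float, n_samples: int) -> float:
    """Chance that at least one of n independent samples shows the alteration."""
    return 1.0 - (1.0 - p_single) ** n_samples


def samples_needed(p_single: float, target: float = 0.95) -> int:
    """Smallest number of samples whose detection probability reaches target."""
    if not 0.0 < p_single < 1.0:
        raise ValueError("p_single must lie strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))


# Hypothetical per-sample detection probability (an assumption, not a study value).
n_required = samples_needed(0.65)
```

Under this independence assumption, a per-sample detection probability of 0.65 needs three samples to pass 95%, while two samples reach only about 88%.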
3.  Identification of Allelic Imbalance with a Statistical Model for Subtle Genomic Mosaicism 
PLoS Computational Biology  2014;10(8):e1003765.
Genetic heterogeneity in a mixed sample of tumor and normal DNA can confound characterization of the tumor genome. Numerous computational methods have been proposed to detect aberrations in DNA samples from tumor and normal tissue mixtures. Most of these require tumor purities to be at least 10–15%. Here, we present a statistical model to capture information, contained in the individual's germline haplotypes, about expected patterns in the B allele frequencies from SNP microarrays while fully modeling their magnitude, the first such model for SNP microarray data. Our model consists of a pair of hidden Markov models—one for the germline and one for the tumor genome—which, conditional on the observed array data and patterns of population haplotype variation, have a dependence structure induced by the relative imbalance of an individual's inherited haplotypes. Together, these hidden Markov models offer a powerful approach for dealing with mixtures of DNA where the main component represents the germline, thus suggesting natural applications for the characterization of primary clones when stromal contamination is extremely high, and for identifying lesions in rare subclones of a tumor when tumor purity is sufficient to characterize the primary lesions. Our joint model for germline haplotypes and acquired DNA aberration is flexible, allowing a large number of chromosomal alterations, including balanced and imbalanced losses and gains, copy-neutral loss-of-heterozygosity (LOH) and tetraploidy. We found our model (which we term J-LOH) to be superior for localizing rare aberrations in a simulated 3% mixture sample. More generally, our model provides a framework for full integration of the germline and tumor genomes to deal more effectively with missing or uncertain features, and thus extract maximal information from difficult scenarios where existing methods fail.
Author Summary
Allelic imbalance, or a deviation from the expected 1-to-1 ratio of alleles where both were present in the germline, can result when there has been an acquired deletion or duplication of part of a chromosome and is a hallmark of cancer genomes. Tumor genomic profiling studies often involve analysis of samples that contain aberrant tumor cells mixed with normal cells without these acquired mutations. Methods for detecting chromosomal aberrations that result in allelic imbalance within a heterogeneous sample have previously been proposed that use the dispersion of within-sample allele frequencies measured at germline heterozygous positions. Here we demonstrate that combining this information with a measure for the correlation in these dispersions, due to the imbalance of one of the chromosomes, provides the most powerful approach. Our method allows for sensitive identification of short allelic imbalance events (e.g. 10 Mb) contained in as few as 3% of the cells in a heterogeneous mixture. Applications include profiling tumor genomes following surgical resection where there exists high contamination of normal tissue and identifying aberrations in subclones. Our work provides a framework for further development of methods that use observed data and population genetic theory for inference of allelic imbalance.
PMCID: PMC4148184  PMID: 25166618
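To see why aberrations in a 3% mixture are hard to detect, the expected B allele frequency (BAF) at a germline-heterozygous SNP can be computed from the tumor fraction and the tumor's allele-specific copy numbers. The sketch below is a generic mixture formula with a diploid normal component. It is not the J-LOH model, and the function name is hypothetical.

```python
def expected_baf(tumor_fraction: float, tumor_a: int, tumor_b: int) -> float:
    """Expected BAF at a germline AB SNP in a tumor/normal mixture.

    tumor_a and tumor_b are the tumor copy numbers of alleles A and B.
    Normal cells are assumed diploid and heterozygous (one A, one B).
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    normal_fraction = 1.0 - tumor_fraction
    b_copies = tumor_fraction * tumor_b + normal_fraction * 1
    total_copies = tumor_fraction * (tumor_a + tumor_b) + normal_fraction * 2
    if total_copies == 0:
        raise ValueError("no DNA at this locus (pure tumor, homozygous deletion)")
    return b_copies / total_copies


# Hemizygous loss of the B allele in 3% of cells.
baf_3_percent = expected_baf(0.03, tumor_a=1, tumor_b=0)
```

With 3% aberrant cells the expected BAF is about 0.492, less than 0.01 away from the balanced value of 0.5. Shifts of this size are smaller than typical per-SNP noise, which is one reason for pooling information across neighbouring SNPs and haplotypes.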
4.  TAFFYS: An Integrated Tool for Comprehensive Analysis of Genomic Aberrations in Tumor Samples 
PLoS ONE  2015;10(6):e0129835.
Tumor single nucleotide polymorphism (SNP) arrays are a common platform for investigating genomic aberrations in cancer and the functionally important altered genes. Raw SNP array signals are usually corrupted by noise and need to be deconvoluted into an absolute copy number profile by analytical methods. Unfortunately, despite the popularity of Affymetrix SNP arrays for tumor profiling, methods designed specifically for this platform are still limited. The complicated noise characteristics of the signals are one of the main difficulties in dissecting tumor Affymetrix SNP array data, as they inevitably blur the distinction between aberrations and hinder copy number aberration (CNA) identification.
We propose a tool named TAFFYS for comprehensive analysis of tumor Affymetrix SNP array data. TAFFYS introduces a wavelet-based de-noising approach and a copy number-specific signal variance model for suppressing and modelling the noise in signals. A hidden Markov model is then employed for copy number inference. Finally, using the absolute copy number profile, the statistical significance of each aberration region is calculated in terms of different aberration types, including amplification, deletion and loss of heterozygosity (LOH). The results show that the copy number-specific variance model and wavelet de-noising algorithm fit the Affymetrix SNP array signals well, leading to more accurate estimation for diluted tumor samples (even with only 30% cancer cells) than other existing methods. Further examinations also demonstrate good compatibility and extensibility across different Affymetrix SNP array platforms. Application to 35 breast tumor samples shows that TAFFYS can automatically dissect the tumor samples and reveal statistically significant aberration regions where cancer-related genes are located.
TAFFYS provides an efficient and convenient tool for identifying copy number alterations and allelic imbalance and for assessing recurrent aberrations in tumor Affymetrix SNP array data.
PMCID: PMC4482394  PMID: 26111017
5.  MixHMM: Inferring Copy Number Variation and Allelic Imbalance Using SNP Arrays and Tumor Samples Mixed with Stromal Cells 
PLoS ONE  2010;5(6):e10909.
Genotyping platforms such as single nucleotide polymorphism (SNP) arrays are powerful tools to study genomic aberrations in cancer samples. Allele specific information from SNP arrays provides valuable information for interpreting copy number variation (CNV) and allelic imbalance including loss-of-heterozygosity (LOH) beyond that obtained from the total DNA signal available from array comparative genomic hybridization (aCGH) platforms. Several algorithms based on hidden Markov models (HMMs) have been designed to detect copy number changes and copy-neutral LOH making use of the allele information on SNP arrays. However heterogeneity in clinical samples, due to stromal contamination and somatic alterations, complicates analysis and interpretation of these data.
We have developed MixHMM, a novel hidden Markov model using hidden states based on chromosomal structural aberrations. MixHMM allows CNV detection for copy numbers up to 7 and allows more complete and accurate description of other forms of allelic imbalance, such as increased copy number LOH or imbalanced amplifications. MixHMM also incorporates a novel sample mixing model that allows detection of tumor CNV events in heterogeneous tumor samples, where cancer cells are mixed with a proportion of stromal cells.
We validate MixHMM and demonstrate its advantages with simulated samples, clinical tumor samples and a dilution series of mixed samples. We have shown that the CNVs of cancer cells in a tumor sample contaminated with up to 80% of stromal cells can be detected accurately using Illumina BeadChip and MixHMM.
The MixHMM is available as a Python package provided with some other useful tools at
PMCID: PMC2879364  PMID: 20532221
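The effect of stromal contamination on the total-intensity signal can be sketched with a simple linear mixing calculation. This is an idealised illustration with diploid stroma and a plain log2 ratio. It is not the MixHMM mixing model or its Python package, and the function names are hypothetical.

```python
import math


def mixed_copy_number(tumor_fraction: float, tumor_cn: int, normal_cn: int = 2) -> float:
    """Average copy number of a sample containing tumor and stromal cells."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    return tumor_fraction * tumor_cn + (1.0 - tumor_fraction) * normal_cn


def log2_ratio(tumor_fraction: float, tumor_cn: int, normal_cn: int = 2) -> float:
    """Idealised log2 ratio of the mixed sample against a diploid reference."""
    average = mixed_copy_number(tumor_fraction, tumor_cn, normal_cn)
    if average <= 0:
        raise ValueError("average copy number must be positive")
    return math.log2(average / normal_cn)


# A one-copy loss in a sample with 80% stromal cells (20% tumor).
loss_signal = log2_ratio(0.2, tumor_cn=1)
```

At 80% stroma, a one-copy loss shifts the average copy number only from 2 to 1.8, giving a log2 ratio of roughly -0.15 instead of -1 in a pure tumor. This shows why an explicit mixing model helps when calling events in heterogeneous samples.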
6.  Inferring Loss-of-Heterozygosity from Unpaired Tumors Using High-Density Oligonucleotide SNP Arrays 
PLoS Computational Biology  2006;2(5):e41.
Loss of heterozygosity (LOH) of chromosomal regions bearing tumor suppressors is a key event in the evolution of epithelial and mesenchymal tumors. Identification of these regions usually relies on genotyping tumor and counterpart normal DNA and noting regions where heterozygous alleles in the normal DNA become homozygous in the tumor. However, paired normal samples for tumors and cell lines are often not available. With the advent of oligonucleotide arrays that simultaneously assay thousands of single-nucleotide polymorphism (SNP) markers, genotyping can now be done at high enough resolution to allow identification of LOH events by the absence of heterozygous loci, without comparison to normal controls. Here we describe a hidden Markov model-based method to identify LOH from unpaired tumor samples, taking into account SNP intermarker distances, SNP-specific heterozygosity rates, and the haplotype structure of the human genome. When we applied the method to data genotyped on 100K arrays, we correctly identified 99% of SNP markers as either retention or loss. We also correctly identified 81% of the regions of LOH, including 98% of regions greater than 3 megabases. By integrating copy number analysis into the method, we were able to distinguish LOH from allelic imbalance. Application of this method to data from a set of prostate samples without paired normals identified known regions of prevalent LOH. We have developed a method for analyzing high-density oligonucleotide SNP array data to accurately identify regions of LOH and retention in tumors without the need for paired normal samples.
A key event in the generation of many cancers is loss of heterozygosity (LOH) of chromosomal regions containing tumor suppressor genes, whereby one parent's version of the tumor suppressor is lost. As we develop a better understanding of the molecular mechanisms that generate different cancers, a description of the LOH events underlying these cancers is forming an important part of their classification. Generally, detection of LOH relies on comparison of the tumor's genome to the normal genome of the individual. Unfortunately, for many tumors, including most experimental models of cancer, the normal genome is not available. Therefore, the authors have developed a hidden Markov model-based method that evaluates the probability of LOH at all sites throughout the genome, based on high-resolution genotyping of only the tumor. They were able to achieve high levels of accuracy, specifically by taking into account the haplotype block structure of the genome. Application of this method to a set of 34 prostate cancer samples allowed the authors to identify the locations of the known and suspected tumor suppressor genes that are targeted by LOH.
PMCID: PMC1458964  PMID: 16699594
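The idea of calling LOH from tumor genotypes alone can be sketched as a two-state hidden Markov model decoded with the Viterbi algorithm. This is a deliberately reduced illustration, not the published method. It uses one fixed heterozygosity rate, a fixed genotyping error rate and a fixed transition probability, whereas the abstract describes SNP-specific heterozygosity rates, intermarker distances and haplotype structure. All parameter values and names are assumptions.

```python
import math

STATES = ("RET", "LOH")  # retention of heterozygosity vs. loss


def viterbi_loh(calls, het_rate=0.3, error=0.01, stay=0.95):
    """Most likely RET/LOH path for a sequence of 'het'/'hom' genotype calls."""
    if not calls:
        return []
    # Heterozygous calls are common under retention and rare (errors) under LOH.
    emit = {
        "RET": {"het": het_rate, "hom": 1.0 - het_rate},
        "LOH": {"het": error, "hom": 1.0 - error},
    }

    def trans(a, b):
        return math.log(stay if a == b else 1.0 - stay)

    score = {s: math.log(0.5) + math.log(emit[s][calls[0]]) for s in STATES}
    back = []
    for call in calls[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + trans(p, s))
            new_score[s] = score[prev] + trans(prev, s) + math.log(emit[s][call])
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy input: heterozygous SNPs flanking a run of 30 homozygous calls.
toy_calls = ["het"] * 5 + ["hom"] * 30 + ["het"] * 5
toy_path = viterbi_loh(toy_calls)
```

With these assumed parameters, a long homozygous run is labelled LOH, while a run of only a few homozygous SNPs stays in the retention state because it does not outweigh the cost of two state switches. This mirrors the abstract's observation that larger regions are identified more reliably.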
7.  Loss of Heterozygosity and Copy Number Abnormality in Clear Cell Renal Cell Carcinoma Discovered by High-Density Affymetrix 10K Single Nucleotide Polymorphism Mapping Array 
Neoplasia (New York, N.Y.)  2008;10(7):634-642.
Genetic aberrations are crucial in renal tumor progression. In this study, we describe loss of heterozygosity (LOH) and DNA-copy number abnormalities in clear cell renal cell carcinoma (cc-RCC) discovered by genome-wide single nucleotide polymorphism (SNP) arrays. Genomic DNA from tumor and normal tissue of 22 human cc-RCCs was analyzed on the Affymetrix GeneChip Human Mapping 10K Array. The array data were validated by quantitative polymerase chain reaction and immunohistochemistry. Reduced DNA copy numbers were detected on chromosomal arm 3p in 91%, on chromosome 9 in 32%, and on chromosomal arm 14q in 36% of the tumors. Gains were detected on chromosomal arm 5q in 45% and on chromosome 7 in 32% of the tumors. Copy number abnormalities were found not only in FHIT and VHL loci, known to be involved in renal carcinogenesis, but also in regions containing putative new tumor suppressor genes or oncogenes. In addition, microdeletions were detected on chromosomes 1 and 6 in genes with unknown impact on renal carcinogenesis. In validation experiments, abnormal protein expression of FOXP1 (on 3p) was found in 90% of tumors (concordance with SNP array data in 85%). As assessed by quantitative polymerase chain reaction, PARK2 and PACRG were down-regulated in 57% and 100%, respectively, and CSF1R was up-regulated in 69% of the cc-RCC cases (concordance with SNP array data in 57%, 33%, and 38%). Genome-wide SNP array analysis not only confirmed previously described large chromosomal aberrations but also detected novel microdeletions in genes potentially involved in tumor genesis of cc-RCC.
PMCID: PMC2435001  PMID: 18592004
8.  Assessing the Significance of Conserved Genomic Aberrations Using High Resolution Genomic Microarrays 
PLoS Genetics  2007;3(8):e143.
Genomic aberrations recurrent in a particular cancer type can be important prognostic markers for tumor progression. Typically in early tumorigenesis, cells incur a breakdown of the DNA replication machinery that results in an accumulation of genomic aberrations in the form of duplications, deletions, translocations, and other genomic alterations. Microarray methods allow for finer mapping of these aberrations than has previously been possible; however, data processing and analysis methods have not taken full advantage of this higher resolution. Attention has primarily been given to analysis on the single sample level, where multiple adjacent probes are necessarily used as replicates for the local region containing their target sequences. However, regions of concordant aberration can be short enough to be detected by only one, or very few, array elements. We describe a method called Multiple Sample Analysis for assessing the significance of concordant genomic aberrations across multiple experiments that does not require a-priori definition of aberration calls for each sample. If there are multiple samples, representing a class, then by exploiting the replication across samples our method can detect concordant aberrations at much higher resolution than can be derived from current single sample approaches. Additionally, this method provides a meaningful approach to addressing population-based questions such as determining important regions for a cancer subtype of interest or determining regions of copy number variation in a population. Multiple Sample Analysis also provides single sample aberration calls in the locations of significant concordance, producing high resolution calls per sample, in concordant regions. 
The approach is demonstrated on a dataset representing a challenging but important resource: breast tumors that have been formalin-fixed, paraffin-embedded, archived, and subsequently UV-laser capture microdissected and hybridized to two-channel BAC arrays using an amplification protocol. We demonstrate the accurate detection on simulated data, and on real datasets involving known regions of aberration within subtypes of breast cancer at a resolution consistent with that of the array. Similarly, we apply our method to previously published datasets, including a 250K SNP array, and verify known results as well as detect novel regions of concordant aberration. The algorithm has been fully implemented and tested and is freely available as a Java application at
Author Summary
Cancer is a genetic disease caused by genomic mutations that confer an increased ability to proliferate and survive in a specific environment. It is now known that many regions of genomic DNA are deleted or amplified in specific cancer types. These aberrations are believed to occur randomly in the genome. If these aberrations overlap more than would be expected by chance across individual occurrences of the cancer this suggests a selective pressure on this aberration. These conserved aberrations likely represent regions that are important for the development, progression, and survival of a specific cancer type in its environment. We present a method for identifying these conserved aberrations within a class of samples. The applications for this method include accurate high resolution mapping of aberrations characteristic of cancer subtypes as well as other genetic diseases and determination of conserved copy number variations in the population. With the use of high resolution microarray methods we have profiled different tumor types. We have been able to create high resolution profiles of conserved aberrations in specific cancer types. These conserved aberrations are prime targets for cancer therapies and many of these regions have already been used to develop effective cancer therapeutics.
PMCID: PMC1950957  PMID: 17722985
9.  Losing balance: Hardy–Weinberg disequilibrium as a marker for recurrent loss-of-heterozygosity in cancer 
Human Molecular Genetics  2011;20(24):4831-4839.
Identifying regions of loss-of-heterozygosity (LOH) in a tumor sample is a challenging problem. State-of-the-art computational approaches can infer LOH from single-nucleotide polymorphism (SNP) array data, but calling precise boundaries is complicated by normal-cell contamination and markers that are homozygous in the germline and therefore non-informative. More recently, the focus has shifted to pinpointing the loci recurrently affected by LOH events across multiple tumors. Recurrent LOH regions often harbor genes important for tumor suppression. Here, we propose a method that infers LOH rates across an entire sample set on an SNP-by-SNP basis. Our method achieves this by leveraging the straightforward principle that, by definition, LOH depletes heterozygotes, thereby disrupting Hardy–Weinberg equilibrium. We apply a statistical test for such LOH-influenced disruptions, and derive a maximum-likelihood estimator for the LOH rate based on the observed number of heterozygotes. This accounts for LOH in both its hemizygous deletion and copy-neutral forms, and does not make use of matched normal genotypes. Power simulations show high levels of sensitivity for the statistical test, and application to a control normal-tissue data set demonstrates a low false-discovery rate. We apply the method to three large publicly available tumor SNP array data sets, where it is able to localize tumor-suppressor gene targets of the LOH events. Inferred LOH rates are quite concordant across platforms/laboratories and between cell lines and tumors, but in a tumor type-dependent fashion. Finally, we produce rate estimates that are generally higher than previously published, and provide evidence that the latter are likely underestimates.
PMCID: PMC3221535  PMID: 21920941
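The principle that LOH depletes heterozygotes relative to Hardy–Weinberg expectations can be sketched with a simple plug-in estimate. The sketch assumes that LOH converts a heterozygote to either homozygote with equal probability, so allele frequencies are unchanged in expectation. It is an illustrative estimator under that assumption and not necessarily the maximum-likelihood estimator derived in the paper. The function name and the genotype counts are hypothetical.

```python
def estimate_loh_rate(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Estimate the LOH rate at one SNP from tumor genotype counts.

    Compares the observed number of heterozygotes with the number
    expected under Hardy-Weinberg equilibrium at the estimated allele
    frequency. Negative estimates are truncated to zero.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    expected_het = 2 * p * (1 - p) * n
    if expected_het == 0:
        raise ValueError("SNP is monomorphic; LOH rate is not estimable")
    return max(0.0, 1.0 - n_ab / expected_het)


# Hypothetical counts across 100 tumors at one SNP.
rate = estimate_loh_rate(n_aa=35, n_ab=30, n_bb=35)
```

In the hypothetical example the allele frequency is 0.5, so 50 heterozygotes are expected and 30 are observed, giving an estimated LOH rate of 0.4. Counts of 25, 50 and 25 match Hardy–Weinberg proportions and give a rate of zero.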
10.  Genomic profiling of CHEK2*1100delC-mutated breast carcinomas 
BMC Cancer  2015;15:877.
CHEK2*1100delC is a moderate-risk breast cancer susceptibility allele with a high prevalence in the Netherlands. We performed copy number and gene expression profiling to investigate whether CHEK2*1100delC breast cancers harbor characteristic genomic aberrations, as seen for BRCA1 mutated breast cancers.
We performed high-resolution SNP array and gene expression profiling of 120 familial breast carcinomas selected from a larger cohort of 155 familial breast tumors, including BRCA1, BRCA2, and CHEK2 mutant tumors. Gene expression analysis based on an mRNA immune signature was used to identify samples with relatively low amounts of tumor infiltrating lymphocytes (TILs), which were previously found to disturb tumor copy number and LOH (loss of heterozygosity) profiling. We specifically compared the genomic and gene expression profiles of CHEK2*1100delC breast cancers (n = 14) with BRCAX (familial non-BRCA1/BRCA2/CHEK2*1100delC mutated) breast cancers (n = 34) of the luminal intrinsic subtypes for which both SNP-array and gene expression data are available.
High amounts of TILs were found in a relatively small number of luminal breast cancers as compared to breast cancers of the basal-like subtype. As expected, these samples mostly have very few copy number aberrations and no detectable regions of LOH. By unsupervised hierarchical clustering of copy number data we observed a great degree of heterogeneity amongst the CHEK2*1100delC breast cancers, comparable to the BRCAX breast cancers. Furthermore, copy number aberrations were mostly seen at low frequencies in both the CHEK2*1100delC and BRCAX group of breast cancers. However, supervised class comparison identified copy number loss of chromosomal arm 1p to be associated with CHEK2*1100delC status.
In conclusion, in contrast to basal-like BRCA1 mutated breast cancers, no apparent specific somatic copy number aberration (CNA) profile was found for CHEK2*1100delC breast cancers, with the possible exception of copy number loss of chromosomal arm 1p in a subset of tumors, which might be involved in CHEK2 tumorigenesis. This difference in CNA profiles might be explained by the need for BRCA1-deficient tumor cells to acquire survival factors, for example through specific copy number aberrations, in order to expand. Such factors may not be needed for breast tumors with a defect in a non-essential gene such as CHEK2.
Electronic supplementary material
The online version of this article (doi:10.1186/s12885-015-1880-y) contains supplementary material, which is available to authorized users.
PMCID: PMC4640207  PMID: 26553136
Breast carcinoma; CHEK2; Genomic profiling; Copy number aberration; Gene expression
11.  Landscape of somatic allelic imbalances and copy number alterations in HER2-amplified breast cancer 
Breast Cancer Research : BCR  2011;13(6):R129.
Human epidermal growth factor receptor 2 (HER2)-amplified breast cancer represents a clinically well-defined subgroup due to availability of targeted treatment. However, HER2-amplified tumors have been shown to be heterogeneous at the genomic level by genome-wide microarray analyses, pointing towards a need of further investigations for identification of recurrent copy number alterations and delineation of patterns of allelic imbalance.
High-density whole genome array-based comparative genomic hybridization (aCGH) and single nucleotide polymorphism (SNP) array data from 260 HER2-amplified breast tumors or cell lines, and 346 HER2-negative breast cancers with molecular subtype information were assembled from different repositories. Copy number alteration (CNA), loss-of-heterozygosity (LOH), copy number neutral allelic imbalance (CNN-AI), subclonal CNA and patterns of tumor DNA ploidy were analyzed using bioinformatical methods such as genomic identification of significant targets in cancer (GISTIC) and genome alteration print (GAP). The patterns of tumor ploidy were confirmed in 338 unrelated breast cancers analyzed by DNA flow cytometry with concurrent BAC aCGH and gene expression data.
A core set of 36 genomic regions commonly affected by copy number gain or loss was identified by integrating results with a previous study, together comprising > 400 HER2-amplified tumors. While CNN-AI frequency appeared evenly distributed over chromosomes in HER2-amplified tumors, not targeting specific regions and often < 20% in frequency, the occurrence of LOH was strongly associated with regions of copy number loss. HER2-amplified and HER2-negative tumors stratified by molecular subtypes displayed different patterns of LOH and CNN-AI, with basal-like tumors showing highest frequencies followed by HER2-amplified and luminal B cases. Tumor aneuploidy was strongly associated with increasing levels of LOH, CNN-AI, CNAs and occurrence of subclonal copy number events, irrespective of subtype. Finally, SNP data from individual tumors indicated that genomic amplification in general appears as monoallelic, that is, it preferentially targets one parental chromosome in HER2-amplified tumors.
We have delineated the genomic landscape of CNAs, amplifications, LOH, and CNN-AI in HER2-amplified breast cancer, but also demonstrated a strong association between different types of genomic aberrations and tumor aneuploidy irrespective of molecular subtype.
PMCID: PMC3326571  PMID: 22169037
12.  Genome-Wide Identification of Somatic Aberrations from Paired Normal-Tumor Samples 
PLoS ONE  2014;9(1):e87212.
Genomic copy number alteration and allelic imbalance are distinct features of cancer cells, and recent advances in genotyping technology have greatly boosted cancer genome research. However, the complicated nature of tumors usually hampers the analysis of SNP array data. In this study, we describe a bioinformatic tool, named GIANT, for genome-wide identification of somatic aberrations from paired normal-tumor samples measured with SNP arrays. By efficiently incorporating genotype information from the matched normal sample, it accurately detects different types of aberrations in the cancer genome, even for aneuploid tumor samples with severe normal cell contamination. Furthermore, it allows for discovery of recurrent aberrations with critical biological properties in tumorigenesis by using a statistical significance test. We demonstrate the superior performance of the proposed method on various datasets including tumor replicate pairs, simulated SNP arrays and dilution series of normal-cancer cell lines. Results show that GIANT has the potential to detect genomic aberrations even when the cancer cell proportion is as low as 5–10%. Application to a large number of paired tumor samples delivers a genome-wide profile of the statistical significance of the various aberrations, including amplification, deletion and LOH. We believe that GIANT represents a powerful bioinformatic tool for interpreting complex genomic aberrations, and thus for assisting both academic study and the clinical treatment of cancer.
PMCID: PMC3907544  PMID: 24498045
13.  Genomic Differences Between Estrogen Receptor (ER)-Positive and ER-Negative Human Breast Carcinoma Identified by Single Nucleotide Polymorphism Array Comparative Genome Hybridization Analysis 
Cancer  2010;117(10):2024-2034.
Estrogen receptor (ER) remains one of the most important biomarkers for breast cancer subtyping and prognosis, and comparative genome hybridization has greatly contributed to the understanding of global genetic imbalance. The authors used single-nucleotide polymorphism (SNP) arrays to compare overall copy number aberrations (CNAs) as well as loss of heterozygosity (LOH) of the entire human genome in ER-positive and ER-negative breast carcinomas.
DNA was extracted from frozen tumor sections of 21 breast carcinoma specimens and analyzed with a proprietary 50K XbaI SNP array. Copy number and LOH probability values were derived for each sample. Data were analyzed using bioinformatics and computational software, and permutation tests were used to estimate the significance of these values.
There was a global increase in CNAs and LOH in ER-negative relative to ER-positive cancers. Gain of the long arm of chromosome 1 (1q) and 8q were the most obvious changes common in both subtypes: An increase in the chromosome 1 short arm (1p)/1q ratio was observed in ER-negative samples, and an increased 16p/16q ratio was observed in ER-positive samples. Significant CNAs (adjusted P<.05) in ER-negative relative to ER-positive tumors included 5q deletion, loss of 15q, and gain of 2p and 21q. Copy-neutral LOH (cnLOH) common to both ER-positive and ER-negative samples included 9p21, the p16 tumor suppressor locus, and 4q13, the RCHY1 (ring finger and CHY zinc finger domain-containing 1) oncogene locus. Of particular interest was an enrichment of 17q LOH among the ER-negative tumors, potentially suggesting breast cancer 1 gene (BRCA1) mutations.
SNP array detected both genetic imbalances and cnLOH and was capable of discriminating ER-negative breast cancer from ER-positive breast cancer.
PMCID: PMC4521590  PMID: 21523713
breast cancer; estrogen receptor-negative; copy number; loss of heterozygosity; acquired uniparental disomy; single nucleotide polymorphism comparative genome hybridization
14.  GPHMM: an integrated hidden Markov model for identification of copy number alteration and loss of heterozygosity in complex tumor samples using whole genome SNP arrays 
Nucleic Acids Research  2011;39(12):4928-4941.
There is an increasing interest in using single nucleotide polymorphism (SNP) genotyping arrays for profiling chromosomal rearrangements in tumors, as they allow simultaneous detection of copy number and loss of heterozygosity with high resolution. Critical issues such as signal baseline shift due to aneuploidy, normal cell contamination, and the presence of GC content bias have been reported to dramatically alter SNP array signals and complicate accurate identification of aberrations in cancer genomes. To address these issues, we propose a novel Global Parameter Hidden Markov Model (GPHMM) to unravel tangled genotyping data generated from tumor samples. In contrast to other HMM methods, a distinct feature of GPHMM is that the issues mentioned above are quantitatively modeled by global parameters and integrated within the statistical framework. We developed an efficient EM algorithm for parameter estimation. We evaluated performance on three data sets and show that GPHMM can correctly identify chromosomal aberrations in tumor samples containing as few as 10% cancer cells. Furthermore, we demonstrated that the estimation of global parameters in GPHMM provides information about the biological characteristics of tumor samples and the quality of genotyping signal from SNP array experiments, which is helpful for data quality control and outlier detection in cohort studies.
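To illustrate the kind of model this abstract builds on, here is a minimal sketch of a plain three-state hidden Markov model decoded with the Viterbi algorithm. It is not GPHMM: it omits the global parameters for baseline shift, normal cell contamination, and GC bias, and it uses only a log-ratio signal. The state names, emission means, standard deviation, and transition probability are all hypothetical values chosen for the example.

```python
import math

STATES = ("loss", "neutral", "gain")
# Hypothetical emission means for the log2 intensity ratio of each state.
MEANS = {"loss": -0.5, "neutral": 0.0, "gain": 0.4}
SIGMA = 0.2   # assumed common standard deviation of the signal
STAY = 0.98   # assumed probability of staying in the same state


def log_gauss(x, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def viterbi(log_ratios):
    """Return the most likely state path for a list of log ratios."""
    if not log_ratios:
        return []
    switch = (1 - STAY) / (len(STATES) - 1)
    log_trans = {
        (s, t): math.log(STAY if s == t else switch)
        for s in STATES
        for t in STATES
    }
    # Uniform prior over states at the first marker.
    score = {
        s: math.log(1 / len(STATES)) + log_gauss(log_ratios[0], MEANS[s], SIGMA)
        for s in STATES
    }
    back = []
    for x in log_ratios[1:]:
        prev = score
        score = {}
        pointers = {}
        for t in STATES:
            best = max(STATES, key=lambda s: prev[s] + log_trans[(s, t)])
            pointers[t] = best
            score[t] = prev[best] + log_trans[(best, t)] + log_gauss(x, MEANS[t], SIGMA)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Made-up signal: four neutral markers, four in a deleted segment, four neutral.
signal = [0.02, -0.03, 0.01, 0.0, -0.48, -0.55, -0.51, -0.47, 0.03, -0.01, 0.0, 0.02]
calls = viterbi(signal)
```

The high self-transition probability is what makes the model call contiguous segments rather than reacting to single noisy markers.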
PMCID: PMC3130254  PMID: 21398628
15.  Allele-Specific Amplification in Cancer Revealed by SNP Array Analysis 
PLoS Computational Biology  2005;1(6):e65.
Amplification, deletion, and loss of heterozygosity of genomic DNA are hallmarks of cancer. In recent years a variety of studies have emerged measuring total chromosomal copy number at increasingly high resolution. Similarly, loss-of-heterozygosity events have been finely mapped using high-throughput genotyping technologies. We have developed a probe-level allele-specific quantitation procedure that extracts both copy number and allelotype information from single nucleotide polymorphism (SNP) array data to arrive at allele-specific copy number across the genome. Our approach applies an expectation-maximization algorithm to a model derived from a novel classification of SNP array probes. This method is the first to our knowledge that is able to (a) determine the generalized genotype of aberrant samples at each SNP site (e.g., CCCCT at an amplified site), and (b) infer the copy number of each parental chromosome across the genome. With this method, we are able to determine not just where amplifications and deletions occur, but also the haplotype of the region being amplified or deleted. The merit of our model and general approach is demonstrated by very precise genotyping of normal samples, and our allele-specific copy number inferences are validated using PCR experiments. Applying our method to a collection of lung cancer samples, we are able to conclude that amplification is essentially monoallelic, as would be expected under the mechanisms currently believed responsible for gene amplification. This suggests that a specific parental chromosome may be targeted for amplification, whether because of germ line or somatic variation. An R software package containing the methods described in this paper is freely available at
Human cancer is driven by the acquisition of genomic alterations. These alterations include amplifications and deletions of portions of one or both chromosomes in the cell. The localization of such copy number changes is an important pursuit in cancer genomics research because amplifications frequently harbor cancer-causing oncogenes, while deleted regions often contain tumor-suppressor genes. In this paper the authors present an expectation-maximization-based procedure that, when applied to data from single nucleotide polymorphism arrays, estimates not only total copy number at high resolution across the genome, but also the contribution of each parental chromosome to copy number. Applying this approach to data from over 100 lung cancer samples the authors find that, in essentially all cases, amplification is monoallelic. That is, only one of the two parental chromosomes contributes to the copy number elevation in each amplified region. This phenomenon makes possible the identification of haplotypes, or patterns of single nucleotide polymorphism alleles, that may serve as markers for the tumor-inducing genetic variants being targeted.
PMCID: PMC1289392  PMID: 16322765
16.  Clinical Significance of Previously Cryptic Copy Number Alterations and Loss of Heterozygosity in Pediatric Acute Myeloid Leukemia and Myelodysplastic Syndrome Determined Using Combined Array Comparative Genomic Hybridization plus Single-Nucleotide Polymorphism Microarray Analyses 
Journal of Korean Medical Science  2014;29(7):926-933.
The combined array comparative genomic hybridization plus single-nucleotide polymorphism microarray (CGH+SNP microarray) platform can simultaneously detect copy number alterations (CNA) and copy-neutral loss of heterozygosity (LOH). Eighteen children with acute myeloid leukemia (AML) (n=15) or myelodysplastic syndrome (MDS) (n=3) were studied using CGH+SNP microarray to evaluate the clinical significance of submicroscopic chromosomal aberrations. CGH+SNP microarray revealed CNAs at 14 regions in 9 patients, while metaphase cytogenetic (MC) analysis detected CNAs in 11 regions in 8 patients. Using CGH+SNP microarray, LOHs>10 Mb involving terminal regions or the whole chromosome were detected in 3 of 18 patients (17%). CGH+SNP microarray revealed cryptic LOHs with or without CNAs in 3 of 5 patients with normal karyotypes. CGH+SNP microarray detected additional cryptic CNAs (n=2) and LOHs (n=5) in 6 of 13 patients with abnormal MC. In total, 9 patients demonstrated additional aberrations, including CNAs (n=3) and/or LOHs (n=8). Three of 15 patients with AML and terminal LOH>10 Mb demonstrated a significantly inferior relapse-free survival rate (P=0.041). This study demonstrates that CGH+SNP microarray can simultaneously detect previously cryptic CNAs and LOH, which may demonstrate prognostic implications.
PMCID: PMC4101780  PMID: 25045224
Leukemia, Myeloid, Acute; DNA Copy Number Variations; Loss of Heterozygosity; Comparative Genomic Hybridization; Single-Nucleotide Polymorphism Microarray
17.  Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data 
Annals of Oncology  2014;26(1):64-70.
We describe our algorithm and software for determining copy number profiles from tumor genome sequencing data, and find that it compares favorably to existing algorithms for the same purpose.
Exome or whole-genome deep sequencing of tumor DNA along with paired normal DNA can potentially provide a detailed picture of the somatic mutations that characterize the tumor. However, analysis of such sequence data can be complicated by the presence of normal cells in the tumor specimen, by intratumor heterogeneity, and by the sheer size of the raw data. In particular, determination of copy number variations from exome sequencing data alone has proven difficult; thus, single nucleotide polymorphism (SNP) arrays have often been used for this task. Recently, algorithms to estimate absolute, but not allele-specific, copy number profiles from tumor sequencing data have been described.
Materials and methods
We developed Sequenza, a software package that uses paired tumor-normal DNA sequencing data to estimate tumor cellularity and ploidy, and to calculate allele-specific copy number profiles and mutation profiles. We applied Sequenza, as well as two previously published algorithms, to exome sequence data from 30 tumors from The Cancer Genome Atlas. We assessed the performance of these algorithms by comparing their results with those generated using matched SNP arrays and processed by the allele-specific copy number analysis of tumors (ASCAT) algorithm.
Comparison between Sequenza/exome and SNP/ASCAT revealed strong correlation in cellularity (Pearson's r = 0.90) and ploidy estimates (r = 0.42, or r = 0.94 after manually inspecting alternative solutions). This performance was noticeably superior to previously published algorithms. In addition, in artificial data simulating normal-tumor admixtures, Sequenza detected the correct ploidy in samples with tumor content as low as 30%.
The agreement between Sequenza and SNP array-based copy number profiles suggests that exome sequencing alone is sufficient not only for identifying small scale mutations but also for estimating cellularity and inferring DNA copy number aberrations.
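The abstract does not spell out how cellularity and ploidy enter the copy number estimates. As a hedged illustration, the sketch below uses a commonly cited tumor-normal mixture model in which the normal cells are diploid and heterozygous at the site. The function name and the exact parameterization are assumptions for this example and should not be read as Sequenza's implementation.

```python
def expected_signals(cellularity, ploidy, total_cn, minor_cn):
    """Expected depth ratio and minor-allele frequency at one segment.

    Assumes a simple mixture of tumor cells (fraction `cellularity`) with
    diploid, heterozygous normal cells. Hypothetical helper for illustration.
    """
    if not 0.0 <= cellularity <= 1.0:
        raise ValueError("cellularity must lie in [0, 1]")
    if minor_cn < 0 or 2 * minor_cn > total_cn:
        raise ValueError("minor_cn must be between 0 and total_cn / 2")
    normal = 1.0 - cellularity
    # Average copies per cell at this segment versus genome-wide.
    segment_copies = cellularity * total_cn + normal * 2
    genome_copies = cellularity * ploidy + normal * 2
    depth_ratio = segment_copies / genome_copies
    if segment_copies == 0:
        # Homozygous deletion in a pure tumor: no reads, frequency undefined.
        return depth_ratio, None
    minor_fraction = (cellularity * minor_cn + normal * 1) / segment_copies
    return depth_ratio, minor_fraction


# Copy-neutral LOH (2 copies, 0 minor) in a sample with 50% tumor cells.
ratio, minor_fraction = expected_signals(0.5, 2.0, total_cn=2, minor_cn=0)
```

Under this model copy-neutral LOH leaves the depth ratio at 1 and shows up only in the allele frequency, which is why allele-specific estimates carry information that total copy number alone does not.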
PMCID: PMC4269342  PMID: 25319062
cancer genomics; copy number alterations; mutations; next-generation sequencing; software
18.  Analysis and visualization of chromosomal abnormalities in SNP data with SNPscan 
BMC Bioinformatics  2006;7:25.
A variety of diseases are caused by chromosomal abnormalities such as aneuploidies (having an abnormal number of chromosomes), microdeletions, microduplications, and uniparental disomy. High density single nucleotide polymorphism (SNP) microarrays provide information on chromosomal copy number changes, as well as genotype (heterozygosity and homozygosity). SNP array studies generate multiple types of data for each SNP site, some with more than 100,000 SNPs represented on each array. The identification of different classes of anomalies within SNP data has been challenging.
We have developed SNPscan, a web-accessible tool to analyze and visualize high density SNP data. It enables researchers (1) to visually and quantitatively assess the quality of user-generated SNP data relative to a benchmark data set derived from a control population, (2) to display SNP intensity and allelic call data in order to detect chromosomal copy number anomalies (duplications and deletions), (3) to display uniparental isodisomy based on loss of heterozygosity (LOH) across genomic regions, (4) to compare paired samples (e.g. tumor and normal), and (5) to generate a file type for viewing SNP data in the University of California, Santa Cruz (UCSC) Human Genome Browser. SNPscan accepts data exported from Affymetrix Copy Number Analysis Tool as its input. We validated SNPscan using data generated from patients with known deletions, duplications, and uniparental disomy. We also inspected previously generated SNP data from 90 apparently normal individuals from the Centre d'Étude du Polymorphisme Humain (CEPH) collection, and identified three cases of uniparental isodisomy, four females having an apparently mosaic X chromosome, two mislabelled SNP data sets, and one microdeletion on chromosome 2 with mosaicism from an apparently normal female. These previously unrecognized abnormalities were all detected using SNPscan. The microdeletion was independently confirmed by fluorescence in situ hybridization, and a region of homozygosity in a UPD case was confirmed by sequencing of genomic DNA.
SNPscan is useful to identify chromosomal abnormalities based on SNP intensity (such as chromosomal copy number changes) and heterozygosity data (including regions of LOH and some cases of UPD). The program and source code are available at the SNPscan website.
PMCID: PMC1382255  PMID: 16420694
19.  Aberrant DNA Methylation of OLIG1, a Novel Prognostic Factor in Non-Small Cell Lung Cancer 
PLoS Medicine  2007;4(3):e108.
Lung cancer is the leading cause of cancer-related death worldwide. Currently, tumor, node, metastasis (TNM) staging provides the most accurate prognostic parameter for patients with non-small cell lung cancer (NSCLC). However, the overall survival of patients with resectable tumors varies significantly, indicating the need for additional prognostic factors to better predict the outcome of the disease, particularly within a given TNM subset.
Methods and Findings
In this study, we investigated whether adenocarcinomas and squamous cell carcinomas could be differentiated based on their global aberrant DNA methylation patterns. We performed restriction landmark genomic scanning on 40 patient samples and identified 47 DNA methylation targets that together could distinguish the two lung cancer subgroups. The protein expression of one of those targets, oligodendrocyte transcription factor 1 (OLIG1), significantly correlated with survival in NSCLC patients, as shown by univariate and multivariate analyses. Furthermore, the hazard ratio for patients negative for OLIG1 protein was significantly higher than the one for those patients expressing the protein, even at low levels.
Multivariate analyses of our data confirmed that OLIG1 protein expression significantly correlates with overall survival in NSCLC patients, with a relative risk of 0.84 (95% confidence interval 0.77–0.91, p < 0.001) along with T and N stages, as indicated by a Cox proportional hazard model. Taken together, our results suggest that OLIG1 protein expression could be utilized as a novel prognostic factor, which could aid in deciding which NSCLC patients might benefit from more aggressive therapy. This is potentially of great significance, as the addition of postoperative adjuvant chemotherapy in T2N0 NSCLC patients is still controversial.
Christopher Plass and colleagues find that OLIG1 expression correlates with survival in lung cancer patients and suggest that it could be used in deciding which patients are likely to benefit from more aggressive therapy.
Editors' Summary
Lung cancer is the commonest cause of cancer-related death worldwide. Most cases are of a type called non-small cell lung cancer (NSCLC). Like other cancers, treatment of NSCLC depends on the “TNM stage” at which the cancer is detected. Staging takes into account the size and local spread of the tumor (its T classification), whether nearby lymph nodes contain tumor cells (its N classification), and whether tumor cells have spread (metastasized) throughout the body (its M classification). Stage I tumors are confined to the lung and are removed surgically. Stage II tumors have spread to nearby lymph nodes and are treated with a combination of surgery and chemotherapy. Stage III tumors have spread throughout the chest, and stage IV tumors have metastasized around the body; patients with both of these stages are treated with chemotherapy alone. About 70% of patients with stage I or II lung cancer, but only 2% of patients with stage IV lung cancer, survive for five years after diagnosis.
Why Was This Study Done?
TNM staging is the best way to predict the likely outcome (prognosis) for patients with NSCLC, but survival times for patients with stage I and II tumors vary widely. Another prognostic marker—maybe a “molecular signature”—that could distinguish patients who are likely to respond to treatment from those whose cancer will inevitably progress would be very useful. Unlike normal cells, cancer cells divide uncontrollably and can move around the body. These behavioral changes are caused by alterations in the pattern of proteins expressed by the cells. But what causes these alterations? The answer in some cases is “epigenetic changes” or chemical modifications of genes. In cancer cells, methyl groups are aberrantly added to GC-rich gene regions. These so-called “CpG islands” lie near gene promoters (sequences that control the transcription of DNA into mRNA, the template for protein production), and their methylation stops the promoters working and silences the gene. In this study, the researchers have investigated whether aberrant methylation patterns vary between NSCLC subtypes and whether specific aberrant methylations are associated with survival and can, therefore, be used prognostically.
What Did the Researchers Do and Find?
The researchers used “restriction landmark genomic scanning” (RLGS) to catalog global aberrant DNA methylation patterns in human lung tumor samples. In RLGS, DNA is cut into fragments with a restriction enzyme (a protein that cuts at specific DNA sequences), end-labeled, and separated using two-dimensional gel electrophoresis to give a pattern of spots. Because methylation stops some restriction enzymes cutting their target sequence, normal lung tissue and lung tumor samples yield different patterns of spots. The researchers used these patterns to identify 47 DNA methylation targets (many in CpG islands) that together distinguished between adenocarcinomas and squamous cell carcinomas, two major types of NSCLCs. Next, they measured mRNA production from the genes with the greatest difference in methylation between adenocarcinomas and squamous cell carcinomas. OLIG1 (the gene that encodes a protein involved in nerve cell development) had one of the highest differences in mRNA production between these tumor types. Furthermore, three-quarters of NSCLCs had reduced or no expression of OLIG1 protein and, when the researchers analyzed the association between OLIG1 protein expression and overall survival in patients with NSCLC, reduced OLIG1 protein expression was associated with reduced survival.
What Do These Findings Mean?
These findings indicate that different types of NSCLC can be distinguished by examining their aberrant methylation patterns. This suggests that the establishment of different DNA methylation patterns might be related to the cell type from which the tumors developed. Alternatively, the different aberrant methylation patterns might reflect the different routes that these cells take to becoming tumor cells. This research identifies a potential new prognostic marker for NSCLC by showing that OLIG1 protein expression correlates with overall survival in patients with NSCLC. This correlation needs to be tested in a clinical setting to see if adding OLIG1 expression to the current prognostic parameters can lead to better treatment choices for early-stage lung cancer patients and ultimately improve these patients' overall survival.
Additional Information.
Please access these Web sites via the online version of this summary at
Patient and professional information on lung cancer, including staging (in English and Spanish), is available from the US National Cancer Institute
The MedlinePlus encyclopedia has pages on non-small cell lung cancer (in English and Spanish)
Cancerbackup provides patient information on lung cancer
CancerQuest, provided by Emory University, has information about how cancer develops (in English, Spanish, Chinese and Russian)
Wikipedia pages on epigenetics (note that Wikipedia is a free online encyclopedia that anyone can edit)
The Epigenome Network of Excellence gives background information and the latest news about epigenetics (in several European languages)
PMCID: PMC1831740  PMID: 17388669
20.  Major copy proportion analysis of tumor samples using SNP arrays 
BMC Bioinformatics  2008;9:204.
Single nucleotide polymorphisms (SNPs) are the most common genetic variations in the human genome and are useful as genomic markers. Oligonucleotide SNP microarrays have been developed for high-throughput genotyping of up to 900,000 human SNPs and have been used widely in linkage and cancer genomics studies. We have previously used Hidden Markov Models (HMM) to analyze SNP array data for inferring copy numbers and loss-of-heterozygosity (LOH) from paired normal and tumor samples and unpaired tumor samples.
We proposed and implemented major copy proportion (MCP) analysis of oligonucleotide SNP array data. A HMM was constructed to infer unobserved MCP states from observed allele-specific signals through emission and transition distributions. We used 10 K, 100 K and 250 K SNP array datasets to compare MCP analysis with LOH and copy number analysis, and showed that MCP performs better than LOH analysis for allelic-imbalanced chromosome regions and normal contaminated samples. The major and minor copy alleles can also be inferred from allelic-imbalanced regions by MCP analysis.
MCP extends tumor LOH analysis to allelic imbalance analysis and supplies information complementary to total copy numbers. MCP analysis of mixed normal and tumor samples suggests that the method is useful for normal-contaminated tumor samples. The described analysis and visualization methods are readily available in the user-friendly dChip software.
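A small worked example may help show why MCP captures allelic imbalance that a binary LOH call does not. The sketch assumes that MCP is the number of copies of the major allele divided by the total copy number, as the name suggests. The function name and the choice to return `None` for a homozygous deletion are illustrative assumptions and are not taken from dChip.

```python
from fractions import Fraction


def major_copy_proportion(copies_a, copies_b):
    """Proportion of the total copy number carried by the major allele.

    Hypothetical helper: assumes MCP = major copies / total copies.
    """
    if copies_a < 0 or copies_b < 0:
        raise ValueError("copy numbers must be non-negative")
    total = copies_a + copies_b
    if total == 0:
        # Homozygous deletion: no copies remain, so the proportion is undefined.
        return None
    return Fraction(max(copies_a, copies_b), total)


normal = major_copy_proportion(1, 1)      # balanced heterozygous region
cn_loh = major_copy_proportion(2, 0)      # copy-neutral LOH
imbalance = major_copy_proportion(2, 1)   # single-copy gain of one allele
```

Here the balanced region gives 1/2 and the LOH region gives 1, while the 2:1 gain gives 2/3: the minor allele is still present, so an LOH call would miss it, but the MCP value still departs from 1/2.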
PMCID: PMC2375907  PMID: 18426588
21.  Detection of novel copy number variants in uterine leiomyomas using high-resolution SNP arrays 
Molecular Human Reproduction  2009;15(9):563-568.
Uterine leiomyomas (ULs) are benign monoclonal tumors originating from myometrial tissue in the uterus. Genetic pathways that lead to myometrial transformation into leiomyomas are largely unknown. Approximately 40% of ULs are karyotypically abnormal by G-banding; however, the remaining 60% of leiomyomas do not contain cytogenetically visible genomic rearrangements. Recent technological advances such as array based comparative genomic hybridization (array CGH) and dense single nucleotide polymorphism (SNP) arrays have enabled genome-wide scanning for genomic rearrangements missed by karyotype banding analysis. In the current study, we employed a high resolution SNP microarray on 16 randomly selected ULs and normal myometrium samples to detect submicroscopic (<5 Mb) chromosomal aberrations. The SNP array identified gene dosage changes in 56% of the fibroids (9/16): 25% (4/16) had aberrations >5 Mb, whereas 31% (5/16) contained only submicroscopic copy number changes (<5 Mb). We corroborated 3/5 submicroscopic changes using quantitative PCR, meaning that ultimately, 19% of our samples (3/16) were found to contain only submicroscopic changes. Novel submicroscopic aberrations on chromosomal segments 1q42.13, 11q13.1 and 13q12.13 and large, previously unreported deletions on 15q11.2–q23, 17p–q21.31 and 22q12.2-q12.3 were identified. Previously reported deletions on 1p, 3q, 7q, 13, and chromosome 14q were also noted. RHOU, MAP3K11 and WASF3 gene copy numbers were changed in the subset of leiomyomas with submicroscopic aberrations, and these genes have previously been implicated in tumorigenesis. Our findings support the hypothesis that a significant fraction of ULs without visible cytogenetic changes harbor submicroscopic genomic rearrangements which may in turn contribute to transformation of normal myometrial tissue into leiomyomas.
PMCID: PMC2725754  PMID: 19567454
copy number variation; leiomyoma and fibroid; microdeletion and microduplication; SNP microarray; uterus
22.  Allelic Imbalances and Microdeletions Affecting the PTPRD Gene in Cutaneous Squamous Cell Carcinomas Detected Using Single Nucleotide Polymorphism Microarray Analysis 
Genes, chromosomes & cancer  2007;46(7):661-669.
Cutaneous squamous cell carcinomas (SCC) are the second most commonly diagnosed cancers in fair-skinned people; yet the genetic mechanisms involved in SCC tumorigenesis remain poorly understood. We have used single nucleotide polymorphism (SNP) microarray analysis to examine genome-wide allelic imbalance in 16 primary and 2 lymph node metastatic SCC using paired non-tumour samples to counteract normal copy number variation. The most common genetic change was loss of heterozygosity (LOH) on 9p, observed in 13 of 16 primary SCC. Other recurrent events included LOH on 3p (9 tumors), 2q, 8p, and 13 (each in 8 SCC) and allelic gain on 3q and 8q (each in 6 tumors). Copy number-neutral LOH was observed in a proportion of samples, implying that somatic recombination had led to acquired uniparental disomy, an event not previously demonstrated in SCC. As well as recurrent patterns of gross chromosomal changes, SNP microarray analysis revealed, in 2 primary SCC, a homozygous microdeletion on 9p23 within the protein tyrosine phosphatase receptor type D (PTPRD) locus, an emerging frequent target of homozygous deletion in lung cancer and neuroblastoma. A third sample was heterozygously deleted within this locus and PTPRD expression was aberrant. Two of the 3 primary SCC with PTPRD deletion had demonstrated metastatic potential. Our data identify PTPRD as a candidate tumor suppressor gene in cutaneous SCC with a possible association with metastasis.
PMCID: PMC2426828  PMID: 17420988
23.  A Genome-Wide Screen for Promoter Methylation in Lung Cancer Identifies Novel Methylation Markers for Multiple Malignancies  
PLoS Medicine  2006;3(12):e486.
Promoter hypermethylation coupled with loss of heterozygosity at the same locus results in loss of gene function in many tumor cells. The “rules” governing which genes are methylated during the pathogenesis of individual cancers, how specific methylation profiles are initially established, or what determines tumor type-specific methylation are unknown. However, DNA methylation markers that are highly specific and sensitive for common tumors would be useful for the early detection of cancer, and those required for the malignant phenotype would identify pathways important as therapeutic targets.
Methods and Findings
In an effort to identify new cancer-specific methylation markers, we employed a high-throughput global expression profiling approach in lung cancer cells. We identified 132 genes that have 5′ CpG islands, are induced from undetectable levels by 5-aza-2′-deoxycytidine in multiple non-small cell lung cancer cell lines, and are expressed in immortalized human bronchial epithelial cells. As expected, these genes were also expressed in normal lung, but often not in companion primary lung cancers. Methylation analysis of a subset (45/132) of these promoter regions in primary lung cancer (n = 20) and adjacent nonmalignant tissue (n = 20) showed that 31 genes had acquired methylation in the tumors, but did not show methylation in normal lung or peripheral blood cells. We studied the eight most frequently and specifically methylated genes from our lung cancer dataset in breast cancer (n = 37), colon cancer (n = 24), and prostate cancer (n = 24) along with counterpart nonmalignant tissues. We found that seven loci were frequently methylated in both breast and lung cancers, with four showing extensive methylation in all four epithelial tumors.
By using a systematic biological screen we identified multiple genes that are methylated with high penetrance in primary lung, breast, colon, and prostate cancers. The cross-tumor methylation pattern we observed for these novel markers suggests that we have identified a partial promoter hypermethylation signature for these common malignancies. These data suggest that while tumors in different tissues vary substantially with respect to gene expression, there may be commonalities in their promoter methylation profiles that represent targets for early detection screening or therapeutic intervention.
John Minna and colleagues report that a group of genes are commonly methylated in primary lung, breast, colon, and prostate cancer.
Editors' Summary
Tumors or cancers contain cells that have lost many of the control mechanisms that normally regulate their behavior. Unlike normal cells, which only divide to repair damaged tissues, cancer cells divide uncontrollably. They also gain the ability to move round the body and start metastases in secondary locations. These changes in behavior result from alterations in their genetic material. For example, mutations (permanent changes in the sequence of nucleotides in the cell's DNA) in genes known as oncogenes stimulate cells to divide constantly. Mutations in another group of genes—tumor suppressor genes—disable their ability to restrain cell growth. Key tumor suppressor genes are often completely lost in cancer cells. But not all the genetic changes in cancer cells are mutations. Some are “epigenetic” changes—chemical modifications of genes that affect the amount of protein made from them. In cancer cells, methyl groups are often added to CG-rich regions—this is called hypermethylation. These “CpG islands” lie near gene promoters—sequences that control the transcription of DNA into RNA, the template for protein production—and their methylation switches off the promoter. Methylation of the promoter of one copy of a tumor suppressor gene, which often coincides with the loss of the other copy of the gene, is thought to be involved in cancer development.
Why Was This Study Done?
The rules that govern which genes are hypermethylated during the development of different cancer types are not known, but it would be useful to identify any DNA methylation events that occur regularly in common cancers for two reasons. First, specific DNA methylation markers might be useful for the early detection of cancer. Second, identifying these epigenetic changes might reveal cellular pathways that are changed during cancer development and so identify new therapeutic targets. In this study, the researchers have used a systematic biological screen to identify genes that are methylated in many lung, breast, colon, and prostate cancers—all cancers that form in “epithelial” tissues.
What Did the Researchers Do and Find?
The researchers used microarray expression profiling to examine gene expression patterns in several lung cancer and normal lung cell lines. In this technique, labeled RNA molecules isolated from cells are applied to a “chip” carrying an array of gene fragments. Here, they stick to the fragment that represents the gene from which they were made, which allows the genes that the cells express to be catalogued. By comparing the expression profiles of lung cancer cells and normal lung cells before and after treatment with a chemical that inhibits DNA methylation, the researchers identified genes that were methylated in the cancer cells—that is, genes that were expressed in normal cells but not in cancer cells unless methylation was inhibited. 132 of these genes contained CpG islands. The researchers examined the promoters of 45 of these genes in lung cancer cells taken straight from patients and found that 31 of the promoters were methylated in tumor tissues but not in adjacent normal tissues. Finally, the researchers looked at promoter methylation of the eight genes most frequently and specifically methylated in the lung cancer samples in breast, colon, and prostate cancers. Seven of the genes were frequently methylated in both lung and breast cancers; four were extensively methylated in all the tumor types.
What Do These Findings Mean?
These results identify several new genes that are often methylated in four types of epithelial tumor. The observation that these genes are methylated in multiple independent tumors strongly suggests, but does not prove, that loss of expression of the proteins that they encode helps to convert normal cells into cancer cells. The frequency and diverse patterning of promoter methylation in different tumor types also indicate that methylation is not a random event, although what controls the patterns of methylation is not yet known. The identification of these genes is a step toward building a promoter hypermethylation profile for the early detection of human cancer. Furthermore, although tumors in different tissues vary greatly with respect to gene expression patterns, the similarities seen in this study in promoter methylation profiles might help to identify new therapeutic targets common to several cancer types.
Additional Information.
Please access these Web sites via the online version of this summary at
US National Cancer Institute, information for patients on understanding cancer
CancerQuest, information provided by Emory University about how cancer develops
Cancer Research UK, information for patients on cancer biology
Wikipedia pages on epigenetics (note that Wikipedia is a free online encyclopedia that anyone can edit)
The Epigenome Network of Excellence, background information and latest news about epigenetics
PMCID: PMC1716188  PMID: 17194187
24.  Polymorphisms, Mutations, and Amplification of the EGFR Gene in Non-Small Cell Lung Cancers 
PLoS Medicine  2007;4(4):e125.
The epidermal growth factor receptor (EGFR) gene is the prototype member of the type I receptor tyrosine kinase (TK) family and plays a pivotal role in cell proliferation and differentiation. There are three well described polymorphisms that are associated with increased protein production in experimental systems: a polymorphic dinucleotide repeat (CA simple sequence repeat 1 [CA-SSR1]) in intron one (lower number of repeats) and two single nucleotide polymorphisms (SNPs) in the promoter region, −216 (G/T or T/T) and −191 (C/A or A/A). The objective of this study was to examine distributions of these three polymorphisms and their relationships to each other and to EGFR gene mutations and allelic imbalance (AI) in non-small cell lung cancers.
Methods and Findings
We examined the frequencies of the three polymorphisms of EGFR in 556 resected lung cancers and corresponding non-malignant lung tissues from 336 East Asians, 213 individuals of Northern European descent, and seven of other ethnicities. We also studied the EGFR gene in 93 corresponding non-malignant lung tissue samples from European-descent patients from Italy and in peripheral blood mononuclear cells from 250 normal healthy US individuals enrolled in epidemiological studies including individuals of European descent, African–Americans, and Mexican–Americans. We sequenced the four exons (18–21) of the TK domain known to harbor activating mutations in tumors and examined the status of the CA-SSR1 alleles (presence of heterozygosity, repeat number of the alleles, and relative amplification of one allele) and allele-specific amplification of mutant tumors as determined by a standardized semiautomated method of microsatellite analysis. Variant forms of SNP −216 (G/T or T/T) and SNP −191 (C/A or A/A) (associated with higher protein production in experimental systems) were less frequent in East Asians than in individuals of other ethnicities (p < 0.001). Both alleles of CA-SSR1 were significantly longer in East Asians than in individuals of other ethnicities (p < 0.001). Expression studies using bronchial epithelial cultures demonstrated a trend towards increased mRNA expression in cultures having the variant SNP −216 G/T or T/T genotypes. Monoallelic amplification of the CA-SSR1 locus was present in 30.6% of the informative cases and occurred more often in individuals of East Asian ethnicity. AI was present in 44.4% (95% confidence interval: 34.1%–54.7%) of mutant tumors compared with 25.9% (20.6%–31.2%) of wild-type tumors (p = 0.002). The shorter allele in tumors with AI in East Asian individuals was selectively amplified (shorter allele dominant) more often in mutant tumors (75.0%, 61.6%–88.4%) than in wild-type tumors (43.5%, 31.8%–55.2%, p = 0.003). 
In addition, there was a strong positive association between AI ratios of CA-SSR1 alleles and AI of mutant alleles.
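The percentages above are reported with 95% confidence intervals. The abstract does not say how the intervals were computed, so the following is only a sketch of one common choice, the normal-approximation (Wald) interval for a proportion. The counts passed in are hypothetical and are not taken from the study.

```python
import math


def wald_interval(successes, total, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    z=1.96 corresponds to a two-sided 95% interval.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because the approximation can overshoot near the ends.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts, chosen only for illustration.
p, lower, upper = wald_interval(40, 100)
print(f"{p:.1%} ({lower:.1%} to {upper:.1%})")
```

The Wald interval is easy to compute but can be inaccurate for small samples or proportions near 0 or 1, where a Wilson or exact interval is usually preferred.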
The three polymorphisms associated with increased EGFR protein production (shorter CA-SSR1 length and variant forms of SNPs −216 and −191) were found to be rare in East Asians as compared to other ethnicities, suggesting that the cells of East Asians may make relatively less intrinsic EGFR protein. Interestingly, especially in tumors from patients of East Asian ethnicity, EGFR mutations were found to favor the shorter allele of CA-SSR1, and selective amplification of the shorter allele of CA-SSR1 occurred frequently in tumors harboring a mutation. These distinct molecular events targeting the same allele would both be predicted to result in greater EGFR protein production and/or activity. Our findings may help to explain some of the ethnic differences observed in mutational frequencies and responses to TK inhibitors.
Masaharu Nomura and colleagues examine the distribution of EGFR polymorphisms in different populations and find differences that might explain different responses to tyrosine kinase inhibitors in lung cancer patients.
Editors' Summary
Most cases of lung cancer, the leading cause of cancer deaths worldwide, are “non-small cell lung cancer” (NSCLC), which has a very low cure rate. Recently, however, “targeted” therapies have brought new hope to patients with NSCLC. Like all cancers, NSCLC occurs when cells begin to divide uncontrollably because of changes (mutations) in their genetic material. Chemotherapy drugs treat cancer by killing these rapidly dividing cells, but, because some normal tissues are sensitive to these agents, it is hard to kill the cancer completely without causing serious side effects. Targeted therapies specifically attack the changes in cancer cells that allow them to divide uncontrollably, so it might be possible to kill the cancer cells selectively without damaging normal tissues. Epidermal growth factor receptor (EGFR) was one of the first molecules for which a targeted therapy was developed. In normal cells, messenger proteins bind to EGFR and activate its “tyrosine kinase,” an enzyme that sticks phosphate groups on tyrosine (an amino acid) in other proteins. These proteins then tell the cell to divide. Alterations to this signaling system drive the uncontrolled growth of some cancers, including NSCLC.
Why Was This Study Done?
Molecules that inhibit the tyrosine kinase activity of EGFR (for example, gefitinib) dramatically shrink some NSCLCs, particularly those in East Asian patients. Tumors shrunk by tyrosine kinase inhibitors (TKIs) often (but not always) have mutations in EGFR's tyrosine kinase. However, not all tumors with these mutations respond to TKIs, and other genetic changes—for example, amplification (multiple copies) of the EGFR gene—also affect tumor responses to TKIs. It would be useful to know which genetic changes predict these responses when planning treatments for NSCLC and to understand why the frequency of these changes varies between ethnic groups. In this study, the researchers have examined three polymorphisms—differences in DNA sequences that occur between individuals—in the EGFR gene in people with and without NSCLC. In addition, they have looked for associations between these polymorphisms, which are present in every cell of the body, and the EGFR gene mutations and allelic imbalances (genes occur in pairs but amplification or loss of one copy, or allele, often causes allelic imbalance in tumors) that occur in NSCLCs.
What Did the Researchers Do and Find?
The researchers measured how often three EGFR polymorphisms (the length of a repeat sequence called CA-SSR1, and two single nucleotide variations [SNPs])—all of which probably affect how much protein is made from the EGFR gene—occurred in normal tissue and NSCLC tissue from East Asians and individuals of European descent. They also looked for mutations in the EGFR tyrosine kinase and allelic imbalance in the tumors, and then determined which genetic variations and alterations tended to occur together in people with the same ethnicity. Among many associations, the researchers found that shorter alleles of CA-SSR1 and the minor forms of the two SNPs occurred less often in East Asians than in individuals of European descent. They also confirmed that EGFR kinase mutations were more common in NSCLCs in East Asians than in European-descent individuals. Furthermore, mutations occurred more often in tumors with allelic imbalance, and in tumors where there was allelic imbalance and an EGFR mutation, the mutant allele was amplified more often than the wild-type allele.
What Do These Findings Mean?
The researchers use these associations between gene variants and tumor-associated alterations to propose a model to explain the ethnic differences in mutational frequencies and responses to TKIs seen in NSCLC. They suggest that because of the polymorphisms in the EGFR gene commonly seen in East Asians, people from this ethnic group make less EGFR protein than people from other ethnic groups. This would explain why, if a threshold level of EGFR is needed to drive cells towards malignancy, East Asians have a high frequency of amplified EGFR tyrosine kinase mutations in their tumors—mutation followed by amplification would be needed to activate EGFR signaling. This model, though speculative, helps to explain some clinical findings, such as the frequency of EGFR mutations and of TKI sensitivity in NSCLCs in East Asians. Further studies of this type in different ethnic groups and in different tumors, as well as with other genes for which targeted therapies are available, should help oncologists provide personalized cancer therapies for their patients.
Additional Information.
Please access these Web sites via the online version of this summary at
US National Cancer Institute information on lung cancer and on cancer treatment for patients and professionals
MedlinePlus encyclopedia entries on NSCLC
Cancer Research UK information for patients about all aspects of lung cancer, including treatment with TKIs
Wikipedia pages on lung cancer, EGFR, and gefitinib (note that Wikipedia is a free online encyclopedia that anyone can edit)
PMCID: PMC1876407  PMID: 17455987
25.  Estimation of Parent Specific DNA Copy Number in Tumors using High-Density Genotyping Arrays 
PLoS Computational Biology  2011;7(1):e1001060.
Chromosomal gains and losses comprise an important type of genetic change in tumors, and can now be assayed using microarray hybridization-based experiments. Most current statistical models for DNA copy number estimate only total copy number and do not distinguish between the underlying quantities of the two inherited chromosomes. This latter information, sometimes called parent specific copy number, is important for identifying allele-specific amplifications and deletions, for quantifying normal cell contamination, and for giving a more complete molecular portrait of the tumor. We propose a stochastic segmentation model for parent-specific DNA copy number in tumor samples, and give an estimation procedure that is computationally efficient and can be applied to data from the current high density genotyping platforms. The proposed method does not require matched normal samples, and can estimate the unknown genotypes simultaneously with the parent specific copy number. The new method is used to analyze 223 glioblastoma samples from the Cancer Genome Atlas (TCGA) project, giving a more comprehensive summary of the copy number events in these samples. Detailed case studies on these samples reveal the additional insights that can be gained from an allele-specific copy number analysis, such as the quantification of fractional gains and losses, the identification of copy neutral loss of heterozygosity, and the characterization of regions of simultaneous changes of both inherited chromosomes.
Author Summary
Many genetic diseases are related to copy number aberrations of some regions of the genome. As we know, each chromosome normally has two copies. However, under some circumstances, for some regions, either one or both of the chromosomes change. Genotyping microarray data provide the copy number of the two alleles of polymorphic sites along the chromosomes, which makes the inference of the copy number aberrations of the chromosome feasible. One difficulty is that genotyping microarray data cannot provide the haplotype of the two copies of a chromosome. In this paper, we model the copy number along the chromosome as a two-dimensional Markov Chain. Using the observed copy number of both alleles of all the sites, we can determine the parent specific copy number along the chromosome as well as infer the haplotypes of the two copies of the inherited chromosomes in regions where there is allelic imbalance. Simulation results show high sensitivity and specificity of the method. Applying this method to glioblastoma samples from the Cancer Genome Atlas data illustrates the insights gained from allele-specific copy number analysis.
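To see why parent specific copy number carries more information than total copy number, consider a small sketch that labels a genomic segment from the copy numbers of its two inherited chromosomes. This is not the authors' segmentation model; the `classify_segment` function, its labels, and the integer inputs are assumptions made for illustration (the paper also considers fractional values, which this sketch ignores).

```python
def classify_segment(cn_a, cn_b):
    """Label a segment from the integer copy numbers of its two
    inherited chromosomes (hypothetical helper, for illustration)."""
    total = cn_a + cn_b
    if total == 0:
        return "homozygous deletion"
    # Loss of heterozygosity: one parental copy is entirely absent.
    if min(cn_a, cn_b) == 0:
        if total == 2:
            return "copy-neutral LOH"
        return "LOH with loss" if total < 2 else "LOH with gain"
    if total > 2:
        return "gain"
    return "normal"


# (1, 1) and (2, 0) share the same total copy number of 2,
# but only the parent specific view tells them apart.
for pair in [(1, 1), (2, 0), (1, 0), (3, 0), (2, 1), (0, 0)]:
    print(pair, classify_segment(*pair))
```

A method that reports only total copy number would assign the segments `(1, 1)` and `(2, 0)` the same value, so the copy-neutral loss of heterozygosity in the second would go undetected.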
PMCID: PMC3029233  PMID: 21298078
