Genomic instability plays an important role in human cancers. We previously characterized genomic instability in esophageal squamous cell carcinomas (ESCC) in terms of loss of heterozygosity (LOH) and copy number (CN) changes in tumors using the Affymetrix GeneChip Human Mapping 500K array in 30 cases from a high-risk region of China. In the current study we focused on copy number neutral (CN = 2) LOH (CNNLOH) and its relation to gene expression in ESCC.
Overall we found that 70% of all LOH observed was CNNLOH. Ninety percent of ESCCs showed CNNLOH (median frequency in cases = 60%) and this was the most common type of LOH in two-thirds of cases. CNNLOH occurred on all 39 autosomal chromosome arms, with highest frequencies on 19p (100%), 5p (96%), 2p (95%), and 20q (95%). In contrast, LOH with CN loss represented 19% of all LOH, occurred in just half of ESCCs (median frequency in cases = 0%), and was most frequent on 3p (56%), 5q (47%), and 21q (41%). LOH with CN gain was 11% of all LOH, occurred in 93% of ESCCs (median frequency in cases = 13%), and was most common on 20p (82%), 8q (74%), and 3q (42%). To examine the effect of genomic instability on gene expression, we evaluated RNA profiles from 17 pairs of matched normal and tumor samples (a subset of the 30 ESCCs) using Affymetrix U133A 2.0 arrays. In CN neutral regions, expression of 168 genes (containing 1976 SNPs) differed significantly in tumors with LOH versus tumors without LOH, including 101 genes that were up-regulated and 67 that were down-regulated.
Our results indicate that CNNLOH has a profound impact on gene expression in ESCC, which in turn may affect tumor development.
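The three LOH categories described above (copy number neutral, with CN loss, with CN gain) can be sketched as a simple classification of segments. This is only an illustrative sketch: the function names and the example segments are hypothetical and are not taken from the study.

```python
from collections import Counter


def classify_loh(copy_number, has_loh):
    """Label one segment by LOH status and total copy number."""
    if not has_loh:
        return "no LOH"
    if copy_number == 2:
        return "CNNLOH"
    return "LOH with CN loss" if copy_number < 2 else "LOH with CN gain"


def loh_breakdown(segments):
    """Return the share of each LOH type among segments that show LOH."""
    labels = [classify_loh(cn, loh) for cn, loh in segments]
    counts = Counter(label for label in labels if label != "no LOH")
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Hypothetical (copy_number, has_loh) calls, not data from the study.
example_segments = [(2, True), (2, True), (1, True), (3, True), (2, False)]
breakdown = loh_breakdown(example_segments)
```

Counting segments is a simplification; weighting by SNP count or segment length would be an equally reasonable choice.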
Genetic aberrations are crucial in renal tumor progression. In this study, we describe loss of heterozygosity (LOH) and DNA-copy number abnormalities in clear cell renal cell carcinoma (cc-RCC) discovered by genome-wide single nucleotide polymorphism (SNP) arrays. Genomic DNA from tumor and normal tissue of 22 human cc-RCCs was analyzed on the Affymetrix GeneChip Human Mapping 10K Array. The array data were validated by quantitative polymerase chain reaction and immunohistochemistry. Reduced DNA copy numbers were detected on chromosomal arm 3p in 91%, on chromosome 9 in 32%, and on chromosomal arm 14q in 36% of the tumors. Gains were detected on chromosomal arm 5q in 45% and on chromosome 7 in 32% of the tumors. Copy number abnormalities were found not only in FHIT and VHL loci, known to be involved in renal carcinogenesis, but also in regions containing putative new tumor suppressor genes or oncogenes. In addition, microdeletions were detected on chromosomes 1 and 6 in genes with unknown impact on renal carcinogenesis. In validation experiments, abnormal protein expression of FOXP1 (on 3p) was found in 90% of tumors (concordance with SNP array data in 85%). As assessed by quantitative polymerase chain reaction, PARK2 and PACRG were down-regulated in 57% and 100%, respectively, and CSF1R was up-regulated in 69% of the cc-RCC cases (concordance with SNP array data in 57%, 33%, and 38%). Genome-wide SNP array analysis not only confirmed previously described large chromosomal aberrations but also detected novel microdeletions in genes potentially involved in tumorigenesis of cc-RCC.
Genotyping platforms such as single nucleotide polymorphism (SNP) arrays are powerful tools to study genomic aberrations in cancer samples. Allele-specific information from SNP arrays provides valuable information for interpreting copy number variation (CNV) and allelic imbalance, including loss-of-heterozygosity (LOH), beyond that obtained from the total DNA signal available from array comparative genomic hybridization (aCGH) platforms. Several algorithms based on hidden Markov models (HMMs) have been designed to detect copy number changes and copy-neutral LOH making use of the allele information on SNP arrays. However, heterogeneity in clinical samples, due to stromal contamination and somatic alterations, complicates analysis and interpretation of these data.
We have developed MixHMM, a novel hidden Markov model using hidden states based on chromosomal structural aberrations. MixHMM allows CNV detection for copy numbers up to 7 and allows more complete and accurate description of other forms of allelic imbalance, such as increased copy number LOH or imbalanced amplifications. MixHMM also incorporates a novel sample mixing model that allows detection of tumor CNV events in heterogeneous tumor samples, where cancer cells are mixed with a proportion of stromal cells.
We validate MixHMM and demonstrate its advantages with simulated samples, clinical tumor samples and a dilution series of mixed samples. We have shown that the CNVs of cancer cells in a tumor sample contaminated with up to 80% of stromal cells can be detected accurately using Illumina BeadChip and MixHMM.
The MixHMM is available as a Python package provided with some other useful tools at http://genecube.med.yale.edu:8080/MixHMM.
In an effort to identify novel genes implicated in breast carcinogenesis, a genomewide scan for loss of heterozygosity (LOH) and copy number changes was conducted in paired DNA samples extracted from normal and tumor tissue of frozen sections from women undergoing surgery for invasive breast cancer. The Affymetrix 10K SNP array was used to examine genomewide LOH of chromosomal regions. The number of LOH events, number of informative loci, percent heterozygosity, and percent fractional allelic loss (%FAL) were calculated. LOH events were detected in all samples; however, the proportion of LOH ranged from 0.1% to 57.2%. Elevated LOH events were detected in two samples, with a %FAL of 57.2 and 56.2. Chromosomal regions exceeding a threshold value for a p-value curve based on multiple-testing adjusted permutation methods were identified as significant regions of shared LOH across samples. Regions with significant LOH included 2p25.3, 2p21, 2p16.1–2p15, 2q23.3, and 16q12.1. Chromosomal region 1q32.1 was identified as a region with significant copy number amplification. Regions of LOH and copy number changes identified from this analysis may provide insights into the underlying processes of and genes involved in breast carcinogenesis. This study demonstrates a feasible methodological approach for the assessment of LOH and copy number changes.
Genomic copy number alteration and allelic imbalance are distinct features of cancer cells, and recent advances in genotyping technology have greatly boosted research on the cancer genome. However, the complex nature of tumor samples often hampers the interpretation of SNP array data. In this study, we describe a bioinformatic tool, named GIANT, for genome-wide identification of somatic aberrations from paired normal-tumor samples measured with SNP arrays. By efficiently incorporating genotype information from the matched normal sample, it accurately detects different types of aberrations in the cancer genome, even for aneuploid tumor samples with severe normal cell contamination. Furthermore, it allows for discovery of recurrent aberrations with critical biological properties in tumorigenesis by using a statistical significance test. We demonstrate the superior performance of the proposed method on various datasets including tumor replicate pairs, simulated SNP arrays and dilution series of normal-cancer cell lines. Results show that GIANT has the potential to detect genomic aberrations even when the cancer cell proportion is as low as 5-10%. Application to a large number of paired tumor samples delivers a genome-wide profile of the statistical significance of the various aberrations, including amplification, deletion and LOH. We believe that GIANT represents a powerful bioinformatic tool for interpreting complex genomic aberrations, and thus assisting both academic study and the clinical treatment of cancer.
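The %FAL statistic mentioned above can be illustrated with a minimal sketch. It assumes the common definition (loci with LOH divided by informative loci, where a locus is informative if the normal sample is heterozygous); the genotype encoding and the function name are hypothetical, and the study's exact calling rules may differ.

```python
def fractional_allelic_loss(normal_genotypes, tumor_genotypes):
    """Return (%FAL, informative loci, LOH loci) for paired genotype calls.

    Genotypes are assumed to be encoded as "AA", "AB" or "BB".
    """
    informative = 0
    loh = 0
    for normal, tumor in zip(normal_genotypes, tumor_genotypes):
        if normal != "AB":
            continue  # homozygous normal loci carry no LOH information
        informative += 1
        if tumor in ("AA", "BB"):
            loh += 1  # heterozygous in normal, homozygous in tumor
    if informative == 0:
        raise ValueError("no informative (heterozygous) loci")
    return 100.0 * loh / informative, informative, loh


# Hypothetical paired calls for six SNPs.
normal = ["AB", "AA", "AB", "AB", "BB", "AB"]
tumor = ["AA", "AA", "AB", "BB", "BB", "AB"]
fal_percent, n_informative, n_loh = fractional_allelic_loss(normal, tumor)
```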
Illumina Infinium whole genome genotyping (WGG) arrays are increasingly being applied in cancer genomics to study gene copy number alterations and allele-specific aberrations such as loss-of-heterozygosity (LOH). Methods developed for normalization of WGG arrays have mostly focused on diploid, normal samples. However, for cancer samples genomic aberrations may confound normalization and data interpretation. Therefore, we examined the effects of the conventionally used normalization method for Illumina Infinium arrays when applied to cancer samples.
We demonstrate an asymmetry in the detection of the two alleles for each SNP, which deleteriously influences both allelic proportions and copy number estimates. The asymmetry is caused by a remaining bias between the two dyes used in the Infinium II assay after using the normalization method in Illumina's proprietary software (BeadStudio). We propose a quantile normalization strategy for correction of this dye bias. We tested the normalization strategy using 535 individual hybridizations from 10 data sets from the analysis of cancer genomes and normal blood samples generated on Illumina Infinium II 300 k version 1 and 2, 370 k and 550 k BeadChips. We show that the proposed normalization strategy successfully removes asymmetry in estimates of both allelic proportions and copy numbers. Additionally, the normalization strategy reduces the technical variation for copy number estimates while retaining the response to copy number alterations.
The proposed normalization strategy represents a valuable tool that improves the quality of data obtained from Illumina Infinium arrays, in particular when used for LOH and copy number variation studies.
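The idea of quantile normalization between two signal channels can be sketched as follows. This is a generic two-channel quantile normalization written for illustration, under the assumption that the two channels are simply forced to share one empirical distribution; it is not the authors' exact procedure, and the function name and input values are hypothetical.

```python
def quantile_normalize(channel_x, channel_y):
    """Give two equally long intensity vectors the same empirical distribution."""
    if len(channel_x) != len(channel_y):
        raise ValueError("channels must have the same length")
    n = len(channel_x)
    # Indices of each channel ordered from lowest to highest intensity.
    order_x = sorted(range(n), key=channel_x.__getitem__)
    order_y = sorted(range(n), key=channel_y.__getitem__)
    # Reference distribution: mean of the two channels at each rank.
    reference = [(channel_x[i] + channel_y[j]) / 2
                 for i, j in zip(order_x, order_y)]
    norm_x = [0.0] * n
    norm_y = [0.0] * n
    for rank, (i, j) in enumerate(zip(order_x, order_y)):
        norm_x[i] = reference[rank]
        norm_y[j] = reference[rank]
    return norm_x, norm_y


# Hypothetical raw intensities for three SNPs in the two dye channels.
norm_x, norm_y = quantile_normalize([1.0, 3.0, 2.0], [4.0, 2.0, 6.0])
```

After normalization both channels contain the same set of values, so a systematic intensity offset between the dyes no longer shifts allelic proportions.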
There is an increasing interest in using single nucleotide polymorphism (SNP) genotyping arrays for profiling chromosomal rearrangements in tumors, as they allow simultaneous detection of copy number and loss of heterozygosity with high resolution. Critical issues such as signal baseline shift due to aneuploidy, normal cell contamination, and the presence of GC content bias have been reported to dramatically alter SNP array signals and complicate accurate identification of aberrations in cancer genomes. To address these issues, we propose a novel Global Parameter Hidden Markov Model (GPHMM) to unravel tangled genotyping data generated from tumor samples. In contrast to other HMM methods, a distinct feature of GPHMM is that the issues mentioned above are quantitatively modeled by global parameters and integrated within the statistical framework. We developed an efficient EM algorithm for parameter estimation. We evaluated performance on three data sets and show that GPHMM can correctly identify chromosomal aberrations in tumor samples containing as few as 10% cancer cells. Furthermore, we demonstrated that the estimation of global parameters in GPHMM provides information about the biological characteristics of tumor samples and the quality of genotyping signal from SNP array experiments, which is helpful for data quality control and outlier detection in cohort studies.
Genome-wide array approaches and sequencing analyses are powerful tools for identifying genetic aberrations in cancers, including leukemias and lymphomas. However, the clinical and biological significance of such aberrations and their subclonal distribution are poorly understood. Here, we present the first genome-wide array based study of pre-treatment and relapse samples from patients with B-cell chronic lymphocytic leukemia (B-CLL) that uses the computational statistical tool OncoSNP. We show that quantification of the proportion of copy number alterations (CNAs) and copy neutral loss of heterozygosity regions (cnLOHs) in each sample is feasible. Furthermore, we (i) reveal complex changes in the subclonal architecture of paired samples at relapse compared with pre-treatment, (ii) provide evidence supporting an association between increased genomic complexity and poor clinical outcome, and (iii) report previously undefined, recurrent CNA/cnLOH regions that expand or newly occur at relapse and therefore might harbor candidate driver genes of relapse and/or chemotherapy resistance. Our findings are likely to have an impact on future therapeutic strategies aimed at selecting effective and individually tailored targeted therapies.
B-CLL; clonal architecture; genome-wide arrays; OncoSNP; genome imbalance; copy neutral loss of heterozygosity
Single nucleotide polymorphisms (SNPs) are the most common genetic variations in the human genome and are useful as genomic markers. Oligonucleotide SNP microarrays have been developed for high-throughput genotyping of up to 900,000 human SNPs and have been used widely in linkage and cancer genomics studies. We have previously used Hidden Markov Models (HMM) to analyze SNP array data for inferring copy numbers and loss-of-heterozygosity (LOH) from paired normal and tumor samples and unpaired tumor samples.
We proposed and implemented major copy proportion (MCP) analysis of oligonucleotide SNP array data. An HMM was constructed to infer unobserved MCP states from observed allele-specific signals through emission and transition distributions. We used 10 K, 100 K and 250 K SNP array datasets to compare MCP analysis with LOH and copy number analysis, and showed that MCP performs better than LOH analysis for allelic-imbalanced chromosome regions and normal-contaminated samples. The major and minor copy alleles can also be inferred from allelic-imbalanced regions by MCP analysis.
MCP extends tumor LOH analysis to allelic imbalance analysis and supplies complementary information to total copy numbers. MCP analysis of mixed normal and tumor samples suggests its utility for normal-contaminated tumor samples. The described analysis and visualization methods are readily available in the user-friendly dChip software.
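As a rough illustration of the quantity involved, the sketch below computes a major copy proportion from allele-specific copy numbers, taking MCP as the copies of the major allele divided by the total copies. The linear mixing with normal cells (one copy of each allele) is an assumption made here for illustration, and the function name is hypothetical; the HMM inference described above is not reproduced.

```python
def major_copy_proportion(major, minor, tumor_fraction=1.0):
    """Expected MCP for a tumor state mixed with diploid normal cells.

    major, minor: allele-specific copy numbers in the tumor cells.
    tumor_fraction: assumed share of tumor cells in the sample (0..1).
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    if minor > major:
        major, minor = minor, major
    normal_fraction = 1.0 - tumor_fraction
    # Normal cells contribute one copy of each allele.
    mixed_major = tumor_fraction * major + normal_fraction * 1
    mixed_total = tumor_fraction * (major + minor) + normal_fraction * 2
    if mixed_total == 0:
        raise ValueError("no DNA at this locus (homozygous deletion)")
    return mixed_major / mixed_total


balanced = major_copy_proportion(1, 1)               # no allelic imbalance
pure_loh = major_copy_proportion(2, 0)               # copy-neutral LOH
diluted_loh = major_copy_proportion(2, 0, 0.5)       # same LOH, 50% normal cells
```

Under these assumptions a balanced region gives 0.5 and complete LOH gives 1.0, while normal contamination pulls LOH regions toward intermediate values, which is why a continuous proportion is more informative than a binary LOH call.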
There is increasing evidence showing that the stromal cells surrounding cancer epithelial cells, rather than being passive bystanders, might have a role in modifying tumor outgrowth. The molecular basis of this aspect of carcinoma etiology is controversial. Some studies have reported a high frequency of genetic aberrations in carcinoma-associated fibroblasts (CAFs), whereas other studies have reported very low or zero mutation rates. Resolution of this contentious area is of critical importance in terms of understanding both the basic biology of cancer as well as the potential clinical implications of CAF somatic alterations. We undertook genome-wide copy number and loss of heterozygosity (LOH) analysis of CAFs derived from breast and ovarian carcinomas using a 500K SNP array platform. Our data show conclusively that LOH and copy number alterations are extremely rare in CAFs and cannot be the basis of the carcinoma-promoting phenotypes of breast and ovarian CAFs.
Genetic aberrations, such as DNA copy number variation (CNV) and loss of heterozygosity (LOH), have been implicated in head and neck squamous cell carcinoma (HNSCC) initiation and progression. This review examines CNV and LOH as predictors of HNSCC recurrence and mortality.
We searched PubMed for relevant publications and compared and discussed results from the articles.
Certain CNV and LOH events have consistently been associated with HNSCC recurrence and survival. The recent high-resolution SNP arrays have the potential to identify many more genetic changes and concurrent genome-wide CNV, copy-neutral and/or allelic imbalance LOH in HNSCC that may bear on prognosis.
Our review confirms that outcome in HNSCC can be predicted to a considerable extent by the presence of tumor cell genetic aberrations. It points out the limitations of some methodologies that were used in the past and discusses the advantages and challenges of using genome-wide SNP arrays.
Head and Neck Neoplasm; Gene Dosage; Loss of Heterozygosity (LOH); Prognosis; Neoplasm Recurrence, Local
Tumour cellularity, the relative proportion of tumour and normal cells in a sample, affects the sensitivity of mutation detection, copy number analysis, cancer gene expression and methylation profiling. Tumour cellularity is traditionally estimated by pathological review of sectioned specimens; however, this method is both subjective and prone to error due to heterogeneity within lesions and cellularity differences between the sample viewed during pathological review and the tissue used for research purposes. In this paper we describe a statistical model to estimate tumour cellularity from SNP array profiles of paired tumour and normal samples using shifts in SNP allele frequency at regions of loss of heterozygosity (LOH) in the tumour. We also provide qpure, a software implementation of the method. Our experiments showed a moderate correlation of 0.42 (P-value = 0.0001) between tumour cellularity estimated by qpure and pathology review. Interestingly, there is a high correlation of 0.87 (P-value 2.2e-16) between cellularity estimates by qpure and deep Ion Torrent sequencing of known somatic KRAS mutations, and a weaker correlation of 0.32 (P-value = 0.004) between Ion Torrent sequencing and pathology review. This suggests that qpure may be a more accurate predictor of tumour cellularity than pathology review. qpure can be downloaded from https://sourceforge.net/projects/qpure/.
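The link between allele frequency shifts at LOH regions and cellularity can be illustrated with a simple closed-form sketch. It assumes a two-population mixture (tumour cells plus diploid normal cells) and that the type of LOH is known; this is not the statistical model implemented in qpure, and the function name and input values are hypothetical.

```python
from statistics import median


def estimate_cellularity(bafs, loh_type="copy_neutral"):
    """Estimate tumour cell fraction from B-allele frequencies in an LOH region.

    bafs: B-allele frequencies of SNPs that are heterozygous in the normal.
    loh_type: "copy_neutral" (two copies of one allele in tumour cells) or
              "hemizygous_deletion" (one copy of one allele in tumour cells).
    """
    # Fold frequencies around 0.5 so both alleles tell the same story.
    mirrored = median(max(b, 1.0 - b) for b in bafs)
    if loh_type == "copy_neutral":
        # Retained-allele fraction = (1 + p) / 2
        purity = 2.0 * mirrored - 1.0
    elif loh_type == "hemizygous_deletion":
        # Retained-allele fraction = 1 / (2 - p)
        purity = 2.0 - 1.0 / mirrored
    else:
        raise ValueError("unknown loh_type")
    return min(1.0, max(0.0, purity))


# Hypothetical frequencies from a copy-neutral LOH region.
cellularity = estimate_cellularity([0.25, 0.75, 0.74, 0.26, 0.76])
```

With no tumour cells the frequencies stay at 0.5, and with only tumour cells they reach 0 or 1, so the size of the shift tracks cellularity under these assumptions.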
Genomic aberrations can be used to determine cancer diagnosis and prognosis. Clinically relevant novel aberrations can be discovered using high-throughput assays such as Single Nucleotide Polymorphism (SNP) arrays and next-generation sequencing, which typically provide aggregate signals of many cells at once. However, heterogeneity of tumor subclones dramatically complicates the task of detecting aberrations.
The aggregate signal of a population of subclones can be described as a linear system of equations. We employed a measure of allelic imbalance and total amount of DNA to characterize each locus by the copy number status (gain, loss or neither) of the strongest subclonal component. We designed simulated data to compare our measure to existing approaches and we analyzed SNP-arrays from 30 melanoma samples and transcriptome sequencing (RNA-Seq) from one melanoma sample.
We showed that any system describing aggregate subclonal signals is underdetermined, leading to non-unique solutions for the exact copy number profile of subclones. For this reason, our illustrative measure was more robust than existing Hidden Markov Model (HMM) based tools in inferring the aberration status, as indicated by tests on simulated data. This higher robustness contributed to identifying numerous aberrations in several loci of melanoma samples. We validated the heterogeneity and aberration status within single biopsies by fluorescent in situ hybridization of four affected and transcriptionally up-regulated genes, E2F8, ETV4, EZH2 and FAM84B, in 11 melanoma cell lines. Heterogeneity was further demonstrated in the analysis of allelic imbalance changes along single exons from melanoma RNA-Seq.
These studies demonstrate how subclonal heterogeneity, prevalent in tumor samples, is reflected in aggregate signals measured by high-throughput techniques. Our proposed approach yields high robustness in detecting copy number alterations using high-throughput technologies and has the potential to identify specific subclonal markers from next-generation sequencing data.
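The non-uniqueness of subclonal solutions can be shown with a small numerical sketch. The aggregate signal at one locus is modelled here as a fraction-weighted sum over subclones, which is an assumption made for illustration; the function name and the two mixtures are hypothetical.

```python
def aggregate_signal(subclones):
    """Return (total copy number, B-allele fraction) of a mixed sample.

    subclones: list of (cell fraction, A-allele copies, B-allele copies);
    the cell fractions are expected to sum to 1.
    """
    total = sum(f * (a + b) for f, a, b in subclones)
    b_total = sum(f * b for f, a, b in subclones)
    return total, b_total / total


# Two different subclonal compositions at the same locus.
mix_1 = [(0.5, 1, 1), (0.5, 2, 1)]    # half the cells carry one extra A copy
mix_2 = [(0.75, 1, 1), (0.25, 3, 1)]  # a quarter of the cells carry two extra
signal_1 = aggregate_signal(mix_1)
signal_2 = aggregate_signal(mix_2)
```

Both mixtures give the same total copy number and the same B-allele fraction, so the aggregate measurements alone cannot tell them apart; only the direction of the change (a gain) is recoverable.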
copy number; SNP arrays; Next generation sequencing; melanoma
We describe a bioinformatic tool, Tumor Aberration Prediction Suite (TAPS), for the identification of allele-specific copy numbers in tumor samples using data from Affymetrix SNP arrays. It includes detailed visualization of genomic segment characteristics and iterative pattern recognition for copy number identification, and does not require patient-matched normal samples. TAPS can be used to identify chromosomal aberrations with high sensitivity even when the proportion of tumor cells is as low as 30%. Analysis of cancer samples indicates that TAPS is well suited to investigate samples with aneuploidy and tumor heterogeneity, which is commonly found in many types of solid tumors.
Cutaneous squamous cell carcinomas (SCC) are the second most commonly diagnosed cancers in fair-skinned people; yet the genetic mechanisms involved in SCC tumorigenesis remain poorly understood. We have used single nucleotide polymorphism (SNP) microarray analysis to examine genome-wide allelic imbalance in 16 primary and 2 lymph node metastatic SCC using paired non-tumour samples to counteract normal copy number variation. The most common genetic change was loss of heterozygosity (LOH) on 9p, observed in 13 of 16 primary SCC. Other recurrent events included LOH on 3p (9 tumors), 2q, 8p, and 13 (each in 8 SCC) and allelic gain on 3q and 8q (each in 6 tumors). Copy number-neutral LOH was observed in a proportion of samples, implying that somatic recombination had led to acquired uniparental disomy, an event not previously demonstrated in SCC. As well as recurrent patterns of gross chromosomal changes, SNP microarray analysis revealed, in 2 primary SCC, a homozygous microdeletion on 9p23 within the protein tyrosine phosphatase receptor type D (PTPRD) locus, an emerging frequent target of homozygous deletion in lung cancer and neuroblastoma. A third sample was heterozygously deleted within this locus and PTPRD expression was aberrant. Two of the 3 primary SCC with PTPRD deletion had demonstrated metastatic potential. Our data identify PTPRD as a candidate tumor suppressor gene in cutaneous SCC with a possible association with metastasis.
Prostate cancer cell lines provide ideal in vitro systems for the identification and analysis of prostate tumor suppressors and oncogenes. A detailed characterization of the architecture of prostate cancer cell line genomes would facilitate the study of precise roles of various genes in prostate tumorigenesis in general. To contribute to such a characterization, we used the GeneChip 500K single nucleotide polymorphic (SNP) array for analysis of genotypes and relative DNA copy number changes across the genome of 11 cell lines derived from both normal and cancerous prostate tissues. For comparison purposes, we also examined the alterations observed in the cell lines in tumor/normal pairs of clinical samples from 72 patients. Along with genome-wide maps of DNA copy number changes and loss of heterozygosity for these cell lines, we report previously unreported homozygous deletions and recurrent amplifications in prostate cancers in this study. The homozygous deletions affected a number of biologically important genes, including PPP2R2A and BNIP3L identified in this study and CDKN2A/CDKN2B reported previously. Although most amplified genomic regions tended to be large, amplifications at 8q24.21 were of particular interest because the affected regions are relatively small, are found in multiple cell lines, are located near MYC, an oncogene strongly implicated in prostate tumorigenesis, and are known to harbor SNPs that are associated with inherited susceptibility for prostate cancer. The genomic alterations revealed in this study provide an important catalog of positional information relevant to efforts aimed at deciphering the molecular genetic basis of prostate cancer.
Motivation: Identification of somatic DNA copy number alterations (CNAs) and significant consensus events (SCEs) in cancer genomes is a main task in discovering potential cancer-driving genes such as oncogenes and tumor suppressors. The recent development of SNP array technology has facilitated studies on copy number changes at a genome-wide scale with high resolution. However, existing copy number analysis methods are oblivious to normal cell contamination and cannot distinguish between contributions of cancerous and normal cells to the measured copy number signals. This contamination could significantly confound downstream analysis of CNAs and affect the power to detect SCEs in clinical samples.
Results: We report here a statistically principled in silico approach, Bayesian Analysis of COpy number Mixtures (BACOM), to accurately estimate genomic deletion type and normal tissue contamination, and accordingly recover the true copy number profile in cancer cells. We tested the proposed method on two simulated datasets, two prostate cancer datasets and The Cancer Genome Atlas high-grade ovarian dataset, and obtained very promising results supported by the ground truth and biological plausibility. Moreover, based on a large number of comparative simulation studies, the proposed method gives significantly improved power to detect SCEs after in silico correction of normal tissue contamination. We develop a cross-platform open-source Java application that implements the whole pipeline of copy number analysis of heterogeneous cancer tissues including relevant processing steps. We also provide an R interface, bacomR, for running BACOM within the R environment, making it straightforward to include in existing data pipelines.
Availability: The cross-platform, stand-alone Java application, BACOM, the R interface, bacomR, all source code and the simulation data used in this article are freely available at authors' web site: http://www.cbil.ece.vt.edu/software.htm.
Supplementary Information: Supplementary data are available at Bioinformatics online.
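The effect of normal cell contamination on a measured copy number, and its in silico correction, can be sketched with a simple linear mixture. The sketch assumes that the normal fraction is already known and that normal cells are diploid; it does not reproduce the Bayesian estimation performed by BACOM, and the function name is hypothetical.

```python
def tumor_copy_number(observed_cn, normal_fraction):
    """Recover the copy number in cancer cells from a mixed-sample signal.

    Assumes observed_cn = normal_fraction * 2 + (1 - normal_fraction) * tumor_cn.
    """
    if not 0.0 <= normal_fraction < 1.0:
        raise ValueError("normal_fraction must be in [0, 1)")
    return (observed_cn - 2.0 * normal_fraction) / (1.0 - normal_fraction)


# A hemizygous deletion observed in a sample with 30% normal cells
# appears as a copy number of about 1.3 rather than 1.
corrected = tumor_copy_number(1.3, 0.3)
```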
Human epidermal growth factor receptor 2 (HER2)-amplified breast cancer represents a clinically well-defined subgroup due to availability of targeted treatment. However, HER2-amplified tumors have been shown to be heterogeneous at the genomic level by genome-wide microarray analyses, pointing towards a need of further investigations for identification of recurrent copy number alterations and delineation of patterns of allelic imbalance.
High-density whole genome array-based comparative genomic hybridization (aCGH) and single nucleotide polymorphism (SNP) array data from 260 HER2-amplified breast tumors or cell lines, and 346 HER2-negative breast cancers with molecular subtype information were assembled from different repositories. Copy number alteration (CNA), loss-of-heterozygosity (LOH), copy number neutral allelic imbalance (CNN-AI), subclonal CNA and patterns of tumor DNA ploidy were analyzed using bioinformatical methods such as genomic identification of significant targets in cancer (GISTIC) and genome alteration print (GAP). The patterns of tumor ploidy were confirmed in 338 unrelated breast cancers analyzed by DNA flow cytometry with concurrent BAC aCGH and gene expression data.
A core set of 36 genomic regions commonly affected by copy number gain or loss was identified by integrating results with a previous study, together comprising > 400 HER2-amplified tumors. While CNN-AI frequency appeared evenly distributed over chromosomes in HER2-amplified tumors, not targeting specific regions and often < 20% in frequency, the occurrence of LOH was strongly associated with regions of copy number loss. HER2-amplified and HER2-negative tumors stratified by molecular subtypes displayed different patterns of LOH and CNN-AI, with basal-like tumors showing highest frequencies followed by HER2-amplified and luminal B cases. Tumor aneuploidy was strongly associated with increasing levels of LOH, CNN-AI, CNAs and occurrence of subclonal copy number events, irrespective of subtype. Finally, SNP data from individual tumors indicated that genomic amplification in general appears as monoallelic, that is, it preferentially targets one parental chromosome in HER2-amplified tumors.
We have delineated the genomic landscape of CNAs, amplifications, LOH, and CNN-AI in HER2-amplified breast cancer, but also demonstrated a strong association between different types of genomic aberrations and tumor aneuploidy irrespective of molecular subtype.
Stromal contamination is one of the major confounding factors in the analysis of solid tumor samples by single nucleotide polymorphism (SNP) arrays. As we propose to use genome-wide SNP microarray analysis as a diagnostic platform for neuroblastoma, the sensitivity, specificity, and accuracy of these studies must be optimized. To investigate the effects of stromal contamination, we derived early-passage cell lines from nine primary tumors and compared their genomic signature with that of the primary tumors using 100K SNP arrays. The average concordance between tumor and cell line for raw loss of heterozygosity (LOH) calls was 96% (range, 91–99%) and for raw copy number alterations, 71% (range, 43–87%). In general, a larger number of LOH events was identified in the cell lines compared with the matched tumor samples (mean increase, 3.2% ± 1.9%). We have developed an algorithm that shows that the presence of stroma contributes to under-reporting of LOH and copy number loss. Notable findings in this sample set were uniparental disomy of chromosome arms 11p, 1q, 14q, and 15q and a novel area of amplification on chromosome band 11p15. Our analysis shows that LOH was identified significantly more often in derived cell lines compared with the original tumor samples. Although this may in part be due to clonal selection during adaptation to tissue culture, our study indicates that stromal contamination may be a major contributing factor in underestimation of LOH and copy number loss events.
In colorectal cancer (CRC), chromosomal instability (CIN) is typically studied using comparative genomic hybridization (CGH) arrays. We studied paired (tumor and surrounding healthy) fresh frozen tissue from 86 CRC patients using Illumina's Infinium-based SNP array. This method allowed us to study CIN in CRC, with simultaneous analysis of copy number (CN) and B-allele frequency (BAF), a representation of allelic composition. These data helped us to detect mono-allelic and bi-allelic amplifications and deletions, copy neutral loss of heterozygosity, and levels of mosaicism for mixed cell populations, some of which cannot be assessed with other methods that do not measure BAF. We identified associations between CN abnormalities and different CRC phenotypes (histological diagnosis, location, tumor grade, stage, MSI and presence of lymph node metastasis). We showed commonalities between regions of CN change observed in CRC and the regions reported in previous studies of other solid cancers (e.g. amplifications of 20q, 13q, 8q, 5p and deletions of 18q, 17p and 8p). From the Therapeutic Target Database, we identified relevant drugs, targeted to the genes located in these regions with CN changes, that are approved or in trials for other cancers and common diseases. These drugs may be considered for future therapeutic trials in CRC, based on personalized cytogenetic diagnosis. We also found many regions harboring genes that are not currently targeted by any relevant drugs and that may be considered for future drug discovery studies. Our study shows the application of high density SNP arrays for cytogenetic study in CRC and its potential utility for personalized treatment.
Cell lines are widely used in biomedical research. However, it remains uncertain whether genomic alterations existing in primary tumor tissues are represented in cell lines and whether cell lines carry cell line-specific genomic alterations. This study was performed to answer these questions.
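How copy number and BAF together separate these event types can be sketched from allele-specific copy numbers. The labelling rules below are an assumption made for illustration (for example, calling a gain mono-allelic when only one allele exceeds one copy) and are not the study's calling criteria; the function name is hypothetical.

```python
def describe_state(a_copies, b_copies):
    """Return (event label, expected BAF bands) for one allele-specific state.

    The BAF bands are the values expected at SNPs that are heterozygous
    in the germline, for a sample made up of tumor cells only.
    """
    total = a_copies + b_copies
    low = min(a_copies, b_copies)
    bands = sorted({a_copies / total, b_copies / total}) if total else []
    if total == 0:
        event = "bi-allelic deletion"
    elif total == 1:
        event = "mono-allelic deletion"
    elif total == 2:
        event = "normal" if low == 1 else "copy-neutral LOH"
    elif low <= 1:
        event = "mono-allelic amplification"
    else:
        event = "bi-allelic amplification"
    return event, bands


cn_loh = describe_state(2, 0)      # CN = 2, but BAF splits to 0 and 1
balanced_gain = describe_state(2, 2)  # CN = 4 with BAF still at 0.5
```

Copy-neutral LOH and a normal state share the same total copy number, and a balanced four-copy gain shares the same BAF as a normal state, so each event type is identified only when both measurements are read together.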
Array-based comparative genomic hybridization (CGH) was employed with 4030 bacterial artificial chromosomes (BACs) that cover the genome at 1.0 megabase resolution to analyze DNA copy number aberrations (DCNAs) in 35 primary breast tumors and 24 breast cancer cell lines. DCNAs were compared between these two groups. A tissue microdissection technique was applied to primary tumor tissues to reduce the contamination of samples by normal tissue components.
The average number of BAC clones with DCNAs was 1832 (45.3% of spotted clones) and 971 (24.9%) for cell lines and primary tumor tissues, respectively. Gains of 1q and 8q and losses of 8p, 11q, 16q and 17p were detected in >50% of primary cancer tissues. These aberrations were also frequently detected in cell lines. In addition to these alterations, the cell lines showed recurrent genomic alterations, including gains of 5p14-15, 20q11 and 20q13 and losses of 4p13-p16, 18q12, 18q21, Xq21.1 and Xq26-q28, that were barely detected in tumor tissue specimens. These are considered to be cell line-specific DCNAs. The frequency of HER2 amplification was high in both cell lines and tumor tissues, but it differed significantly between the two groups (P = 0.012): 41.3 ± 29.9% for the cell lines and 15.9 ± 18.6% for the tissue specimens.
Established cell lines carry cell line-specific DCNAs together with recurrent aberrations detected in primary tumor tissues. It must therefore be emphasized that cell lines do not always represent the genotypes of parental tumor tissues.
Somatic cell genetic alterations are a hallmark of tumor development and progression. Although various technologies have been developed and utilized to identify genetic aberrations, identifying genetic translocations at the chromosomal level is still a challenging task. High density SNP microarrays are useful to measure DNA copy number variation (CNV) across the genome. Utilizing SNP array data of cancer cell lines and patient samples, we evaluated the CNV and copy number breakpoints for several known fusion genes implicated in tumorigenesis. This analysis demonstrated the potential utility of SNP array data for the prediction of genetic aberrations via translocations based on identifying copy number breakpoints within the target genes. Genome-wide analysis was also performed to identify genes harboring copy number breakpoints across 820 cancer cell lines. Candidate oncogenes were identified that are linked to potential translocations in specific cancer cell lines.
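The idea of flagging genes that harbor a copy number breakpoint can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method: it assumes the SNP array data have already been segmented into (chromosome, start, end, copy number) tuples, and the gene name and coordinates below are made-up toy values.

```python
def breakpoints(segments):
    """Return (chrom, position) for every copy number change between
    adjacent segments on the same chromosome.

    segments: iterable of (chrom, start, end, copy_number) tuples.
    """
    ordered = sorted(segments)  # sorts by chromosome, then start position
    found = []
    for prev, cur in zip(ordered, ordered[1:]):
        same_chrom = prev[0] == cur[0]
        cn_changed = prev[3] != cur[3]
        if same_chrom and cn_changed:
            found.append((cur[0], cur[1]))  # breakpoint at start of new segment
    return found


def genes_with_breakpoints(genes, segments):
    """Map each gene name to the breakpoints lying strictly inside it.

    genes: dict of name -> (chrom, start, end). Hypothetical input format.
    """
    bps = breakpoints(segments)
    hits = {}
    for name, (chrom, start, end) in genes.items():
        inside = [pos for c, pos in bps if c == chrom and start < pos < end]
        if inside:
            hits[name] = inside
    return hits


# Toy data: one CN change at position 150 on chr1, none on chr2.
toy_segments = [("chr1", 0, 150, 2), ("chr1", 150, 400, 4), ("chr2", 0, 500, 2)]
toy_genes = {"GENE_A": ("chr1", 100, 200), "GENE_B": ("chr2", 100, 200)}
print(genes_with_breakpoints(toy_genes, toy_segments))  # {'GENE_A': [150]}
```

A breakpoint inside a gene is only suggestive of a translocation; balanced rearrangements leave no copy number change and would not be detected this way.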
copy number variation; copy number breakpoint; SNP array; translocation
Genomic abnormalities leading to colorectal cancer (CRC) include somatic events causing copy number aberrations (CNAs) as well as copy neutral manifestations such as loss of heterozygosity (LOH) and uniparental disomy (UPD). We studied the causal effect of these events by analyzing high resolution cytogenetic microarray data of 15 tumor-normal paired samples. We detected 144 genes affected by CNAs. A subset of 91 genes are known to be CRC related, yet high GISTIC scores indicate that 24 genes on chromosomes 7, 8, 18 and 20 are strongly relevant. Combining GISTIC ranking with functional analyses and degree of loss/gain, we identify three genes in regions of significant loss (ATP8B1, NARS, and ATP5A1) and eight in regions of gain (CTCFL, SPO11, ZNF217, PLEKHA8, HOXA3, GPNMB, IGF2BP3 and PCAT1) as novel in their association with CRC. Pathway and target prediction analysis of CNA-affected genes and microRNAs, respectively, indicates that the TGF-β signaling pathway is involved in causing CRC. Finally, LOH and UPD collectively affected nine cancer related genes. Transcription factor binding sites in regions of >35% copy number loss/gain influenced 16 CRC genes. Our analysis reveals patient-specific CRC manifestations at the genomic level and shows that these different events affect individual CRC patients differently.
The detection of genomic copy number alterations (CNA) in cancer based on SNP arrays requires methods that take into account tumour-specific factors such as normal cell contamination and tumour heterogeneity. A number of tools have recently been developed, but their performance has yet to be thoroughly assessed. To this end, a comprehensive model that integrates normal cell contamination and intra-tumour heterogeneity, and that can be translated into synthetic data on which to run benchmarks, is indispensable.
We propose such a model and implement it in an R package called CnaGen to synthetically generate a wide range of alterations under different normal cell contamination levels. Six recently published methods for CNA and loss of heterozygosity (LOH) detection in tumour samples were assessed on these synthetic data and on a dilution series of a breast cancer cell line: ASCAT, GAP, GenoCNA, GPHMM, MixHMM and OncoSNP. We report the recall rates in terms of normal cell contamination levels and alteration characteristics (length, copy number and LOH state), as well as the false discovery rate distribution for each copy number under different normal cell contamination levels.
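How normal cell contamination dilutes SNP array signals can be illustrated with a commonly used two-population mixing formula. The Python sketch below is an assumption-laden simplification and not CnaGen's actual model: it ignores intra-tumour heterogeneity, measurement noise and platform-specific signal compression, and the function and parameter names are invented for illustration.

```python
import math


def expected_signals(n_a: int, n_b: int, normal_fraction: float):
    """Expected (log R ratio, BAF) at a germline-heterozygous SNP.

    n_a, n_b        : tumour copies of the A and B alleles
    normal_fraction : proportion of normal (diploid, AB) cells in the sample

    Simplified mixing model (an assumption, not taken from the paper):
    signal is the cell-fraction-weighted average of tumour and normal copies.
    """
    if not 0.0 <= normal_fraction <= 1.0:
        raise ValueError("normal_fraction must be in [0, 1]")
    tumour_fraction = 1.0 - normal_fraction
    # Average total copies and average B-allele copies per cell in the mixture.
    total = 2.0 * normal_fraction + tumour_fraction * (n_a + n_b)
    b_copies = 1.0 * normal_fraction + tumour_fraction * n_b
    if total == 0:
        raise ValueError("no DNA at this locus (homozygous deletion, pure tumour)")
    lrr = math.log2(total / 2.0)  # log ratio relative to a diploid reference
    baf = b_copies / total
    return lrr, baf


# Copy-neutral LOH (two B copies, no A) in a 50% contaminated sample:
print(expected_signals(0, 2, 0.5))  # (0.0, 0.75)
# Hemizygous loss of the B allele in a pure tumour sample:
print(expected_signals(1, 0, 0.0))  # (-1.0, 0.0)
```

As the normal fraction grows, both signals drift toward the diploid values (log R ratio 0, BAF 0.5), which is why alterations become harder to detect in heavily contaminated samples.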
The assessed methods are in general better at detecting alterations with low copy number and at low normal cell contamination levels. All methods except GPHMM, which failed to recognize the alteration pattern in the cell-line samples, provided similar results for the synthetic and cell-line sample sets. MixHMM and GenoCNA were the poorest performing methods, while GAP generally performed better. This supports the viability of approaches other than the common hidden Markov model (HMM)-based ones.
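For readers unfamiliar with the two benchmark metrics, recall and false discovery rate can be computed from per-SNP truth and call labels as in the following Python sketch. The per-SNP unit of evaluation and the function name are assumptions made for illustration; the study may aggregate calls differently.

```python
def recall_and_fdr(truth, called):
    """Compute (recall, FDR) from parallel sequences of booleans.

    truth[i]  : True if SNP i lies in a simulated alteration
    called[i] : True if the method reported SNP i as altered
    Returns None for a metric whose denominator is zero.
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")
    tp = sum(1 for t, c in zip(truth, called) if t and c)      # true positives
    fn = sum(1 for t, c in zip(truth, called) if t and not c)  # missed alterations
    fp = sum(1 for t, c in zip(truth, called) if c and not t)  # spurious calls
    recall = tp / (tp + fn) if (tp + fn) else None
    fdr = fp / (tp + fp) if (tp + fp) else None
    return recall, fdr


truth = [True, True, True, False, False]
called = [True, True, False, True, False]
recall, fdr = recall_and_fdr(truth, called)
print(round(recall, 3), round(fdr, 3))  # 0.667 0.333
```

Recall measures how much of the true alteration a method recovers, while the false discovery rate measures how much of what it reports is wrong, so the two should be read together.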
We devised and implemented a comprehensive model to generate data that simulate tumoural samples genotyped using SNP arrays. The validity of the model is supported by the similarity of the results obtained with synthetic and real data. Based on these results and on the software implementation of the methods, we recommend GAP for advanced users and GPHMM for a fully driven analysis.
Treatment choices for cervical cancer are primarily based on clinical FIGO stage and the post-operative evaluation of prognostic parameters including tumor diameter, parametrial and lymph node involvement, vaso-invasion, infiltration depth, and histological type. The aim of this study was to evaluate genomic changes in bulky cervical tumors and their relation to clinical parameters, using single nucleotide polymorphism (SNP)-analysis.
Flow-sorted tumor cells and patient-matched normal cells were extracted from 81 bulky cervical tumors. DNA-index (DI) measurement and whole genome SNP-analysis were performed. Data were analyzed to detect copy number alterations (CNA) and allelic balance state (balanced, imbalanced or pure LOH), and their relation to clinical parameters.
The DI ranged from 0.92 to 2.56. Pure LOH was found in ≥40% of samples on chromosome-arms 3p, 4p, 6p, 6q, and 11q, CN gains in >20% on 1q, 3q, 5p, 8q, and 20q, and losses on 2q, 3p, 4p, 11q, and 13q. Over 40% showed gain on 3q. The only significant differences were found between histological types (squamous, adeno and adenosquamous) in the lesser allele intensity ratio (LAIR) (p = 0.035) and in the CNA analysis (p = 0.011). More losses were found on chromosome-arm 2q (FDR = 0.004) in squamous tumors and more gains on 7p, 7q, and 9p in adenosquamous tumors (FDR = 0.006, FDR = 0.004, and FDR = 0.029, respectively).
Whole genome analysis of bulky cervical cancer shows widespread changes in allelic balance and CN. The overall genetic changes and CNA on specific chromosome-arms differed between histological types. No relation was found with the clinical parameters that currently dictate treatment choice.