Related Articles

1.  Flexible and Accurate Detection of Genomic Copy-Number Changes from aCGH 
PLoS Computational Biology  2007;3(6):e122.
Genomic DNA copy-number alterations (CNAs) are associated with complex diseases, including cancer: CNAs are indeed related to tumoral grade, metastasis, and patient survival. CNAs discovered from array-based comparative genomic hybridization (aCGH) data have been instrumental in identifying disease-related genes and potential therapeutic targets. To be immediately useful in both clinical and basic research scenarios, aCGH data analysis requires accurate methods that do not impose unrealistic biological assumptions and that provide direct answers to the key question, “What is the probability that this gene/region has CNAs?” Current approaches fail, however, to meet these requirements. Here, we introduce reversible jump aCGH (RJaCGH), a new method for identifying CNAs from aCGH; we use a nonhomogeneous hidden Markov model fitted via reversible jump Markov chain Monte Carlo; and we incorporate model uncertainty through Bayesian model averaging. RJaCGH provides an estimate of the probability that a gene/region has CNAs while incorporating interprobe distance and the capability to analyze data on a chromosome or genome-wide basis. RJaCGH outperforms alternative methods, and the performance difference is even larger with noisy data and highly variable interprobe distance, both commonly found features in aCGH data. Furthermore, our probabilistic method allows us to identify minimal common regions of CNAs among samples and can be extended to incorporate expression data. In summary, we provide a rigorous statistical framework for locating genes and chromosomal regions with CNAs with potential applications to cancer and other complex human diseases.
Author Summary
As a consequence of problems during cell division, the number of copies of a gene in a chromosome can either increase or decrease. These copy-number alterations (CNAs) can play a crucial role in the emergence of complex multigenic diseases. For example, in cancer, amplification of oncogenes can drive tumor activation, and CNAs are associated with metastasis development and patient survival. Studies on the relationship between CNAs and disease have been recently fueled by the widespread use of array-based comparative genomic hybridization (aCGH), a technique with much finer resolution than previous experimental approaches. Detection of CNAs from these data depends on methods of analysis that do not impose biologically unrealistic assumptions and that provide direct answers to fundamental research questions. We have developed a statistical method, using a Bayesian approach, that returns estimates of the probabilities of CNAs from aCGH data, the most direct and valuable answer to the key biological question: “What is the probability that this gene/region has an altered copy number?” The output of the method can therefore be immediately used in different settings from clinical to basic research scenarios, and is applicable over a wide variety of aCGH technologies.
doi:10.1371/journal.pcbi.0030122
PMCID: PMC1894821  PMID: 17590078
2.  Network modeling of the transcriptional effects of copy number aberrations in glioblastoma 
DNA copy number aberrations (CNAs) are a characteristic feature of cancer genomes. In this work, Rebecka Jörnsten, Sven Nelander and colleagues combine network modeling and experimental methods to analyze the systems-level effects of CNAs in glioblastoma.
We introduce a modeling approach termed EPoC (Endogenous Perturbation analysis of Cancer), enabling the construction of global, gene-level models that causally connect gene copy number with expression in glioblastoma. On the basis of the resulting model, we predict genes that are likely to be disease-driving and validate selected predictions experimentally. We also demonstrate that further analysis of the network model by sparse singular value decomposition allows stratification of patients with glioblastoma into short-term and long-term survivors, introducing decomposed network models as a useful principle for biomarker discovery. Finally, in systematic comparisons, we demonstrate that EPoC is computationally efficient and yields more consistent results than mRNA-only methods, standard eQTL methods, and two recent multivariate methods for genotype–mRNA coupling.
Gains and losses of chromosomal material (DNA copy number aberrations; CNAs) are a characteristic feature of cancer genomes. At the level of a single locus, it is well known that increased copy number (gene amplification) typically leads to increased gene expression, whereas decreased copy number (gene deletion) leads to decreased gene expression (Pollack et al, 2002; Lee et al, 2008; Nilsson et al, 2008). However, CNAs also affect the expression of genes located outside the amplified/deleted region itself via indirect mechanisms. To fully understand the action of CNAs, it is therefore necessary to analyze their action in a network context. Toward this goal, improved computational approaches will be important, if not essential.
To determine the global effects on transcription of CNAs in the brain tumor glioblastoma, we develop EPoC (Endogenous Perturbation analysis of Cancer), a computational technique capable of inferring sparse, causal network models by combining genome-wide, paired CNA- and mRNA-level data. EPoC aims to detect disease-driving copy number aberrations and their effect on target mRNA expression, and stratify patients into long-term and short-term survivors. Technically, EPoC relates CNA perturbations to mRNA responses by matrix equations, derived from a steady-state approximation of the transcriptional network. Patient prognostic scores are obtained from singular value decompositions of the network matrix. The models are constructed by solving a large-scale, regularized regression problem.
We apply EPoC to glioblastoma data from The Cancer Genome Atlas (TCGA) consortium (186 patients). The identified CNA-driven network comprises 10 672 genes, and contains a number of copy number-altered genes that control multiple downstream genes. Highly connected hub genes include well-known oncogenes and tumor suppressor genes that are frequently deleted or amplified in glioblastoma, including EGFR, PDGFRA, CDKN2A and CDKN2B, confirming a clear association between these aberrations and transcriptional variability of these brain tumors. In addition, we identify a number of hub genes that have previously not been associated with glioblastoma, including interferon alpha 1 (IFNA1), myeloid/lymphoid or mixed-lineage leukemia translocated to 10 (MLLT10, a well-known leukemia gene), glutamate decarboxylase 2 (GAD2), a postulated glutamate receptor (GPR158) and Necdin (NDN). Furthermore, we demonstrate that the network model contains useful information on downstream target genes (including stem cell regulators), and possible drug targets.
We proceed to explore the validity of a small network region experimentally. Introducing experimental perturbations of NDN and other targets in four glioblastoma cell lines (T98G, U-87MG, U-343MG and U-373MG), we confirm several predicted mechanisms. We also demonstrate that the TCGA glioblastoma patients can be stratified into long-term and short-term survivors, using our proposed prognostic scores derived from a singular value decomposition of the network model. Finally, we compare EPoC to existing methods for mRNA network analysis and expression quantitative trait locus (eQTL) methods, and demonstrate that EPoC produces more consistent models between technically independent glioblastoma data sets, and that the EPoC models exhibit better overlap with known protein–protein interaction networks and pathway maps.
In summary, we conclude that large-scale integrative modeling reveals mechanistically and prognostically informative networks in human glioblastoma. Our approach operates at the gene level and our data support that individual hub genes can be identified in practice. Very large aberrations, however, cannot be fully resolved by the current modeling strategy.
DNA copy number aberrations (CNAs) are a hallmark of cancer genomes. However, little is known about how such changes affect global gene expression. We develop a modeling framework, EPoC (Endogenous Perturbation analysis of Cancer), to (1) detect disease-driving CNAs and their effect on target mRNA expression, and to (2) stratify cancer patients into long- and short-term survivors. Our method constructs causal network models of gene expression by combining genome-wide DNA- and RNA-level data. Prognostic scores are obtained from a singular value decomposition of the networks. By applying EPoC to glioblastoma data from The Cancer Genome Atlas consortium, we demonstrate that the resulting network models contain known disease-relevant hub genes, reveal interesting candidate hubs, and uncover predictors of patient survival. Targeted validations in four glioblastoma cell lines support selected predictions, and implicate the p53-interacting protein Necdin in suppressing glioblastoma cell growth. We conclude that large-scale network modeling of the effects of CNAs on gene expression may provide insights into the biology of human cancer. Free software in MATLAB and R is provided.
doi:10.1038/msb.2011.17
PMCID: PMC3101951  PMID: 21525872
cancer biology; cancer genomics; glioblastoma
3.  Virtual CGH: an integrative approach to predict genetic abnormalities from gene expression microarray data applied in lymphoma 
BMC Medical Genomics  2011;4:32.
Background
Comparative Genomic Hybridization (CGH) is a molecular approach for detecting DNA Copy Number Alterations (CNAs) in tumors, which are among the key causes of tumorigenesis. However, in the post-genomic era, most studies in cancer biology have focused on Gene Expression Profiling (GEP) rather than CGH, and as a result, an enormous amount of GEP data has accumulated in public databases for a wide variety of tumor types. We exploited this resource of GEP data to define possible recurrent CNAs in tumors. In addition, the CNAs identified by GEP are likely to be more functionally relevant to disease pathogenesis, since the functional effects of CNAs can be reflected by altered gene expression.
Methods
We proposed a novel computational approach, coined virtual CGH (vCGH), which employs hidden Markov models (HMMs) to predict DNA CNAs from their corresponding GEP data. vCGH was first trained on the paired GEP and CGH data generated from a sufficient number of tumor samples, and then applied to the GEP data of a new tumor sample to predict its CNAs.
Results
Using cross-validation on 190 Diffuse Large B-Cell Lymphomas (DLBCL), vCGH achieved 80% sensitivity, 90% specificity and 90% accuracy for CNA prediction. The majority of the recurrent regions defined by vCGH are concordant with the experimental CGH, including gains of 1q, 2p16-p14, 3q27-q29, 6p25-p21, 7, 11q, 12 and 18q21, and losses of 6q, 8p23-p21, 9p24-p21 and 17p13 in DLBCL. In addition, vCGH predicted some recurrent functional abnormalities which were not observed in CGH, including gains of 1p, 2q and 6q and losses of 1q, 6p and 8q. Among those novel loci, 1q, 6q and 8q were significantly associated with the clinical outcomes in the DLBCL patients (p < 0.05).
Conclusions
We developed a novel computational approach, vCGH, to predict genome-wide genetic abnormalities from GEP data in lymphomas. vCGH can be generally applied to other types of tumors and may significantly enhance the detection of functionally important genetic abnormalities in cancer research.
doi:10.1186/1755-8794-4-32
PMCID: PMC3086850  PMID: 21486456
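The HMM-based prediction described in the vCGH abstract above can be illustrated with a minimal Viterbi decoder. This is only a sketch under stated assumptions, not the authors' implementation: the three states, the Gaussian emission means (`MEANS`), the shared standard deviation (`SIGMA`) and the self-transition probability (`STAY`) are hypothetical values chosen for illustration, whereas vCGH estimates its model from paired GEP and CGH training data.

```python
import math

# Hypothetical model parameters (illustrative only, not taken from vCGH).
STATES = ("loss", "neutral", "gain")
MEANS = {"loss": -1.0, "neutral": 0.0, "gain": 1.0}  # mean expression score per state
SIGMA = 0.5   # shared emission standard deviation
STAY = 0.9    # probability of remaining in the same copy-number state


def log_gauss(x, mean, sigma):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def log_trans(prev, cur):
    """Log transition probability: stay with STAY, otherwise split the rest evenly."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(observations):
    """Return the most probable state path for a sequence of expression scores."""
    # Uniform initial state distribution.
    score = {
        s: math.log(1 / len(STATES)) + log_gauss(observations[0], MEANS[s], SIGMA)
        for s in STATES
    }
    back = []  # back[t][cur] = best predecessor of state cur at step t + 1
    for x in observations[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_trans(p, cur))
            new_score[cur] = (
                score[best_prev] + log_trans(best_prev, cur) + log_gauss(x, MEANS[cur], SIGMA)
            )
            pointers[cur] = best_prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Ordered expression scores along a chromosome (made-up numbers).
    scores = [0.0, 0.1, -0.1, 1.0, 1.1, 0.9, 1.0, 0.0, 0.1, -0.1]
    print(viterbi(scores))
```

The high self-transition probability is what makes the decoder favor contiguous segments over isolated single-gene calls.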
4.  Determining Frequent Patterns of Copy Number Alterations in Cancer 
PLoS ONE  2010;5(8):e12028.
Cancer progression is often driven by an accumulation of genetic changes but also accompanied by increasing genomic instability. These processes lead to a complicated landscape of copy number alterations (CNAs) within individual tumors and great diversity across tumor samples. High resolution array-based comparative genomic hybridization (aCGH) is being used to profile CNAs of ever larger tumor collections, and better computational methods for processing these data sets and identifying potential driver CNAs are needed. Typical studies of aCGH data sets take a pipeline approach, starting with segmentation of profiles, calls of gains and losses, and finally determination of frequent CNAs across samples. A drawback of pipelines is that choices at each step may produce different results, and biases are propagated forward. We present a mathematically robust new method that exploits probe-level correlations in aCGH data to discover subsets of samples that display common CNAs. Our algorithm is related to recent work on maximum-margin clustering. It does not require pre-segmentation of the data and also provides grouping of recurrent CNAs into clusters. We tested our approach on a large cohort of glioblastoma aCGH samples from The Cancer Genome Atlas and recovered almost all CNAs reported in the initial study. We also found additional significant CNAs missed by the original analysis but supported by earlier studies, and we identified significant correlations between CNAs.
doi:10.1371/journal.pone.0012028
PMCID: PMC2920822  PMID: 20711339
5.  aCGHViewer: A Generic Visualization Tool For aCGH data 
Cancer informatics  2006;2:36-43.
Array-Comparative Genomic Hybridization (aCGH) is a powerful high-throughput technology for detecting chromosomal copy number aberrations (CNAs) in cancer, aiming at identifying related critical genes from the affected genomic regions. However, advancing from a dataset with thousands of tabular lines to a few candidate genes can be an onerous and time-consuming process. To expedite the aCGH data analysis process, we have developed a user-friendly aCGH data viewer (aCGHViewer) as a conduit between the aCGH data tables and a genome browser. The data from a given aCGH analysis are displayed in a genomic view composed of individual chromosome panels which can be rapidly scanned for interesting features. A chromosome panel containing a feature of interest can be selected to launch a detail window for that single chromosome. Selecting a data point of interest in the detail window launches a query to the UCSC or NCBI genome browser to allow the user to explore the gene content in the chromosomal region. Additionally, aCGHViewer can display aCGH and expression array data concurrently to visually correlate the two. aCGHViewer is a stand-alone Java visualization application that should be used in conjunction with separate statistical programs. It operates on all major computer platforms and is freely available at http://falcon.roswellpark.org/aCGHview/.
PMCID: PMC1847423  PMID: 17404607
array-CGH; CNA; gene expression; visualization
7.  Integrated study of copy number states and genotype calls using high-density SNP arrays 
Nucleic Acids Research  2009;37(16):5365-5377.
We propose a statistical framework, named genoCN, to simultaneously dissect copy number states and genotypes using high-density SNP (single nucleotide polymorphism) arrays. There are at least two types of genomic DNA copy number differences: copy number variations (CNVs) and copy number aberrations (CNAs). While CNVs are naturally occurring and inheritable, CNAs are acquired somatic alterations most often observed in tumor tissues only. CNVs tend to be short and more sparsely located in the genome compared with CNAs. GenoCN consists of two components, genoCNV and genoCNA, designed for CNV and CNA studies, respectively. In contrast to most existing methods, genoCN is more flexible in that the model parameters are estimated from the data instead of being decided a priori. GenoCNA also incorporates two important strategies for CNA studies. First, the effects of tissue contamination are explicitly modeled. Second, if SNP arrays are performed for both tumor and normal tissues of one individual, the genotype calls from normal tissue are used to study CNAs in tumor tissue. We evaluated genoCN by applications to 162 HapMap individuals and a brain tumor (glioblastoma) dataset and showed that our method can successfully identify both types of copy number differences and produce high-quality genotype calls.
doi:10.1093/nar/gkp493
PMCID: PMC2935461  PMID: 19581427
8.  Identification of cancer genes using a statistical framework for multiexperiment analysis of nondiscretized array CGH data 
Nucleic Acids Research  2008;36(2):e13.
Tumor formation is in part driven by DNA copy number alterations (CNAs), which can be measured using microarray-based Comparative Genomic Hybridization (aCGH). Multiexperiment analysis of aCGH data from tumors allows discovery of recurrent CNAs that are potentially causal to cancer development. Until now, multiexperiment aCGH data analysis has been dependent on discretization of measurement data to a gain, loss or no-change state. Valuable biological information is lost when a heterogeneous system such as a solid tumor is reduced to these states. We have developed a new approach which inputs nondiscretized aCGH data to identify regions that are significantly aberrant across an entire tumor set. Our method is based on kernel regression and accounts for the strength of a probe's signal, its local genomic environment and the signal distribution across multiple tumors. In an analysis of 89 human breast tumors, our method showed enrichment for known cancer genes in the detected regions and identified aberrations that are strongly associated with breast cancer subtypes and clinical parameters. Furthermore, we identified 18 recurrent aberrant regions in a new dataset of 19 p53-deficient mouse mammary tumors. These regions, combined with gene expression microarray data, point to known cancer genes and novel candidate cancer genes.
doi:10.1093/nar/gkm1143
PMCID: PMC2241875  PMID: 18187509
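To make the idea of kernel regression on nondiscretized aCGH data more concrete, the following sketch smooths log-ratios along a chromosome with a Gaussian-kernel (Nadaraya-Watson) estimator. It is a generic textbook technique shown under assumptions, not the authors' published method: the function name `kernel_smooth`, the bandwidth and the example values are hypothetical, and the multi-tumor significance testing described in the abstract is not reproduced.

```python
import math


def kernel_smooth(positions, log_ratios, bandwidth):
    """Nadaraya-Watson estimate of the log-ratio at each probe position.

    Each probe's smoothed value is a weighted average of all probes, with
    Gaussian weights that decay with genomic distance (same units as bandwidth).
    """
    smoothed = []
    for p in positions:
        weights = [math.exp(-0.5 * ((p - q) / bandwidth) ** 2) for q in positions]
        total = sum(weights)  # always >= 1 because the probe weights itself by 1
        smoothed.append(sum(w * y for w, y in zip(weights, log_ratios)) / total)
    return smoothed


if __name__ == "__main__":
    # Made-up probe positions (Mb) and log2 ratios with a gained region in the middle.
    positions = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    log_ratios = [0.05, -0.02, 0.60, 0.75, 0.58, 0.01, -0.04]
    for pos, value in zip(positions, kernel_smooth(positions, log_ratios, bandwidth=1.0)):
        print(f"{pos:4.1f} Mb  {value:+.3f}")
```

Because the signal stays continuous, a weak but consistent gain is kept as a moderate value instead of being forced into a gain, loss or no-change call.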
9.  Assessing karyotype precision by microarray-based comparative genomic hybridization in the myelodysplastic/myeloproliferative syndromes 
Background
Recent genome-wide microarray-based research investigations have revealed a high frequency of submicroscopic copy number alterations (CNAs) in the myelodysplastic syndromes (MDS), suggesting microarray-based comparative genomic hybridization (aCGH) has the potential to detect new clinically relevant genomic markers in a diagnostic laboratory.
Results
We performed an exploratory study on 30 cases of MDS, myeloproliferative neoplasia (MPN) or evolving acute myeloid leukemia (AML) (% bone marrow blasts ≤ 30%, range 0-30%, median 8%) by aCGH, using a genome-wide bacterial artificial chromosome (BAC) microarray. The sample data were compared to corresponding cytogenetics, fluorescence in situ hybridization (FISH), and clinical-pathological findings. Previously unidentified imbalances, in particular those considered submicroscopic aberrations (< 10 Mb), were confirmed by FISH analysis. CNAs identified by aCGH were concordant with the cytogenetic/FISH results in 25/30 (83%) of the samples tested. aCGH revealed new CNAs in 14/30 (47%) patients, including 28 submicroscopic or hidden aberrations verified by FISH studies. Cryptic 344-kb RUNX1 deletions were found in three patients at time of AML transformation. Other hidden CNAs involved 3q26.2/EVI1, 5q22/APC, 5q32/TCERG1, 12p13.1/EMP1, 12q21.3/KITLG, and 17q11.2/NF1. Gains of CCND2/12p13.32 were detected in two patients. aCGH failed to detect a balanced translocation (n = 1) and low-level clonality (n = 4) in five karyotypically aberrant samples, revealing clinically important assay limitations.
Conclusions
The detection of previously known and unknown genomic alterations suggests that aCGH has considerable promise for identification of both recurring microscopic and submicroscopic genomic imbalances that contribute to myeloid disease pathogenesis and progression. These findings suggest that development of higher-resolution microarray platforms could improve karyotyping in clinical practice.
doi:10.1186/1755-8166-3-23
PMCID: PMC3000833  PMID: 21078186
10.  Bivariate segmentation of SNP-array data for allele-specific copy number analysis in tumour samples 
BMC Bioinformatics  2013;14:84.
Background
SNP arrays output two signals that reflect the total genomic copy number (LRR) and the allelic ratio (BAF), which in combination allow the characterisation of allele-specific copy numbers (ASCNs). While methods based on hidden Markov models (HMMs) have been extended from array comparative genomic hybridisation (aCGH) to jointly handle the two signals, only one method based on change-point detection, ASCAT, performs bivariate segmentation.
Results
In the present work, we introduce a generic framework for bivariate segmentation of SNP array data for ASCN analysis. To this end, we discuss the characteristics of the typically applied BAF transformation and how they affect segmentation, introduce concepts of multivariate time series analysis that are of concern in this field and discuss the appropriate formulation of the problem. The framework is implemented in a method named CnaStruct, the bivariate form of the structural change model (SCM), which has been successfully applied to transcriptome mapping and aCGH.
Conclusions
On a comprehensive synthetic dataset, we show that CnaStruct outperforms the segmentation of existing ASCN analysis methods. Furthermore, CnaStruct can be integrated into the workflows of several ASCN analysis tools in order to improve their performance, especially on tumour samples highly contaminated by normal cells.
doi:10.1186/1471-2105-14-84
PMCID: PMC3599505  PMID: 23497144
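The abstract above refers to a "typically applied BAF transformation" without naming it. A common choice in SNP-array analysis is the mirrored BAF (mBAF), which folds B-allele frequencies around 0.5 and discards homozygous SNPs. The sketch below assumes that this is the transformation meant; the function name `mirrored_baf` and the cutoff of 0.9 are hypothetical and are not taken from CnaStruct.

```python
def mirrored_baf(baf_values, homozygous_cutoff=0.9):
    """Fold B-allele frequencies around 0.5.

    Returns one value per SNP: the mirrored BAF for informative (heterozygous)
    SNPs, or None where the mirrored value reaches the cutoff, which suggests a
    homozygous SNP that carries no allelic-imbalance information.
    """
    result = []
    for baf in baf_values:
        mirrored = abs(baf - 0.5) + 0.5  # maps [0, 1] onto [0.5, 1]
        result.append(mirrored if mirrored < homozygous_cutoff else None)
    return result


if __name__ == "__main__":
    # Made-up BAF values: two imbalanced heterozygous SNPs, one balanced, two homozygous.
    print(mirrored_baf([0.3, 0.7, 0.5, 0.02, 0.98]))
```

After mirroring, the two symmetric BAF bands of a region collapse into a single band, which turns BAF into a signal that can be segmented alongside LRR.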
11.  CEQer: A Graphical Tool for Copy Number and Allelic Imbalance Detection from Whole-Exome Sequencing Data 
PLoS ONE  2013;8(10):e74825.
Copy number alterations (CNAs) are common events occurring in leukaemias and solid tumors. Comparative Genome Hybridization (CGH) is currently the gold standard technique to analyze CNAs; however, CGH analysis requires dedicated instruments and is able to perform only low-resolution Loss of Heterozygosity (LOH) analyses. Here we present CEQer (Comparative Exome Quantification analyzer), a new graphical, event-driven tool for CNA/allelic-imbalance (AI) coupled analysis of exome sequencing data. By using case-control matched exome data, CEQer performs a comparative digital exonic quantification to generate CNA data and couples this information with exome-wide LOH and allelic imbalance detection. These data are used to build mixed statistical/heuristic models allowing the identification of CNA/AI events. To test our tool, we initially used in silico generated data, then we performed whole-exome sequencing from 20 leukemic specimens and corresponding matched controls and we analyzed the results using CEQer. Taken globally, these analyses showed that the combined use of comparative digital exon quantification and LOH/AI allows generating very accurate CNA data. Therefore, we propose CEQer as an efficient, robust and user-friendly graphical tool for the identification of CNA/AI in the context of whole-exome sequencing data.
doi:10.1371/journal.pone.0074825
PMCID: PMC3790773  PMID: 24124457
12.  Genome-wide identification of significant aberrations in cancer genome 
BMC Genomics  2012;13:342.
Background
Somatic Copy Number Alterations (CNAs) in human genomes are present in almost all human cancers. Systematic efforts to characterize such structural variants must effectively distinguish significant consensus events from random background aberrations. Here we introduce Significant Aberration in Cancer (SAIC), a new method for characterizing and assessing the statistical significance of recurrent CNA units. Three main features of SAIC include: (1) exploiting the intrinsic correlation among consecutive probes to assign a score to each CNA unit instead of single probes; (2) performing permutations on CNA units that preserve correlations inherent in the copy number data; and (3) iteratively detecting Significant Copy Number Aberrations (SCAs) and estimating an unbiased null distribution by applying an SCA-exclusive permutation scheme.
Results
We test and compare the performance of SAIC against four peer methods (GISTIC, STAC, KC-SMART, CMDS) on a large number of simulation datasets. Experimental results show that SAIC outperforms peer methods in terms of larger area under the Receiver Operating Characteristics curve and increased detection power. We then apply SAIC to analyze structural genomic aberrations acquired in four real cancer genome-wide copy number data sets (ovarian cancer, metastatic prostate cancer, lung adenocarcinoma, glioblastoma). When compared with previously reported results, SAIC successfully identifies most SCAs known to be of biological significance and associated with oncogenes (e.g., KRAS, CCNE1, and MYC) or tumor suppressor genes (e.g., CDKN2A/B). Furthermore, SAIC identifies a number of novel SCAs in these copy number data that encompass tumor related genes and may warrant further studies.
Conclusions
Supported by a well-grounded theoretical framework, SAIC has been developed and used to identify SCAs in various cancer copy number data sets, providing useful information to study the landscape of cancer genomes. Open-source and platform-independent SAIC software is implemented using C++, together with R scripts for data formatting and Perl scripts for user interfacing, and it is easy to install and efficient to use. The source code and documentation are freely available at http://www.cbil.ece.vt.edu/software.htm.
doi:10.1186/1471-2164-13-342
PMCID: PMC3428679  PMID: 22839576
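The general idea of assessing recurrent aberrations against a permutation-based null, as in the SAIC abstract above, can be sketched as follows. This is a deliberately simplified illustration and not the SAIC algorithm: it scores single loci instead of CNA units, uses circular shifts of each sample's profile to preserve within-sample correlation, and omits the iterative SCA-exclusive scheme. All names (`recurrence_scores`, `permutation_pvalues`) and parameters are hypothetical.

```python
import random


def recurrence_scores(profiles):
    """Sum the per-sample aberration calls (0/1) at each locus."""
    n_loci = len(profiles[0])
    return [sum(profile[i] for profile in profiles) for i in range(n_loci)]


def permutation_pvalues(profiles, n_perm=200, seed=0):
    """Estimate a p-value per locus from a max-statistic permutation null.

    In each permutation every sample is circularly shifted by a random offset,
    which keeps neighbouring loci together but breaks alignment across samples.
    """
    rng = random.Random(seed)
    observed = recurrence_scores(profiles)
    n_loci = len(observed)
    exceed = [0] * n_loci
    for _ in range(n_perm):
        shifted = []
        for profile in profiles:
            k = rng.randrange(n_loci)
            shifted.append(profile[k:] + profile[:k])
        null_max = max(recurrence_scores(shifted))
        for i, score in enumerate(observed):
            if null_max >= score:
                exceed[i] += 1
    # Add-one correction keeps every p-value strictly positive.
    return [(count + 1) / (n_perm + 1) for count in exceed]


if __name__ == "__main__":
    # Six made-up samples that all carry a gain at locus 2 out of 10 loci.
    profiles = [[0, 0, 1, 0, 0, 0, 0, 0, 0, 0] for _ in range(6)]
    print(permutation_pvalues(profiles))
```

Comparing against the maximum score of each permuted data set gives a conservative, genome-wide control of false positives at the cost of some detection power.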
13.  Landscape of somatic allelic imbalances and copy number alterations in HER2-amplified breast cancer 
Breast Cancer Research : BCR  2011;13(6):R129.
Introduction
Human epidermal growth factor receptor 2 (HER2)-amplified breast cancer represents a clinically well-defined subgroup due to availability of targeted treatment. However, HER2-amplified tumors have been shown to be heterogeneous at the genomic level by genome-wide microarray analyses, pointing towards a need of further investigations for identification of recurrent copy number alterations and delineation of patterns of allelic imbalance.
Methods
High-density whole genome array-based comparative genomic hybridization (aCGH) and single nucleotide polymorphism (SNP) array data from 260 HER2-amplified breast tumors or cell lines, and 346 HER2-negative breast cancers with molecular subtype information were assembled from different repositories. Copy number alteration (CNA), loss-of-heterozygosity (LOH), copy number neutral allelic imbalance (CNN-AI), subclonal CNA and patterns of tumor DNA ploidy were analyzed using bioinformatical methods such as genomic identification of significant targets in cancer (GISTIC) and genome alteration print (GAP). The patterns of tumor ploidy were confirmed in 338 unrelated breast cancers analyzed by DNA flow cytometry with concurrent BAC aCGH and gene expression data.
Results
A core set of 36 genomic regions commonly affected by copy number gain or loss was identified by integrating results with a previous study, together comprising > 400 HER2-amplified tumors. While CNN-AI frequency appeared evenly distributed over chromosomes in HER2-amplified tumors, not targeting specific regions and often < 20% in frequency, the occurrence of LOH was strongly associated with regions of copy number loss. HER2-amplified and HER2-negative tumors stratified by molecular subtypes displayed different patterns of LOH and CNN-AI, with basal-like tumors showing highest frequencies followed by HER2-amplified and luminal B cases. Tumor aneuploidy was strongly associated with increasing levels of LOH, CNN-AI, CNAs and occurrence of subclonal copy number events, irrespective of subtype. Finally, SNP data from individual tumors indicated that genomic amplification in general appears as monoallelic, that is, it preferentially targets one parental chromosome in HER2-amplified tumors.
Conclusions
We have delineated the genomic landscape of CNAs, amplifications, LOH, and CNN-AI in HER2-amplified breast cancer, but also demonstrated a strong association between different types of genomic aberrations and tumor aneuploidy irrespective of molecular subtype.
doi:10.1186/bcr3075
PMCID: PMC3326571  PMID: 22169037
14.  Combined array-comparative genomic hybridization and single-nucleotide polymorphism-loss of heterozygosity analysis reveals complex genetic alterations in cervical cancer 
BMC Genomics  2007;8:53.
Background
Cervical carcinoma develops as a result of multiple genetic alterations. Different studies investigated genomic alterations in cervical cancer mainly by means of metaphase comparative genomic hybridization (mCGH) and microsatellite marker analysis for the detection of loss of heterozygosity (LOH). Currently, high throughput methods such as array comparative genomic hybridization (array CGH), single nucleotide polymorphism array (SNP array) and gene expression arrays are available to study genome-wide alterations. Integration of these 3 platforms allows detection of genomic alterations at high resolution and investigation of an association between copy number changes and expression.
Results
Genome-wide copy number and genotype analysis of 10 cervical cancer cell lines by array CGH and SNP array showed highly complex large-scale alterations. A comparison between array CGH and SNP array revealed that the overall concordance in detection of the same areas with copy number alterations (CNA) was above 90%. The use of SNP arrays demonstrated that about 75% of LOH events would not have been found by methods which screen for copy number changes, such as array CGH, since these were LOH events without CNA. Regions frequently targeted by CNA, as determined by array CGH, such as amplification of 5p and 20q, and loss of 8p, were confirmed by fluorescence in situ hybridization (FISH). Genome-wide, we did not find a correlation between copy number and gene expression. At chromosome arm 5p, however, 22% of the genes were significantly upregulated in cell lines with amplifications as compared to cell lines without amplifications, as measured by gene expression arrays. For 3 genes, SKP2, ANKH and TRIO, expression differences were confirmed by quantitative real-time PCR (qRT-PCR).
Conclusion
This study showed that copy number data retrieved from either array CGH or SNP array are comparable and that the integration of genome-wide LOH, copy number and gene expression is useful for the identification of gene specific targets that could be relevant for the development and progression in cervical cancer.
doi:10.1186/1471-2164-8-53
PMCID: PMC1805756  PMID: 17311676
15.  Clinical and genetic characterization of basal cell carcinoma and breast cancer in a single patient 
SpringerPlus  2014;3:454.
Introduction
Multiple environmental and genetic factors are involved with the development of basal cell carcinomas (BCC), as well as with breast cancers. Tumor initiation and progression are often associated with genomic instability such as aneuploidies, and gains or losses of large chromosomal segments, known as copy number alterations (CNAs). CNAs have been successfully detected using the microarray comparative genomic hybridization technique (array-CGH) at high resolution. Data thus obtained are useful to identify specific genomic aberrations, to classify tumor stages, and to stratify subgroups of patients with different prognosis and clinical behaviors.
Case description
Clinical study of a 66-year-old white female identified two primary tumors, a ductal invasive grade-II carcinoma of the breast, and one nodular BCC. Germline and tumor genomic survey utilized the 180 K array-CGH analysis to investigate chromosomal alterations.
Discussion and evaluation
Several chromosomal anomalies were detected in the breast tumor genome, including a focal ~422 Kb 13q13.3 microdeletion. In the BCC, amplification of a chromosome 6 region spanning the centromere, between cytobands 6p23 and 6q12, was identified. Several 6p amplified genes correspond to families of histone and human leukocyte antigen genes, whereas some of the CNAs found in the breast tumor are uncommon. No germline CNA was detected in the normal skin of the patient at this technical resolution.
Conclusion
CNAs found in the two different tumors of the patient constitute independent events that arose in the somatic lineage. Genes relevant to both carcinogenesis and progression are likely to be affected by these CNAs.
Electronic supplementary material
The online version of this article (doi:10.1186/2193-1801-3-454) contains supplementary material, which is available to authorized users.
doi:10.1186/2193-1801-3-454
PMCID: PMC4149681  PMID: 25184114
Invasive ductal breast carcinoma; Basal cell carcinoma; Array-CGH; Copy number alterations
16.  Medulloblastoma outcome is adversely associated with overexpression of EEF1D, RPL30, and RPS20 on the long arm of chromosome 8 
BMC Cancer  2006;6:223.
Background
Medulloblastoma is the most common malignant brain tumor of childhood. Improvements in clinical outcome require a better understanding of the genetic alterations to identify clinically significant biological factors and to stratify patients accordingly. In the present study, we applied cytogenetic characterization to guide the identification of biologically significant genes from gene expression microarray profiles of medulloblastoma.
Methods
We analyzed 71 primary medulloblastomas for chromosomal copy number aberrations (CNAs) using comparative genomic hybridization (CGH). Among 64 tumors that we previously analyzed by gene expression microarrays, 27 were included in our CGH series. We analyzed clinical outcome with respect to CNAs and microarray results. We filtered microarray data using specific CNAs to detect differentially expressed candidate genes associated with survival.
Results
The most frequent lesions detected in our series involved chromosome 17; loss of 16q, 10q, or 8p; and gain of 7q or 2p. Recurrent amplifications at 2p23-p24, 2q14, 7q34, and 12p13 were also observed. Gain of 8q is associated with worse overall survival (p = 0.0141), which is not entirely attributable to MYC amplification or overexpression. By applying CGH results to gene expression analysis of medulloblastoma, we identified three 8q-mapped genes that are associated with overall survival in the larger group of 64 patients (p < 0.05): eukaryotic translation elongation factor 1D (EEF1D), ribosomal protein L30 (RPL30), and ribosomal protein S20 (RPS20).
Conclusion
The complementary use of CGH and expression profiles can facilitate the identification of clinically significant candidate genes involved in medulloblastoma growth. We demonstrate that gain of 8q and expression levels of three 8q-mapped candidate genes (EEF1D, RPL30, RPS20) are associated with adverse outcome in medulloblastoma.
doi:10.1186/1471-2407-6-223
PMCID: PMC1578584  PMID: 16968546
17.  Cytogenetic analysis of myxoid liposarcoma and myxofibrosarcoma by array‐based comparative genomic hybridisation 
Journal of Clinical Pathology  2006;59(9):978-983.
Aim
To investigate overall chromosomal alterations using array‐based comparative genomic hybridisation (CGH) of myxoid liposarcomas (MLSs) and myxofibrosarcomas (MFSs).
Materials and methods
Genomic DNA extracted from fresh‐frozen tumour tissues was labelled with fluorochromes and then hybridised on to an array consisting of 1440 bacterial artificial chromosome clones representing regions throughout the entire human genome important in cytogenetics and oncology.
Results
DNA copy number aberrations (CNAs) were found in all the 8 MFSs, but no alterations were found in 7 (70%) of 10 MLSs. In MFSs, the most frequent CNAs were gains at 7p21.1–p22.1 and 12q15–q21.1 and a loss at 13q14.3–q34. The second most frequent CNAs were gains at 7q33–q35, 9q22.31–q22.33, 12p13.32–pter, 17q22–q23, Xp11.2 and Xq12 and losses at 10p13–p14, 10q25, 11p11–p14, 11q23.3–q25, 20p11–p12 and 21q22.13–q22.2, which were detected in 38% of the MFSs examined. In MLSs, only a few CNAs were found in two sarcomas with gains at 8p21.2–p23.3, 8q11.22–q12.2 and 8q23.1–q24.3, and in one with gains at 5p13.2–p14.3 and 5q11.2–5q35.2 and a loss at 21q22.2–qter.
Conclusions
MFS has more frequent and diverse CNAs than MLS, which reinforces the hypothesis that MFS is genetically different from MLS. Our array CGH analysis may also provide several entry points for the identification of candidate genes associated with oncogenesis and progression in MFS.
doi:10.1136/jcp.2005.034942
PMCID: PMC1860469  PMID: 16751306
18.  ADaCGH: A Parallelized Web-Based Application and R Package for the Analysis of aCGH Data 
PLoS ONE  2007;2(8):e737.
Background
Copy number alterations (CNAs) in genomic DNA have been associated with complex human diseases, including cancer. One of the most common techniques to detect CNAs is array-based comparative genomic hybridization (aCGH). The availability of aCGH platforms and the need for identification of CNAs has resulted in a wealth of methodological studies.
Methodology/Principal Findings
ADaCGH is an R package and a web-based application for the analysis of aCGH data. It implements eight methods for detection of CNAs, gains and losses of genomic DNA, including all of the best performing ones from two recent reviews (CBS, GLAD, CGHseg, HMM). For improved speed, we use parallel computing (via MPI). Additional information (GO terms, PubMed citations, KEGG and Reactome pathways) is available for individual genes, and for sets of genes with altered copy numbers.
Conclusions/Significance
ADaCGH represents a qualitative increase in the standards of these types of applications: a) all of the best performing algorithms are included, not just one or two; b) we do not limit ourselves to providing a thin layer of CGI on top of existing BioConductor packages, but instead carefully use parallelization, examining different schemes, and are able to achieve significant decreases in user waiting time (factors up to 45×); c) we have added functionality not currently available in some methods, to adapt to recent recommendations (e.g., merging of segmentation results in wavelet-based and CGHseg algorithms); d) we incorporate redundancy, fault-tolerance and checkpointing, which are unique among web-based, parallelized applications; e) all of the code is available under open source licenses, allowing others to build upon, copy, and adapt our code for other software projects.
doi:10.1371/journal.pone.0000737
PMCID: PMC1940324  PMID: 17710137
19.  Clinical Significance of Previously Cryptic Copy Number Alterations and Loss of Heterozygosity in Pediatric Acute Myeloid Leukemia and Myelodysplastic Syndrome Determined Using Combined Array Comparative Genomic Hybridization plus Single-Nucleotide Polymorphism Microarray Analyses 
Journal of Korean Medical Science  2014;29(7):926-933.
The combined array comparative genomic hybridization plus single-nucleotide polymorphism microarray (CGH+SNP microarray) platform can simultaneously detect copy number alterations (CNA) and copy-neutral loss of heterozygosity (LOH). Eighteen children with acute myeloid leukemia (AML) (n=15) or myelodysplastic syndrome (MDS) (n=3) were studied using CGH+SNP microarray to evaluate the clinical significance of submicroscopic chromosomal aberrations. CGH+SNP microarray revealed CNAs at 14 regions in 9 patients, while metaphase cytogenetic (MC) analysis detected CNAs in 11 regions in 8 patients. Using CGH+SNP microarray, LOHs>10 Mb involving terminal regions or the whole chromosome were detected in 3 of 18 patients (17%). CGH+SNP microarray revealed cryptic LOHs with or without CNAs in 3 of 5 patients with normal karyotypes. CGH+SNP microarray detected additional cryptic CNAs (n=2) and LOHs (n=5) in 6 of 13 patients with abnormal MC. In total, 9 patients demonstrated additional aberrations, including CNAs (n=3) and/or LOHs (n=8). Three of 15 patients with AML and terminal LOH>10 Mb demonstrated a significantly inferior relapse-free survival rate (P=0.041). This study demonstrates that CGH+SNP microarray can simultaneously detect previously cryptic CNAs and LOH, which may demonstrate prognostic implications.
doi:10.3346/jkms.2014.29.7.926
PMCID: PMC4101780  PMID: 25045224
Leukemia, Myeloid, Acute; DNA Copy Number Variations; Loss of Heterozygosity; Comparative Genomic Hybridization; Single-Nucleotide Polymorphism Microarray
20.  Integrated analysis of copy number alteration and RNA expression profiles of cancer using a high-resolution whole-genome oligonucleotide array 
Experimental & Molecular Medicine  2009;41(7):462-470.
Recently, microarray-based comparative genomic hybridization (array-CGH) has emerged as a very efficient technology with higher resolution for the genome-wide identification of copy number alterations (CNA). Although CNAs are thought to affect gene expression, there is no platform currently available for the integrated CNA-expression analysis. To achieve high-resolution copy number analysis integrated with expression profiles, we established a human 30k oligoarray-based genome-wide copy number analysis system and explored the applicability of this system for integrated genome and transcriptome analysis using the MDA-MB-231 cell line. We compared the CNAs detected by the oligoarray with those detected by the 3k BAC array for validation. The oligoarray identified the single copy difference more accurately and sensitively than the BAC array. Seventeen CNAs detected by both platforms in MDA-MB-231 such as gains of 5p15.33-13.1, 8q11.22-8q21.13, 17p11.2, and losses of 1p32.3, 8p23.3-8p11.21, and 9p21 were consistently identified in previous studies on breast cancer. There were 122 other small CNAs (mean size 1.79 Mb) that were detected by the oligoarray only, not by the BAC array. We performed genomic qPCR targeting 7 CNA regions, detected by the oligoarray only, and one non-CNA region to validate the oligoarray CNA detection. All qPCR results were consistent with the oligoarray-CGH results. When we explored the possibility of combined interpretation of both DNA copy number and RNA expression profiles, mean DNA copy number and RNA expression levels showed a significant correlation. In conclusion, this 30k oligoarray-CGH system can be a reasonable choice for analyzing whole genome CNAs and RNA expression profiles at a lower cost.
doi:10.3858/emm.2009.41.7.051
PMCID: PMC2721143  PMID: 19322034
cell line, tumor; gene dosage; gene expression profiling; oligonucleotide array sequence analysis
21.  A forward-backward fragment assembling algorithm for the identification of genomic amplification and deletion breakpoints using high-density single nucleotide polymorphism (SNP) array 
BMC Bioinformatics  2007;8:145.
Background
DNA copy number aberration (CNA) is one of the key characteristics of cancer cells. Recent studies demonstrated the feasibility of utilizing high density single nucleotide polymorphism (SNP) genotyping arrays to detect CNA. Compared with the two-color array-based comparative genomic hybridization (array-CGH), the SNP arrays offer much higher probe density and lower signal-to-noise ratio at the single SNP level. To accurately identify small segments of CNA from SNP array data, segmentation methods that are sensitive to CNA while resistant to noise are required.
Results
We have developed a highly sensitive algorithm for the edge detection of copy number data which is especially suitable for the SNP array-based copy number data. The method consists of an over-sensitive edge-detection step and a test-based forward-backward edge selection step.
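The two-stage idea in this abstract (a deliberately permissive edge-detection step followed by a test-based selection step) can be illustrated with a minimal sketch. This is not the FASeg implementation: only a backward pruning pass is shown, and the function names, the `min_jump`, `sigma` and `z_min` parameters, and the z-like statistic are all assumptions made for illustration.

```python
import math

def candidate_edges(x, min_jump):
    """Over-sensitive step: flag every index where adjacent values jump by min_jump."""
    return [i for i in range(1, len(x)) if abs(x[i] - x[i - 1]) >= min_jump]

def edge_score(left, right, sigma):
    """z-like statistic for the mean difference of two flanking segments.

    sigma is an assumed, externally supplied noise level (hypothetical parameter).
    """
    diff = abs(sum(left) / len(left) - sum(right) / len(right))
    return diff / (sigma * math.sqrt(1.0 / len(left) + 1.0 / len(right)))

def prune_edges(x, edges, sigma, z_min):
    """Backward step: drop the weakest edge until every remaining edge scores >= z_min."""
    edges = sorted(edges)
    while edges:
        bounds = [0] + edges + [len(x)]
        scores = [
            edge_score(x[bounds[k]:bounds[k + 1]], x[bounds[k + 1]:bounds[k + 2]], sigma)
            for k in range(len(edges))
        ]
        weakest = min(range(len(edges)), key=scores.__getitem__)
        if scores[weakest] >= z_min:
            break
        del edges[weakest]  # merge the two segments around the weakest edge
    return edges

# Toy log-ratio profile with one real step between index 3 and index 4.
profile = [0.0, 0.1, -0.1, 0.05, 1.0, 1.1, 0.9, 1.05]
candidates = candidate_edges(profile, min_jump=0.12)
kept = prune_edges(profile, candidates, sigma=0.1, z_min=3.0)
```

On the toy profile the permissive step proposes several edges caused by noise, and the pruning step is meant to retain only the one separating the two plateaus.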
Conclusion
Using simulations constructed from real experimental data, the method shows high sensitivity and specificity in detecting small copy number changes in focused regions. The method is implemented in an R package FASeg, which includes data processing and visualization utilities, as well as libraries for processing Affymetrix SNP array data.
doi:10.1186/1471-2105-8-145
PMCID: PMC1868765  PMID: 17477871
22.  KC-SMARTR: An R package for detection of statistically significant aberrations in multi-experiment aCGH data 
BMC Research Notes  2010;3:298.
Background
Most approaches used to find recurrent or differential DNA Copy Number Alterations (CNA) in array Comparative Genomic Hybridization (aCGH) data from groups of tumour samples depend on the discretization of the aCGH data to gain, loss or no-change states. This causes loss of valuable biological information in tumour samples, which are frequently heterogeneous. We have previously developed an algorithm, KC-SMART, that bases its estimate of the magnitude of the CNA at a given genomic location on kernel convolution (Klijn et al., 2008). This accounts for the intensity of the probe signal, its local genomic environment and the signal distribution across multiple samples.
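The kernel-convolution idea described here, estimating CNA magnitude from continuous probe signal rather than from gain/loss calls, can be sketched as a Gaussian-weighted average of log-ratios around each genomic position. This is a simplified illustration, not the KC-SMART algorithm: the function name, the Gaussian kernel choice and the `bandwidth` parameter are assumptions, and the published method's handling of separate gain and loss signals and its significance testing are not reproduced.

```python
import math

def kernel_smoothed_cna(positions, log_ratios, grid, bandwidth):
    """Kernel-weighted mean log-ratio at each grid position.

    positions  : probe coordinates (bp) on one chromosome
    log_ratios : list of per-sample lists, one log2 ratio per probe
    grid       : coordinates at which the estimate is evaluated
    bandwidth  : Gaussian kernel width in bp (hypothetical tuning parameter)
    """
    n_samples = len(log_ratios)
    # Average across samples first, keeping the signal continuous (no discretization).
    mean_signal = [
        sum(sample[i] for sample in log_ratios) / n_samples
        for i in range(len(positions))
    ]
    estimates = []
    for g in grid:
        weights = [math.exp(-0.5 * ((p - g) / bandwidth) ** 2) for p in positions]
        total = sum(weights)
        if total > 0:
            estimates.append(sum(w * v for w, v in zip(weights, mean_signal)) / total)
        else:
            estimates.append(0.0)  # no probe close enough to contribute
    return estimates

# Two samples sharing a single elevated probe at position 200.
est = kernel_smoothed_cna([100, 200, 300], [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]], [200], 50)
flat = kernel_smoothed_cna([100, 200, 300], [[0.5, 0.5, 0.5]], [150, 250], 50)
```

Because the weights depend on distance in base pairs, nearby probes dominate the estimate and unevenly spaced probes are handled without resampling.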
Results
Here we extend the approach to allow comparative analyses of two groups of samples and introduce the R implementation of these two approaches. The comparative module allows for a supervised analysis to be performed, to enable the identification of regions that are differentially aberrated between two user-defined classes.
We analyzed data from a series of B- and T-cell lymphomas and were able to retrieve all positive control regions (VDJ regions) in addition to a number of new regions. A t-test employing segmented data, that we implemented, was also able to locate all the positive control regions and a number of new regions but these regions were highly fragmented.
Conclusions
KC-SMARTR offers recurrent CNA and class specific CNA detection, at different genomic scales, in a single package without the need for additional segmentation. It is memory efficient and runs on a wide range of machines. Most importantly, it does not rely on data discretization and therefore maximally exploits the biological information in the aCGH data.
The program is freely available from the Bioconductor website http://www.bioconductor.org/ under the terms of the GNU General Public License.
doi:10.1186/1756-0500-3-298
PMCID: PMC2995794  PMID: 21070656
23.  Copy number alterations in small intestinal neuroendocrine tumors determined by array comparative genomic hybridization 
BMC Cancer  2013;13:505.
Background
Small intestinal neuroendocrine tumors (SI-NETs) are typically slow-growing tumors that have metastasized already at the time of diagnosis. The purpose of the present study was to further refine and define regions of recurrent copy number (CN) alterations (CNA) in SI-NETs.
Methods
Genome-wide CNAs were determined by applying array CGH (a-CGH) on SI-NETs including 18 primary tumors and 12 metastases. Quantitative PCR analysis (qPCR) was used to confirm CNAs detected by a-CGH as well as to detect CNAs in an extended panel of SI-NETs. Unsupervised hierarchical clustering was used to detect tumor groups with similar patterns of chromosomal alterations based on recurrent regions of CN loss or gain. The log rank test was used to calculate overall survival. The Mann–Whitney U test or Fisher’s exact test was used to evaluate associations between tumor groups and recurrent CNAs or clinical parameters.
Results
The most frequent abnormality was loss of chromosome 18 observed in 70% of the cases. CN losses were also frequently found on chromosomes 11 (23%), 16 (20%), and 9 (20%), with regions of recurrent CN loss identified in 11q23.1-qter, 16q12.2-qter, 9pter-p13.2 and 9p13.1-11.2. Gains were most frequently detected in chromosomes 14 (43%), 20 (37%), 4 (27%), and 5 (23%) with recurrent regions of CN gain located to 14q11.2, 14q32.2-32.31, 20pter-p11.21, 20q11.1-11.21, 20q12-qter, 4 and 5. qPCR analysis confirmed most CNAs detected by a-CGH as well as revealed CNAs in an extended panel of SI-NETs. Unsupervised hierarchical clustering of recurrent regions of CNAs revealed two separate tumor groups and 5 chromosomal clusters. Loss of chromosomes 18, 16 and 11 and gain of chromosome 20 were found in both tumor groups. Tumor group II was enriched for alterations in chromosome cluster-d, including gain of chromosomes 4, 5, 7, 14 and gain of 20 in chromosome cluster-b. Gain in 20pter-p11.21 was associated with short survival. Statistically significant differences were observed between primary tumors and metastases for loss of 16q and gain of 7.
Conclusion
Our results revealed recurrent CNAs in several candidate regions with a potential role in SI-NET development. Distinct genetic alterations and pathways are involved in tumorigenesis of SI-NETs.
doi:10.1186/1471-2407-13-505
PMCID: PMC3819709  PMID: 24165089
Small intestine; Neuroendocrine tumor; Carcinoid; Array CGH; Chromosome 18
24.  CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data 
Bioinformatics  2009;26(4):464-469.
Motivation: DNA copy number aberration (CNA) is a hallmark of genomic abnormality in tumor cells. Recurrent CNA (RCNA) occurs in multiple cancer samples across the same chromosomal region and has greater implication in tumorigenesis. Current commonly used methods for RCNA identification require CNA calling for individual samples before cross-sample analysis. This two-step strategy may result in a heavy computational burden, as well as a loss of the overall statistical power due to segmentation and discretization of individual sample's data. We propose a population-based approach for RCNA detection with no need of single-sample analysis, which is statistically powerful, computationally efficient and particularly suitable for high-resolution and large-population studies.
Results: Our approach, correlation matrix diagonal segmentation (CMDS), identifies RCNAs based on a between-chromosomal-site correlation analysis. Directly using the raw intensity ratio data from all samples and adopting a diagonal transformation strategy, CMDS substantially reduces computational burden and can obtain results very quickly from large datasets. Our simulation indicates that the statistical power of CMDS is higher than that of single-sample CNA calling based two-step approaches. We applied CMDS to two real datasets of lung cancer and brain cancer from Affymetrix and Illumina array platforms, respectively, and successfully identified known regions of CNA associated with EGFR, KRAS and other important oncogenes. CMDS provides a fast, powerful and easily implemented tool for the RCNA analysis of large-scale data from cancer genomes.
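The between-site correlation idea can be sketched loosely: if a region is recurrently altered, the raw ratios at neighbouring sites should co-vary across samples, so each site can be scored by its mean correlation with nearby sites. This is only an illustration of that intuition, not the CMDS procedure; the diagonal transformation and the statistical testing are not reproduced, and `site_scores` and `half_width` are hypothetical names.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0

def site_scores(data, half_width):
    """data[s][i] is the intensity ratio of sample s at site i.

    Returns, for each site, the mean correlation (across samples) with the
    sites lying within half_width positions on either side.
    """
    n_sites = len(data[0])
    cols = [[row[i] for row in data] for i in range(n_sites)]
    scores = []
    for i in range(n_sites):
        lo, hi = max(0, i - half_width), min(n_sites, i + half_width + 1)
        nbrs = [j for j in range(lo, hi) if j != i]
        if nbrs:
            scores.append(sum(pearson(cols[i], cols[j]) for j in nbrs) / len(nbrs))
        else:
            scores.append(0.0)
    return scores

# Four samples, five sites: sites 0-2 are gained together in samples 0 and 1,
# sites 3-4 carry only unrelated noise.
data = [
    [1.0, 0.9, 1.1, 0.1, -0.1],
    [0.8, 1.0, 0.9, -0.1, -0.1],
    [0.0, 0.1, -0.1, 0.1, 0.1],
    [0.1, -0.1, 0.0, -0.1, 0.1],
]
scores = site_scores(data, half_width=1)
```

Restricting the comparison to a band of neighbouring sites avoids building the full site-by-site correlation matrix, which is the kind of saving that makes a population-level approach feasible on dense arrays.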
Availability: The R and C programs implementing our method are available at https://dsgweb.wustl.edu/qunyuan/software/cmds.
Contact: qunyuan@wustl.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp708
PMCID: PMC2852218  PMID: 20031968
25.  Array comparative genomic hybridisation (aCGH) analysis of premenopausal breast cancers from a nuclear fallout area and matched cases from Western New York 
British Journal of Cancer  2005;93(6):699-708.
High-resolution array comparative genomic hybridisation (aCGH) analysis of DNA copy number aberrations (CNAs) was performed on breast carcinomas in premenopausal women from Western New York (WNY) and from Gomel, Belarus, an area exposed to fallout from the 1986 Chernobyl nuclear accident. Genomic DNA was isolated from 47 frozen tumour specimens from 42 patients and hybridised to arrays spotted with more than 3000 BAC clones. In all, 20 samples were from WNY and 27 were from Belarus. In total, 34 samples were primary tumours and 13 were lymph node metastases, including five matched pairs from Gomel. The average number of total CNAs per sample was 76 (range 35–134). We identified 152 CNAs (92 gains and 60 losses) occurring in more than 10% of the samples. The most common amplifications included gains at 8q13.2 (49%), at 1p21.1 (36%), and at 8q24.21 (36%). The most common deletions were at 1p36.22 (26%), at 17p13.2 (26%), and at 8p23.3 (23%). Belarussian tumours had more amplifications and fewer deletions than WNY breast cancers. HER2/neu negativity and younger age were also associated with a higher number of gains and fewer losses. In the five paired samples, we observed more discordant than concordant DNA changes. Unsupervised hierarchical cluster analysis revealed two distinct groups of tumours: one comprised predominantly of Belarussian carcinomas and the other largely consisting of WNY cases. In total, 50 CNAs occurred significantly more commonly in one cohort vs the other, and these included some candidate signature amplifications in the breast cancers in women exposed to significant radiation. In conclusion, our high-density aCGH study has revealed a large number of genetic aberrations in individual premenopausal breast cancer specimens, some of which had not been reported before. We identified a distinct CNA profile for carcinomas from a nuclear fallout area, suggesting a possible molecular fingerprint of radiation-associated breast cancer.
doi:10.1038/sj.bjc.6602784
PMCID: PMC2361621  PMID: 16222315
amplification; array CGH; breast cancer; deletion; radiation
