Related Articles

1.  Flexible and Accurate Detection of Genomic Copy-Number Changes from aCGH 
PLoS Computational Biology  2007;3(6):e122.
Genomic DNA copy-number alterations (CNAs) are associated with complex diseases, including cancer: CNAs are indeed related to tumoral grade, metastasis, and patient survival. CNAs discovered from array-based comparative genomic hybridization (aCGH) data have been instrumental in identifying disease-related genes and potential therapeutic targets. To be immediately useful in both clinical and basic research scenarios, aCGH data analysis requires accurate methods that do not impose unrealistic biological assumptions and that provide direct answers to the key question, “What is the probability that this gene/region has CNAs?” Current approaches fail, however, to meet these requirements. Here, we introduce reversible jump aCGH (RJaCGH), a new method for identifying CNAs from aCGH; we use a nonhomogeneous hidden Markov model fitted via reversible jump Markov chain Monte Carlo; and we incorporate model uncertainty through Bayesian model averaging. RJaCGH provides an estimate of the probability that a gene/region has CNAs while incorporating interprobe distance and the capability to analyze data on a chromosome or genome-wide basis. RJaCGH outperforms alternative methods, and the performance difference is even larger with noisy data and highly variable interprobe distance, both commonly found features in aCGH data. Furthermore, our probabilistic method allows us to identify minimal common regions of CNAs among samples and can be extended to incorporate expression data. In summary, we provide a rigorous statistical framework for locating genes and chromosomal regions with CNAs with potential applications to cancer and other complex human diseases.
Author Summary
As a consequence of problems during cell division, the number of copies of a gene in a chromosome can either increase or decrease. These copy-number alterations (CNAs) can play a crucial role in the emergence of complex multigenic diseases. For example, in cancer, amplification of oncogenes can drive tumor activation, and CNAs are associated with metastasis development and patient survival. Studies on the relationship between CNAs and disease have been recently fueled by the widespread use of array-based comparative genomic hybridization (aCGH), a technique with much finer resolution than previous experimental approaches. Detection of CNAs from these data depends on methods of analysis that do not impose biologically unrealistic assumptions and that provide direct answers to fundamental research questions. We have developed a statistical method, using a Bayesian approach, that returns estimates of the probabilities of CNAs from aCGH data, the most direct and valuable answer to the key biological question: “What is the probability that this gene/region has an altered copy number?” The output of the method can therefore be immediately used in different settings from clinical to basic research scenarios, and is applicable over a wide variety of aCGH technologies.
PMCID: PMC1894821  PMID: 17590078
2.  Bayesian Disease Classification Using Copy Number Data 
Cancer Informatics  2014;13(Suppl 2):83-91.
DNA copy number variations (CNVs) have been shown to be associated with cancer development and progression. The detection of these CNVs has the potential to impact the basic knowledge and treatment of many types of cancers, and can play a role in the discovery and development of molecular-based personalized cancer therapies. One of the most common types of high-resolution chromosomal microarrays is array-based comparative genomic hybridization (aCGH) methods that assay DNA CNVs across the whole genomic landscape in a single experiment. In this article we propose methods to use aCGH profiles to predict disease states. We employ a Bayesian classification model and treat disease states as outcome, and aCGH profiles as covariates in order to identify significant regions of the genome associated with disease subclasses. We propose a principled two-stage method where we first make inferences on the underlying copy number states associated with the aCGH emissions based on hidden Markov model (HMM) formulations to account for serial dependencies in neighboring probes. Subsequently, we infer associations with disease outcomes, conditional on the copy number states, using Bayesian linear variable selection procedures. The selected probes and their effects are parameters that are useful for predicting the disease categories of any additional individuals on the basis of their aCGH profiles. Using simulated datasets, we investigate the method’s accuracy in detecting disease category. Our methodology is motivated by and applied to a breast cancer dataset consisting of aCGH profiles assayed on patients from multiple disease subtypes.
PMCID: PMC4196891  PMID: 25336897
breast cancer; classification; Bayesian network; hidden Markov model
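The first stage described in entry 2 (inferring copy number states with a hidden Markov model so that neighboring probes inform each other) can be illustrated with a minimal Viterbi decoder. This is only a sketch under stated assumptions: the three states, their mean log ratios, the shared standard deviation, and the transition probability are hypothetical values chosen for illustration, not parameters from the article, and the article's Bayesian inference is replaced here by plain maximum-likelihood decoding.

```python
import math

# Hypothetical state parameters (assumptions, not taken from the article):
# mean log2 ratio for loss, neutral, and gain, with a shared standard deviation.
STATE_NAMES = ["loss", "neutral", "gain"]
STATE_MEANS = [-0.6, 0.0, 0.6]
STATE_SD = 0.2
STAY_PROB = 0.9  # probability of keeping the same state at the next probe


def log_gaussian(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(log_ratios):
    """Return the most likely state sequence for a series of log ratios."""
    if not log_ratios:
        return []
    n_states = len(STATE_NAMES)
    switch_prob = (1 - STAY_PROB) / (n_states - 1)
    log_trans = [
        [math.log(STAY_PROB if i == j else switch_prob) for j in range(n_states)]
        for i in range(n_states)
    ]
    # Uniform initial distribution over states.
    scores = [
        math.log(1 / n_states) + log_gaussian(log_ratios[0], STATE_MEANS[s], STATE_SD)
        for s in range(n_states)
    ]
    backpointers = []
    for x in log_ratios[1:]:
        new_scores, pointers = [], []
        for j in range(n_states):
            # Best predecessor state for arriving in state j at this probe.
            best_prev = max(range(n_states), key=lambda i: scores[i] + log_trans[i][j])
            pointers.append(best_prev)
            new_scores.append(
                scores[best_prev]
                + log_trans[best_prev][j]
                + log_gaussian(x, STATE_MEANS[j], STATE_SD)
            )
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATE_NAMES[s] for s in path]
```

With these assumed parameters, a single probe at 0.35 surrounded by probes at 0.0 is decoded as neutral, because two state switches cost more than one poorly fitting emission. This is the kind of serial dependence a per-probe threshold would ignore.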
3.  A Bayesian Analysis for Identifying DNA Copy Number Variations Using a Compound Poisson Process 
To study chromosomal aberrations that may lead to cancer formation or genetic diseases, the array-based Comparative Genomic Hybridization (aCGH) technique is often used for detecting DNA copy number variants (CNVs). Various methods have been developed for gaining CNVs information based on aCGH data. However, most of these methods make use of the log-intensity ratios in aCGH data without taking advantage of other information such as the DNA probe (e.g., biomarker) positions/distances contained in the data. Motivated by the specific features of aCGH data, we developed a novel method that takes into account the estimation of a change point or locus of the CNV in aCGH data with its associated biomarker position on the chromosome using a compound Poisson process. We used a Bayesian approach to derive the posterior probability for the estimation of the CNV locus. To detect loci of multiple CNVs in the data, a sliding window process combined with our derived Bayesian posterior probability was proposed. To evaluate the performance of the method in the estimation of the CNV locus, we first performed simulation studies. Finally, we applied our approach to real data from aCGH experiments, demonstrating its applicability.
PMCID: PMC3171362  PMID: 20976296
4.  A Multi-Sample Based Method for Identifying Common CNVs in Normal Human Genomic Structure Using High-Resolution aCGH Data 
PLoS ONE  2011;6(10):e26975.
It is difficult to identify copy number variations (CNV) in normal human genomic data due to noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) containing 42 million probes, which is very large compared to previous arrays, was recently published. Most existing CNV detection algorithms do not work well because of noise associated with the large amount of input data and because most of the current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples. However, the majority of existing methods can only identify CNVs from a single sample.
Methodology and Principal Findings
We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs); CNVZ is a more precise measure of the location of a genomic variant than the CNV region (CNVR).
Conclusions and Significance
We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate for a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime compared to the other algorithms evaluated when actual, high-resolution aCGH data were analyzed. The CNVZs identified by MGVD can be used in association studies for revealing relationships between phenotypes and genomic aberrations. Our algorithm was developed in standard C++ with the STL library and is available for Linux and MS Windows. It is freely available at:
PMCID: PMC3205051  PMID: 22073121
5.  A Statistical Change Point Model Approach for the Detection of DNA Copy Number Variations in Array CGH Data 
Array comparative genomic hybridization (aCGH) provides a high-resolution and high-throughput technique for screening of copy number variations (CNVs) within the entire genome. This technique, compared to the conventional CGH, significantly improves the identification of chromosomal abnormalities. However, due to the random noise inherent in the imaging and hybridization process, identifying statistically significant DNA copy number changes in aCGH data is challenging. We propose a novel approach that uses the mean and variance change point model (MVCM) to detect CNVs or breakpoints in aCGH data sets. We derive an approximate p-value for the test statistic and also give the estimate of the locus of the DNA copy number change. We carry out simulation studies to evaluate the accuracy of the estimate and the p-value formulation. These simulation results show that the approach is effective in identifying copy number changes. The approach is also tested on fibroblast cancer cell line data, breast tumor cell line data, and breast cancer cell line aCGH data sets that are publicly available. Changes that have not been identified by the circular binary segmentation (CBS) method but are biologically verified are detected by our approach on these cell lines with higher sensitivity and specificity than CBS.
PMCID: PMC4154476  PMID: 19875853
Statistical hypothesis testing; aCGH microarray data; gene expression; DNA copy numbers; CNVs
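The idea in entry 5 of locating a point where both the mean and the variance of the log ratios change can be sketched with a generic Gaussian likelihood-ratio scan. This is an illustrative stand-in, not the article's MVCM test statistic or its approximate p-value: the function names, the `min_segment` parameter, and the small `eps` guard are assumptions introduced here.

```python
import math


def mle_variance(values):
    """Maximum-likelihood variance (divides by n, not n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_change_point(values, min_segment=3):
    """Scan every allowed split and return (index, statistic) for the best one.

    The statistic is twice the log-likelihood ratio of a two-segment Gaussian
    model (separate mean and variance per segment) against a one-segment model.
    The returned index is the first position of the second segment.
    """
    n = len(values)
    if n < 2 * min_segment:
        raise ValueError("series too short for the requested minimum segment")
    eps = 1e-12  # guards against log(0) on constant segments
    whole = n * math.log(mle_variance(values) + eps)
    best_index, best_stat = None, -math.inf
    for k in range(min_segment, n - min_segment + 1):
        left, right = values[:k], values[k:]
        split = (k * math.log(mle_variance(left) + eps)
                 + (n - k) * math.log(mle_variance(right) + eps))
        stat = whole - split
        if stat > best_stat:
            best_index, best_stat = k, stat
    return best_index, best_stat


# Toy log ratios: six probes near 0 followed by six probes near 1.
ratios = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, 0.9, 1.1, 1.0, 0.95, 1.05, 1.0]
index, statistic = best_change_point(ratios)
print(index)  # 6
```

Judging whether the best split is statistically significant would still require a reference distribution for the statistic, which this sketch does not provide.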
6.  Whole-Genome Array CGH Evaluation for Replacing Prenatal Karyotyping in Hong Kong 
PLoS ONE  2014;9(2):e87988.
To evaluate the effectiveness of whole-genome array comparative genomic hybridization (aCGH) in prenatal diagnosis in Hong Kong.
Array CGH was performed on 220 samples recruited prospectively as the first-tier test study. In addition 150 prenatal samples with abnormal fetal ultrasound findings found to have normal karyotypes were analyzed as a ‘further-test’ study using NimbleGen CGX-135K oligonucleotide arrays.
Array CGH findings were concordant with conventional cytogenetic results with the exception of one case of triploidy. It was found in the first-tier test study that aCGH detected 20% (44/220) clinically significant copy number variants (CNV), of which 21 were common aneuploidies and 23 had other chromosomal imbalances. There were 3.2% (7/220) samples with CNVs detected by aCGH but not by conventional cytogenetics. In the ‘further-test’ study, the additional diagnostic yield of detecting chromosome imbalance was 6% (9/150). The overall detection for CNVs of unclear clinical significance was 2.7% (10/370) with 0.9% found to be de novo. Eleven loci of common CNVs were found in the local population.
Whole-genome aCGH offered a higher resolution diagnostic capacity than conventional karyotyping for prenatal diagnosis either as a first-tier test or as a ‘further-test’ for pregnancies with fetal ultrasound anomalies. We propose replacing conventional cytogenetics with aCGH for all pregnancies undergoing invasive diagnostic procedures after excluding common aneuploidies and triploidies by quantitative fluorescent PCR. Conventional cytogenetics can be reserved for visualization of clinically significant CNVs.
PMCID: PMC3914896  PMID: 24505343
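The percentages reported in entry 6 can be rechecked from the stated counts. The short sketch below only redoes that arithmetic; the helper name `percent` is introduced here for illustration, and the de novo figure is left out because the abstract gives no count for it.

```python
def percent(count, total, digits=1):
    """Return count/total as a percentage rounded to `digits` decimals."""
    return round(100 * count / total, digits)


first_tier_total = 220
further_test_total = 150
all_samples = first_tier_total + further_test_total  # 370

# Clinically significant CNVs in the first-tier study: 21 aneuploidies + 23 other imbalances.
assert 21 + 23 == 44
print(percent(44, first_tier_total))    # 20.0
# CNVs seen by aCGH but not by conventional cytogenetics.
print(percent(7, first_tier_total))     # 3.2
# Additional diagnostic yield in the 'further-test' study.
print(percent(9, further_test_total))   # 6.0
# CNVs of unclear clinical significance across both studies.
print(percent(10, all_samples))         # 2.7
```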
7.  Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization 
Genome Biology  2007;8(10):R228.
Datasets used for detecting copy number variation (CNV) are shown to be affected by a technical artifact. A novel CNV calling algorithm is presented which removes this artifact and identifies regions of CNV better than existing methods.
Large-scale high throughput studies using microarray technology have established that copy number variation (CNV) throughout the genome is more frequent than previously thought. Such variation is known to play an important role in the presence and development of phenotypes such as HIV-1 infection and Alzheimer's disease. However, methods for analyzing the complex data produced and identifying regions of CNV are still being refined.
We describe the presence of a genome-wide technical artifact, spatial autocorrelation or 'wave', which occurs in a large dataset used to determine the location of CNV across the genome. By removing this artifact we are able to obtain both a more biologically meaningful clustering of the data and an increase in the number of CNVs identified by current calling methods without a major increase in the number of false positives detected. Moreover, removing this artifact is critical for the development of a novel model-based CNV calling algorithm - CNVmix - that uses cross-sample information to identify regions of the genome where CNVs occur. For regions of CNV that are identified by both CNVmix and current methods, we demonstrate that CNVmix is better able to categorize samples into groups that represent copy number gains or losses.
Removing artifactual 'waves' (which appear to be a general feature of array comparative genomic hybridization (aCGH) datasets) and using cross-sample information when identifying CNVs enables more biological information to be extracted from aCGH experiments designed to investigate copy number variation in normal individuals.
PMCID: PMC2246302  PMID: 17961237
8.  Bayesian Random Segmentation Models to Identify Shared Copy Number Aberrations for Array CGH Data 
Array-based comparative genomic hybridization (aCGH) is a high-resolution high-throughput technique for studying the genetic basis of cancer. The resulting data consists of log fluorescence ratios as a function of the genomic DNA location and provides a cytogenetic representation of the relative DNA copy number variation. Analysis of such data typically involves estimation of the underlying copy number state at each location and segmenting regions of DNA with similar copy number states. Most current methods proceed by modeling a single sample/array at a time, and thus fail to borrow strength across multiple samples to infer shared regions of copy number aberrations. We propose a hierarchical Bayesian random segmentation approach for modeling aCGH data that utilizes information across arrays from a common population to yield segments of shared copy number changes. These changes characterize the underlying population and allow us to compare different population aCGH profiles to assess which regions of the genome have differential alterations. Our method, referred to as BDSAcgh (Bayesian Detection of Shared Aberrations in aCGH), is based on a unified Bayesian hierarchical model that allows us to obtain probabilities of alteration states as well as probabilities of differential alteration that correspond to local false discovery rates. We evaluate the operating characteristics of our method via simulations and an application using a lung cancer aCGH data set.
PMCID: PMC3079218  PMID: 21512611
Bayesian methods; Comparative Genomic Hybridization; Copy number; Functional data analysis; Mixed Models; Mixture Models
9.  Accuracy of CNV Detection from GWAS Data 
PLoS ONE  2011;6(1):e14511.
Several computer programs are available for detecting copy number variants (CNVs) using genome-wide SNP arrays. We evaluated the performance of four CNV detection software suites (Birdsuite, Partek, HelixTree, and PennCNV-Affy) in the identification of both rare and common CNVs. Each program's performance was assessed in two ways. The first was its recovery rate, i.e., its ability to call 893 CNVs previously identified in eight HapMap samples by paired-end sequencing of whole-genome fosmid clones, and 51,440 CNVs identified by array Comparative Genome Hybridization (aCGH) followed by validation procedures, in 90 HapMap CEU samples. The second evaluation was program performance calling rare and common CNVs in the Bipolar Genome Study (BiGS) data set (1001 bipolar cases and 1033 controls, all of European ancestry) as measured by the Affymetrix SNP 6.0 array. Accuracy in calling rare CNVs was assessed by positive predictive value, based on the proportion of rare CNVs validated by quantitative real-time PCR (qPCR), while accuracy in calling common CNVs was assessed by false positive/false negative rates based on qPCR validation results from a subset of common CNVs. Birdsuite recovered the highest percentages of known HapMap CNVs containing >20 markers in two reference CNV datasets. The recovery rate increased with decreased CNV frequency. In the tested rare CNV data, Birdsuite and Partek had higher positive predictive values than the other software suites. In a test of three common CNVs in the BiGS dataset, Birdsuite's call was 98.8% consistent with qPCR quantification in one CNV region, but the other two regions showed an unacceptable degree of accuracy. We found relatively poor consistency between the two “gold standards,” the sequence data of Kidd et al. and the aCGH data of Conrad et al. Algorithms for calling CNVs, especially common ones, need substantial improvement, and a “gold standard” for detection of CNVs remains to be established.
PMCID: PMC3020939  PMID: 21249187
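Entry 9 scores each program by recovery rate, positive predictive value, and false positive/false negative rates. A minimal sketch of these standard definitions follows. The function names and the counts in the usage lines are hypothetical and are not results from the study.

```python
def _ratio(numerator, denominator):
    """Divide, refusing an empty denominator instead of returning a misleading 0."""
    if denominator == 0:
        raise ValueError("denominator is zero; the metric is undefined")
    return numerator / denominator


def recovery_rate(recovered, known):
    """Fraction of previously known CNVs that a program calls."""
    return _ratio(recovered, known)


def positive_predictive_value(true_pos, false_pos):
    """Fraction of a program's calls that validation (e.g. qPCR) confirms."""
    return _ratio(true_pos, true_pos + false_pos)


def false_positive_rate(false_pos, true_neg):
    """Fraction of truly negative samples that were called positive."""
    return _ratio(false_pos, false_pos + true_neg)


def false_negative_rate(false_neg, true_pos):
    """Fraction of truly positive samples that were missed."""
    return _ratio(false_neg, false_neg + true_pos)


# Hypothetical counts for illustration only.
print(positive_predictive_value(true_pos=45, false_pos=5))  # 0.9
print(false_negative_rate(false_neg=10, true_pos=30))       # 0.25
```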
10.  A hidden Markov model-based algorithm for identifying tumour subtype using array CGH data 
BMC Genomics  2011;12(Suppl 5):S10.
The recent advancement in array CGH (aCGH) research has significantly improved tumor identification using DNA copy number data. A number of unsupervised learning methods have been proposed for clustering aCGH samples. Two of the major challenges for developing aCGH sample clustering are the high spatial correlation between aCGH markers and the low computing efficiency. A mixture hidden Markov model based algorithm was developed to address these two challenges.
The hidden Markov model (HMM) was used to model the spatial correlation between aCGH markers. A fast clustering algorithm was implemented and real data analysis on glioma aCGH data has shown that it converges to the optimal cluster rapidly and the computation time is proportional to the sample size. Simulation results showed that this HMM based clustering (HMMC) method has a substantially lower error rate than NMF clustering. The HMMC results for glioma data were significantly associated with clinical outcomes.
We have developed a fast clustering algorithm to identify tumor subtypes based on DNA copy number aberrations. The performance of the proposed HMMC method has been evaluated using both simulated and real aCGH data. The software for HMMC in both R and C++ is available on the ND INBRE website
PMCID: PMC3287492  PMID: 22369459
11.  A probe-density-based analysis method for array CGH data: simulation, normalization and centralization 
Bioinformatics  2008;24(16):1749-1756.
Motivation: Genomic instability is one of the fundamental factors in tumorigenesis and tumor progression. Many studies have shown that copy-number abnormalities at the DNA level are important in the pathogenesis of cancer. Array comparative genomic hybridization (aCGH), developed based on expression microarray technology, can reveal the chromosomal aberrations in segmental copies at a high resolution. However, due to the nature of aCGH, many standard expression data processing tools, such as data normalization, often fail to yield satisfactory results.
Results: We demonstrated a novel aCGH normalization algorithm, which provides an accurate aCGH data normalization by utilizing the dependency of neighboring probe measurements in aCGH experiments. To facilitate the study, we have developed a hidden Markov model (HMM) to simulate a series of aCGH experiments with random DNA copy number alterations that are used to validate the performance of our normalization. In addition, we applied the proposed normalization algorithm to an aCGH study of lung cancer cell lines. By using the proposed algorithm, data quality and the reliability of experimental results are significantly improved, and the distinct patterns of DNA copy number alterations are observed among those lung cancer cell lines.
Supplementary information: Source codes and figures may be found at
PMCID: PMC2732214  PMID: 18603568
12.  Identification of copy number variants from exome sequence data 
BMC Genomics  2014;15(1):661.
With advances in next generation sequencing technologies and genomic capture techniques, exome sequencing has become a cost-effective approach for mutation detection in genetic diseases. However, computational prediction of copy number variants (CNVs) from exome sequence data is a challenging task. Whilst numerous programs are available, they have different sensitivities, and have low sensitivity to detect smaller CNVs (1–4 exons). Additionally, exonic CNV discovery using standard aCGH has limitations due to the low probe density over exonic regions. The goal of our study was to develop a protocol to detect exonic CNVs (including shorter CNVs that cover 1–4 exons), combining computational prediction algorithms and a high-resolution custom CGH array.
We used six published CNV prediction programs (ExomeCNV, CONTRA, ExomeCopy, ExomeDepth, CoNIFER, XHMM) and an in-house modification to ExomeCopy and ExomeDepth (ExCopyDepth) for computational CNV prediction on 30 exomes from the 1000 genomes project and 9 exomes from primary immunodeficiency patients. CNV predictions were tested using a custom CGH array designed to capture all exons (exaCGH). After this validation, we next evaluated the computational prediction of shorter CNVs. ExomeCopy and the in-house modified algorithm, ExCopyDepth, showed the highest capability in detecting shorter CNVs. Finally, the performance of each computational program was assessed by calculating the sensitivity and false positive rate.
In this paper, we assessed the ability of 6 computational programs to predict CNVs, focussing on short (1–4 exon) CNVs. We also tested these predictions using a custom array targeting exons. Based on these results, we propose a protocol to identify and confirm shorter exonic CNVs combining computational prediction algorithms and custom aCGH experiments.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-661) contains supplementary material, which is available to authorized users.
PMCID: PMC4132917  PMID: 25102989
Exome; CNV prediction; Custom aCGH
13.  A robust penalized method for the analysis of noisy DNA copy number data 
BMC Genomics  2010;11:517.
Deletions and amplifications of the human genomic DNA copy number are the causes of numerous diseases, such as various forms of cancer. Therefore, the detection of DNA copy number variations (CNV) is important in understanding the genetic basis of many diseases. Various techniques and platforms have been developed for genome-wide analysis of DNA copy number, such as array-based comparative genomic hybridization (aCGH) and high-resolution mapping with high-density tiling oligonucleotide arrays. Since complicated biological and experimental processes are often associated with these platforms, data can be potentially contaminated by outliers.
We propose a penalized LAD regression model with the adaptive fused lasso penalty for detecting CNV. This method contains robust properties and incorporates both the spatial dependence and sparsity of CNV into the analysis. Our simulation studies and real data analysis indicate that the proposed method can correctly detect the numbers and locations of the true breakpoints while appropriately controlling the false positives.
The proposed method has three advantages for detecting CNV change points: it contains robustness properties; incorporates both spatial dependence and sparsity; and estimates the true values at each marker accurately.
PMCID: PMC3247090  PMID: 20868505
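The structure of the objective described in entry 13 (an absolute-deviation loss plus penalties that encourage sparsity and piecewise-constant estimates) can be sketched as a function that scores a candidate solution. This is an assumption-laden illustration: the exact form of the adaptive weights in the article is not given in the abstract, so the optional weights `w_sparse` and `w_fused` here are hypothetical placeholders that default to 1, and no optimizer is included.

```python
def lad_fused_lasso_objective(y, beta, lam_sparse, lam_fused,
                              w_sparse=None, w_fused=None):
    """Score a candidate copy-number profile `beta` against observations `y`.

    objective = sum |y_i - beta_i|                               (LAD loss)
              + lam_sparse * sum w_sparse_i * |beta_i|           (sparsity)
              + lam_fused * sum w_fused_i * |beta_{i+1} - beta_i| (fusion)
    """
    n = len(y)
    if len(beta) != n:
        raise ValueError("y and beta must have the same length")
    w_sparse = [1.0] * n if w_sparse is None else w_sparse
    w_fused = [1.0] * (n - 1) if w_fused is None else w_fused
    lad = sum(abs(yi - bi) for yi, bi in zip(y, beta))
    sparsity = sum(w * abs(b) for w, b in zip(w_sparse, beta))
    fusion = sum(w * abs(beta[i + 1] - beta[i]) for i, w in enumerate(w_fused))
    return lad + lam_sparse * sparsity + lam_fused * fusion


# Toy comparison: a piecewise-constant profile versus copying the noisy data.
y = [0.1, -0.1, 1.1, 0.9]
flat = lad_fused_lasso_objective(y, [0.0, 0.0, 1.0, 1.0], lam_sparse=0.1, lam_fused=1.0)
exact = lad_fused_lasso_objective(y, y, lam_sparse=0.1, lam_fused=1.0)
print(flat < exact)  # True
```

In this toy case the piecewise-constant profile scores lower than the profile that reproduces every noisy value, which is the behavior the fusion penalty is meant to encourage. Because the loss uses absolute rather than squared deviations, a single outlying probe contributes linearly to the loss.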
14.  Clinical use of array comparative genomic hybridization (aCGH) for prenatal diagnosis in 300 cases 
Prenatal Diagnosis  2009;29(1):29-39.
To evaluate the use of array comparative genomic hybridization (aCGH) for prenatal diagnosis, including assessment of variants of uncertain significance, and the ability to detect abnormalities not detected by karyotype, and vice versa.
Women undergoing amniocentesis or chorionic villus sampling (CVS) for karyotype were offered aCGH analysis using a targeted microarray. Parental samples were obtained concurrently to exclude maternal cell contamination and determine if copy number variants (CNVs) were de novo, or inherited prior to issuing a report.
We analyzed 300 samples; most were amniotic fluid (82%) and CVS (17%). The most common indications were advanced maternal age (N = 123) and abnormal ultrasound findings (N = 84). We detected 58 CNVs (19.3%). Of these, 40 (13.3%) were interpreted as likely benign, 15 (5.0%) were of defined pathological significance, while 3 (1.0%) were of uncertain clinical significance. For seven (~2.3% or 1/43), aCGH contributed important new information. For two of these (1% or ~1/150), the abnormality would not have been detected without aCGH analysis.
Although aCGH detected benign inherited variants in 13.3% of cases, these did not present major counseling difficulties, and the procedure is an improved diagnostic tool for prenatal detection of chromosomal abnormalities.
PMCID: PMC3665952  PMID: 19012303
aCGH; chromosomal abnormality; chromosomal microarray analysis; prenatal; copy number variants; CVS; amniotic fluid
15.  Microarray Comparative Genomic Hybridisation Analysis Incorporating Genomic Organisation, and Application to Enterobacterial Plant Pathogens 
PLoS Computational Biology  2009;5(8):e1000473.
Microarray comparative genomic hybridisation (aCGH) provides an estimate of the relative abundance of genomic DNA (gDNA) taken from comparator and reference organisms by hybridisation to a microarray containing probes that represent sequences from the reference organism. The experimental method is used in a number of biological applications, including the detection of human chromosomal aberrations, and in comparative genomic analysis of bacterial strains, but optimisation of the analysis is desirable in each problem domain.
We present a method for analysis of bacterial aCGH data that encodes spatial information from the reference genome in a hidden Markov model. This technique is the first such method to be validated in comparisons of sequenced bacteria that diverge at the strain and at the genus level: Pectobacterium atrosepticum SCRI1043 (Pba1043) and Dickeya dadantii 3937 (Dda3937); and Lactococcus lactis subsp. lactis IL1403 and L. lactis subsp. cremoris MG1363. In all cases our method is found to outperform common and widely used aCGH analysis methods that do not incorporate spatial information. This analysis is applied to comparisons between commercially important plant pathogenic soft-rotting enterobacteria (SRE) Pba1043, P. atrosepticum SCRI1039, P. carotovorum 193, and Dda3937.
Our analysis indicates that it should not be assumed that hybridisation strength is a reliable proxy for sequence identity in aCGH experiments, and robustly extends the applicability of aCGH to bacterial comparisons at the genus level. Our results in the SRE further provide evidence for a dynamic, plastic ‘accessory’ genome, revealing major genomic islands encoding gene products that provide insight into, and may play a direct role in determining, variation amongst the SRE in terms of their environmental survival, host range and aetiology, such as phytotoxin synthesis, multidrug resistance, and nitrogen fixation.
Author Summary
We describe the first use of a method for the analysis of bacterial microarray comparative genomic hybridisation (aCGH) that includes information about the spatial organisation of genes in the reference bacterium. We demonstrate that using this information improves predictive performance over standard bacterial aCGH methods in discriminating between genes from the reference organism that either do or do not have putative orthologues in the comparator organism. Our approach produces good results on more distantly related bacteria than can successfully be analysed by the standard methods. We apply our analysis to comparisons between four commercially-significant plant pathogenic bacteria, and identify several regions of the genome that are likely to contribute to their ability to cause disease, and to proliferate in the environment, generating hypotheses for future experiments.
PMCID: PMC2718846  PMID: 19696881
16.  Detection of divergent genes in microbial aCGH experiments 
BMC Bioinformatics  2006;7:181.
Array-based comparative genome hybridization (aCGH) is a tool for rapid comparison of genomes from different bacterial strains. The purpose of such analysis is to detect highly divergent or absent genes in a sample strain compared to an index strain. Development of methods for analyzing aCGH data has primarily focused on copy number aberrations in cancer research. In microbial aCGH analyses, genes are typically ranked by log-ratios, and classification into divergent or present is done by choosing a cutoff log-ratio, either manually or by statistics calculated from the log-ratio distribution. As experimental settings vary considerably, it is not possible to develop a classical discriminant or statistical learning approach.
We introduce a more efficient method for analyzing microbial aCGH data using a finite mixture model and a data rotation scheme. Using the average posterior probabilities from the model fitted to log-ratios before and after rotation, we get a score for each gene, and demonstrate its advantages for ranking and detecting divergent genes with enlarged specificity and sensitivity.
The procedure is tested and compared to other approaches on simulated data sets, as well as on four experimental validation data sets for aCGH analysis on fully sequenced strains of Staphylococcus aureus and Streptococcus pneumoniae.
When tested on simulated data as well as on four different experimental validation data sets from experiments with only fully sequenced strains, our procedure out-competes the standard procedures of using a simple log-ratio cutoff for classification into present and divergent genes.
PMCID: PMC1563484  PMID: 16573812
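Entry 16 scores genes by posterior probabilities from a finite mixture model rather than by a fixed log-ratio cutoff. The sketch below shows only the posterior step for a two-component Gaussian mixture with known parameters. The component means, standard deviation, and mixing weight are hypothetical, and the article's model fitting and data rotation scheme are not reproduced.

```python
import math


def posterior_divergent(log_ratio, weight_divergent=0.1,
                        mean_present=0.0, mean_divergent=-2.0, sd=0.5):
    """Posterior probability that a gene belongs to the 'divergent' component.

    Both components are Gaussian with a shared standard deviation, so the
    log odds reduce to a difference of squared distances to the two means.
    """
    log_odds = (math.log(weight_divergent / (1 - weight_divergent))
                + ((log_ratio - mean_present) ** 2
                   - (log_ratio - mean_divergent) ** 2) / (2 * sd * sd))
    # Numerically stable logistic transform.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1 + e)


def rank_genes(log_ratios_by_gene):
    """Return gene names sorted from most to least likely divergent."""
    return sorted(log_ratios_by_gene,
                  key=lambda gene: posterior_divergent(log_ratios_by_gene[gene]),
                  reverse=True)


# Hypothetical genes and log ratios for illustration only.
example = {"geneA": 0.1, "geneB": -1.9, "geneC": -1.0}
print(rank_genes(example))  # ['geneB', 'geneC', 'geneA']
```

At the midpoint between the two assumed means the likelihoods cancel, so the posterior equals the prior weight of 0.1. The score therefore carries more information than a hard cutoff placed at the same point.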
17.  Large scale copy number variation (CNV) at 14q12 is associated with the presence of genomic abnormalities in neoplasia 
BMC Genomics  2006;7:138.
Advances made in the area of microarray comparative genomic hybridization (aCGH) have enabled the interrogation of the entire genome at a previously unattainable resolution. This has led to the discovery of a novel class of alternative entities called large-scale copy number variations (CNVs). These CNVs are often found in regions of closely linked sequence homology called duplicons that are thought to facilitate genomic rearrangements in some classes of neoplasia. Recently, it was proposed that duplicons located near the recurrent translocation break points on chromosomes 9 and 22 in chronic myeloid leukemia (CML) may facilitate this tumor-specific translocation. Furthermore, ~15–20% of CML patients also carry a microdeletion on the derivative 9 chromosome (der(9)) and these patients have a poor prognosis. It has been hypothesised that der(9) deletion patients have increased levels of chromosomal instability.
In this study aCGH was performed and identified a CNV (RP11-125A5, hereafter called CNV14q12) that was present as a genomic gain or loss in 10% of control DNA samples derived from cytogenetically normal individuals. CNV14q12 was the same clone identified by Iafrate et al. as a CNV. Real-time polymerase chain reaction (Q-PCR) was used to determine the relative frequency of this CNV in DNA from a series of 16 CML patients (both with and without a der(9) deletion) together with DNA derived from 36 paediatric solid tumors in comparison to the incidence of CNV in control DNA. CNV14q12 was present in ~50% of both tumor and CML DNA, but was found in 72% of CML bearing a der(9) microdeletion. Chi square analysis found a statistically significant difference (p ≤ 0.001) between the incidence of this CNV in cancer and normal DNA and a slightly increased incidence in CML with deletions in comparison to those CML without a detectable deletion.
The increased incidence of CNV14q12 in tumor samples suggests that either acquired or inherited genomic variation of this new class of variation may be associated with onset or progression of neoplasia.
PMCID: PMC1550726  PMID: 16756668
18.  aCGHViewer: A Generic Visualization Tool For aCGH data 
Cancer Informatics  2007;2:36-43.
Array-Comparative Genomic Hybridization (aCGH) is a powerful high throughput technology for detecting chromosomal copy number aberrations (CNAs) in cancer, aiming at identifying related critical genes from the affected genomic regions. However, advancing from a dataset with thousands of tabular lines to a few candidate genes can be an onerous and time-consuming process. To expedite the aCGH data analysis process, we have developed a user-friendly aCGH data viewer (aCGHViewer) as a conduit between the aCGH data tables and a genome browser. The data from a given aCGH analysis are displayed in a genomic view comprised of individual chromosome panels which can be rapidly scanned for interesting features. A chromosome panel containing a feature of interest can be selected to launch a detail window for that single chromosome. Selecting a data point of interest in the detail window launches a query to the UCSC or NCBI genome browser to allow the user to explore the gene content in the chromosomal region. Additionally, aCGHViewer can display aCGH and expression array data concurrently to visually correlate the two. aCGHViewer is a stand alone Java visualization application that should be used in conjunction with separate statistical programs. It operates on all major computer platforms and is freely available at
PMCID: PMC1847423  PMID: 17404607
array-CGH; CNA; gene expression; visualization
19.  aCGHViewer: A Generic Visualization Tool For aCGH data 
Cancer informatics  2006;2:36-43.
Array-Comparative Genomic Hybridization (aCGH) is a powerful high throughput technology for detecting chromosomal copy number aberrations (CNAs) in cancer, aiming at identifying related critical genes from the affected genomic regions. However, advancing from a dataset with thousands of tabular lines to a few candidate genes can be an onerous and time-consuming process. To expedite the aCGH data analysis process, we have developed a user-friendly aCGH data viewer (aCGHViewer) as a conduit between the aCGH data tables and a genome browser. The data from a given aCGH analysis are displayed in a genomic view comprised of individual chromosome panels which can be rapidly scanned for interesting features. A chromosome panel containing a feature of interest can be selected to launch a detail window for that single chromosome. Selecting a data point of interest in the detail window launches a query to the UCSC or NCBI genome browser to allow the user to explore the gene content in the chromosomal region. Additionally, aCGHViewer can display aCGH and expression array data concurrently to visually correlate the two. aCGHViewer is a stand alone Java visualization application that should be used in conjunction with separate statistical programs. It operates on all major computer platforms and is freely available at
PMCID: PMC1847423  PMID: 17404607
array-CGH; CNA; gene expression; visualization
20.  Comparison of chromosomal and array-based comparative genomic hybridization for the detection of genomic imbalances in primary prostate carcinomas 
Molecular Cancer  2006;5:33.
In order to gain new insights into the molecular mechanisms involved in prostate cancer, we performed array-based comparative genomic hybridization (aCGH) on a series of 46 primary prostate carcinomas using a 1 Mbp whole-genome coverage platform. As chromosomal comparative genomic hybridization (cCGH) data was available for these samples, we compared the sensitivity and overall concordance of the two methodologies, and used the combined information to infer the best of three different aCGH scoring approaches.
Our data demonstrate that the reliability of aCGH in the analysis of primary prostate carcinomas depends to some extent on the scoring approach used, with the breakpoint estimation method being the most sensitive and reliable. The pattern of copy number changes detected by aCGH was concordant with that of cCGH, but the higher resolution technique detected 2.7 times more aberrations and 15.2% more carcinomas with genomic imbalances. We additionally show that several aberrations were consistently overlooked using cCGH, such as small deletions at 5q, 6q, 12p, and 17p. The latter were validated by fluorescence in situ hybridization targeting TP53, although only one carcinoma harbored a point mutation in this gene. Strikingly, homozygous deletions at 10q23.31, encompassing the PTEN locus, were seen in 58% of the cases with 10q loss.
We conclude that aCGH can significantly improve the detection of genomic aberrations in cancer cells as compared to previously established whole-genome methodologies, although contamination with normal cells may influence the sensitivity and specificity of some scoring approaches. Our work delineated recurrent copy number changes and revealed novel amplified loci and frequent homozygous deletions in primary prostate carcinomas, which may guide future work aimed at identifying the relevant target genes. In particular, biallelic loss seems to be a frequent mechanism of inactivation of the PTEN gene in prostate carcinogenesis.
PMCID: PMC1570364  PMID: 16952311
Array comparative genomic hybridization (aCGH) allows identification of copy number alterations across genomes. The key computational challenge in analyzing copy number variations (CNVs) using aCGH data or other similar data generated by a variety of array technologies is the detection of segment boundaries of copy number changes and inference of the copy number state for each segment. We have developed a novel statistical model based on the framework of conditional random fields (CRFs) that can effectively combine data smoothing, segmentation and copy number state decoding into one unified framework. Our approach (termed CRF-CNV) provides great flexibility in defining meaningful feature functions. Therefore, it can effectively integrate local spatial information of arbitrary sizes into the model. For model parameter estimation, we have adopted the conjugate gradient (CG) method for likelihood optimization and developed efficient forward/backward algorithms within the CG framework. The method is evaluated using real data with known copy numbers as well as simulated data with realistic assumptions, and compared with two popular publicly available programs. Experimental results have demonstrated that CRF-CNV outperforms a Bayesian Hidden Markov Model-based approach on both datasets in terms of copy number assignments. Compared with a non-parametric approach, CRF-CNV has achieved much greater precision while maintaining the same level of recall on the real data, and their performance on the simulated data is comparable.
PMCID: PMC3326659  PMID: 20401947
Array comparative genomic hybridization; copy number variations; conditional random fields
22.  Application of Array CGH on Archival Formalin-Fixed Paraffin-Embedded Tissues including small numbers of microdissected cells 
Array-based comparative genomic hybridisation (aCGH) has diverse applications in cancer gene discovery and translational research. Currently, aCGH is performed primarily using high molecular weight DNA samples and its application to formalin-fixed and paraffin-embedded (FFPE) tissues remains to be established. To explore how aCGH can be reliably applied to archival FFPE tissues and whether it is possible to apply aCGH to small numbers of cells microdissected from FFPE tissue sections, we have systematically performed aCGH on 15 pairs of matched frozen and FFPE glioblastoma tissues using a well established in-house human 1Mb BAC/PAC genomic array. By spiking glioblastoma DNA with normal DNA, we demonstrated that at least 70% of tumour DNA was required for reliable aCGH analysis. Using aCGH data from frozen tissue as a reference, it was found that only FFPE glioblastoma tissues that supported PCR amplification of >300bp DNA fragment provided high quality, reproducible aCGH data. The presence of necrosis in a tissue specimen had an adverse effect on the quality of aCGH, while fixation in formalin for up to 96 hours of fresh tissue did not appear to affect the quality of the result. As little as 10-20ng DNA from frozen or FFPE tissues could be readily used for aCGH analysis following whole genome amplification. Furthermore, as few as 2000 microdissected cells from haematoxylin stained slides of archival FFPE tissues could be successfully used for aCGH investigations when whole genome amplification was used. By careful assessment of DNA integrity and review of histology, to exclude necrosis and select specimens with a high proportion of tumour cells, it is feasible to pre-select archival FFPE tissues adequate for aCGH analysis. With the help of microdissection and whole genome amplification, it is also possible to apply aCGH to histologically defined lesions, such as carcinoma in situ.
PMCID: PMC2815849  PMID: 16751780
array CGH; archival fixed tissue; microdissection; whole genome amplification; glioblastoma
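The tumour-content requirement reported above can be illustrated with the standard mixture relation for a diploid normal background: if a fraction p of the DNA comes from tumour cells carrying c copies of a region, the expected log2 ratio is log2((p·c + (1 − p)·2) / 2). The sketch below is only that textbook relation, not the authors' analysis; the function name is hypothetical and the model assumes a diploid normal genome and no other sources of bias.

```python
import math

def expected_log2_ratio(tumour_copies, tumour_fraction):
    """Expected aCGH log2 ratio for a region when tumour DNA is diluted
    with diploid normal DNA (illustrative model, assumes normal = 2 copies)."""
    mixed_copies = tumour_fraction * tumour_copies + (1.0 - tumour_fraction) * 2.0
    return math.log2(mixed_copies / 2.0)

# A single-copy loss (1 copy) at decreasing tumour content:
for fraction in (1.0, 0.7, 0.5, 0.3):
    print(fraction, round(expected_log2_ratio(1, fraction), 3))
```

The signal for a one-copy loss shrinks towards zero as the tumour fraction falls, which is consistent with the abstract's observation that a high proportion of tumour DNA is needed for reliable calls.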
23.  A Genome-Wide Analysis of Array-Based Comparative Genomic Hybridization (CGH) Data to Detect Intra-Species Variations and Evolutionary Relationships 
PLoS ONE  2009;4(11):e7978.
Array-based comparative genomic hybridization (aCGH) has gained prevalence as an effective technique for measuring structural variations in the genome. Copy-number variations (CNVs) form a large source of genomic structural variation, but it is not known whether phenotypic differences between intra-species groups, such as divergent human populations, or breeds of a domestic animal, can be attributed to CNVs. Several computational methods have been proposed to improve the detection of CNVs from array CGH data, but few population studies have used CGH data for identification of intra-species differences. In this paper we propose a novel method of genome-wide comparison and classification using CGH data that condenses whole genome information, aimed at quantification of intra-species variations and discovery of shared ancestry. Our strategy included smoothing CGH data using an appropriate denoising algorithm, extracting features via wavelets, quantifying the information via wavelet power spectrum and hierarchical clustering of the resultant profile. To evaluate the classification efficiency of our method, we used simulated data sets. We applied it to aCGH data from human and bovine individuals and showed that it successfully detects existing intra-specific variations with additional evolutionary implications.
PMCID: PMC2777320  PMID: 19956659
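The feature-extraction and clustering steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a Haar wavelet, a signal length that is a power of two, Euclidean distance between power profiles, and single-linkage agglomeration, none of which are stated in the abstract. All function names are hypothetical, and the denoising step is omitted.

```python
import math

def haar_power_spectrum(signal):
    """Energy of the Haar detail coefficients at each decomposition level,
    finest level first. Assumes len(signal) is a power of two."""
    root2 = math.sqrt(2.0)
    approx = list(signal)
    power = []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        detail = [(a - b) / root2 for a, b in pairs]
        approx = [(a + b) / root2 for a, b in pairs]
        power.append(sum(d * d for d in detail))
    return power

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def single_linkage(profiles, n_clusters):
    """Agglomerative clustering of named profiles (dict: name -> vector).
    Repeatedly merges the two clusters whose closest members are nearest."""
    clusters = [{name} for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

# Toy log-ratio profiles (made up for illustration): two with one broad
# step, two with rapid probe-to-probe oscillation.
samples = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 0, 0, 0, 1, 1, 1, 1.1],
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
    "D": [0, 1, 0, 1, 0, 1, 0, 0.9],
}
profiles = {name: haar_power_spectrum(values) for name, values in samples.items()}
groups = single_linkage(profiles, n_clusters=2)
```

In this toy input the broad-step profiles concentrate their energy at the coarsest level and the oscillating profiles at the finest, so the two pairs separate into different clusters.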
24.  Genome-Wide Mapping of Copy Number Variation in Humans: Comparative Analysis of High Resolution Array Platforms 
PLoS ONE  2011;6(11):e27859.
Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, genome-wide CNV association type studies, cytogenetics research and diagnostics, and independent validation of CNVs identified from sequencing based technologies. Numerous array-based platforms for CNV detection exist, utilizing array Comparative Genome Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV focused arrays. Our results are important for cost effective CNV detection and validation for both basic and clinical applications.
PMCID: PMC3227574  PMID: 22140474
25.  CoNVEX: copy number variation estimation in exome sequencing data using HMM 
BMC Bioinformatics  2013;14(Suppl 2):S2.
One of the main types of genetic variation in cancer is copy number variation (CNV). Whole exome sequencing (WES) is a popular alternative to whole genome sequencing (WGS) to study disease specific genomic variations. However, finding CNV in cancer samples using WES data has not been fully explored.
We present a new method, called CoNVEX, to estimate copy number variation in whole exome sequencing data. It uses the ratio of tumour and matched normal average read depths at each exonic region to predict the copy gain or loss. The useful signal produced by WES data will be hindered by the intrinsic noise present in the data itself. This limits its capacity to be used as a highly reliable CNV detection source. Here, we propose a method that consists of discrete wavelet transform (DWT) to reduce noise. The identification of copy number gains/losses of each targeted region is performed by a Hidden Markov Model (HMM).
HMM is frequently used to identify CNV in data produced by various technologies including Array Comparative Genomic Hybridization (aCGH) and WGS. Here, we propose an HMM to detect CNV in cancer exome data. We used modified data from 1000 Genomes project to evaluate the performance of the proposed method. Using these data we have shown that CoNVEX outperforms the existing methods significantly in terms of precision. Overall, CoNVEX achieved a sensitivity of more than 92% and a precision of more than 50%.
PMCID: PMC3549847  PMID: 23368785
CNV detection; Cancer Genome; Targeted resequencing; Whole exome sequencing; Hidden Markov Models; Discrete Wavelet Transform
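The three steps named in the CoNVEX abstract (tumour/normal depth ratio, wavelet denoising, HMM state calling) can be sketched as below. This is an illustrative toy under stated assumptions, not the CoNVEX implementation: it uses a one-level Haar transform with hard thresholding, a three-state Gaussian-emission HMM decoded with Viterbi, and made-up state means, variance, and transition probabilities. All names and parameter values are hypothetical.

```python
import math

def log2_ratios(tumour_depth, normal_depth, eps=1e-9):
    """Per-exon log2(tumour / normal) of average read depth."""
    return [math.log2((t + eps) / (n + eps))
            for t, n in zip(tumour_depth, normal_depth)]

def haar_denoise(values, threshold):
    """One-level Haar DWT, zeroing small detail coefficients, then inverse.
    Assumes len(values) is even."""
    root2 = math.sqrt(2.0)
    out = []
    for i in range(0, len(values), 2):
        approx = (values[i] + values[i + 1]) / root2
        detail = (values[i] - values[i + 1]) / root2
        if abs(detail) <= threshold:
            detail = 0.0  # treat small pairwise differences as noise
        out.extend([(approx + detail) / root2, (approx - detail) / root2])
    return out

# Hypothetical HMM parameters, chosen only for illustration.
STATES = ("loss", "neutral", "gain")
MEANS = {"loss": -1.0, "neutral": 0.0, "gain": 0.58}
SIGMA = 0.3
STAY = 0.9  # probability of remaining in the same state

def log_emission(state, x):
    z = (x - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2.0 * math.pi))

def log_transition(src, dst):
    if src == dst:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(STATES) - 1))

def viterbi(observations):
    """Most likely state sequence under the HMM defined above."""
    score = {s: math.log(1.0 / len(STATES)) + log_emission(s, observations[0])
             for s in STATES}
    backpointers = []
    for x in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_transition(p, s))
            new_score[s] = score[prev] + log_transition(prev, s) + log_emission(s, x)
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Made-up read depths for twelve exons: unchanged, doubled, then halved.
tumour = [100, 102, 98, 101, 200, 205, 198, 202, 50, 49, 51, 50]
normal = [100] * 12
calls = viterbi(haar_denoise(log2_ratios(tumour, normal), threshold=0.1))
```

With these toy depths, the decoded path labels the first four exons neutral, the next four as gain, and the last four as loss. Real exome data would need per-sample normalisation and fitted parameters, which this sketch does not attempt.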
