Related Articles

1.  Computational Methods for the Analysis of Array Comparative Genomic Hybridization 
Cancer Informatics  2007;2:48-58.
Array comparative genomic hybridization (array CGH) is a technique for assaying the copy number status of cancer genomes. The widespread use of this technology has led to a rapid accumulation of high throughput data, which in turn has prompted the development of computational strategies for the analysis of array CGH data. Here we explain the principles behind array image processing, data visualization and genomic profile analysis, review currently available software packages, and raise considerations for future software development.
PMCID: PMC2067254  PMID: 17992253
array CGH; microarray; cancer genome; software; bioinformatics; alteration detection
2.  Combined array-comparative genomic hybridization and single-nucleotide polymorphism-loss of heterozygosity analysis reveals complex genetic alterations in cervical cancer 
BMC Genomics  2007;8:53.
Background
Cervical carcinoma develops as a result of multiple genetic alterations. Different studies investigated genomic alterations in cervical cancer mainly by means of metaphase comparative genomic hybridization (mCGH) and microsatellite marker analysis for the detection of loss of heterozygosity (LOH). Currently, high throughput methods such as array comparative genomic hybridization (array CGH), single nucleotide polymorphism array (SNP array) and gene expression arrays are available to study genome-wide alterations. Integration of these 3 platforms allows detection of genomic alterations at high resolution and investigation of an association between copy number changes and expression.
Results
Genome-wide copy number and genotype analysis of 10 cervical cancer cell lines by array CGH and SNP array showed highly complex large-scale alterations. A comparison between array CGH and SNP array revealed that the overall concordance in detection of the same areas with copy number alterations (CNA) was above 90%. The use of SNP arrays demonstrated that about 75% of LOH events would not have been found by methods which screen for copy number changes, such as array CGH, since these were LOH events without CNA. Regions frequently targeted by CNA, as determined by array CGH, such as amplification of 5p and 20q, and loss of 8p were confirmed by fluorescent in situ hybridization (FISH). Genome-wide, we did not find a correlation between copy-number and gene expression. At chromosome arm 5p however, 22% of the genes were significantly upregulated in cell lines with amplifications as compared to cell lines without amplifications, as measured by gene expression arrays. For 3 genes, SKP2, ANKH and TRIO, expression differences were confirmed by quantitative real-time PCR (qRT-PCR).
Conclusion
This study showed that copy number data retrieved from either array CGH or SNP array are comparable and that the integration of genome-wide LOH, copy number and gene expression is useful for the identification of gene specific targets that could be relevant for the development and progression in cervical cancer.
doi:10.1186/1471-2164-8-53
PMCID: PMC1805756  PMID: 17311676
3.  SeeGH – A software tool for visualization of whole genome array comparative genomic hybridization data 
BMC Bioinformatics  2004;5:13.
Background
Array comparative genomic hybridization (CGH) is a technique which detects copy number differences in DNA segments. Complete sequencing of the human genome and the development of an array representing a tiling set of tens of thousands of DNA segments spanning the entire human genome has made high resolution copy number analysis throughout the genome possible. Since array CGH provides signal ratio for each DNA segment, visualization would require the reassembly of individual data points into chromosome profiles.
Results
We have developed a visualization tool for displaying whole genome array CGH data in the context of chromosomal location. SeeGH is an application that translates spot signal ratio data from array CGH experiments to displays of high resolution chromosome profiles. Data is imported from a simple tab delimited text file obtained from standard microarray image analysis software. SeeGH processes the signal ratio data and graphically displays it in a conventional CGH karyotype diagram with the added features of magnification and DNA segment annotation. In this process, SeeGH imports the data into a database, calculates the average ratio and standard deviation for each replicate spot, and links them to chromosome regions for graphical display. Once the data is displayed, users have the option of hiding or flagging DNA segments based on user-defined criteria, and of retrieving annotation information such as clone name, NCBI sequence accession number, ratio, base pair position on the chromosome, and standard deviation.
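The replicate-averaging step described above can be sketched as follows. This is a minimal illustration, not SeeGH's actual code: the function name `summarize_replicates`, the clone identifiers, and the choice of the sample standard deviation are all assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_replicates(spots):
    """Group (clone_id, ratio) pairs and return {clone_id: (mean, sd)}.

    Assumption: the sample standard deviation is used, and a clone with a
    single spot is given a standard deviation of 0.0.
    """
    grouped = defaultdict(list)
    for clone_id, ratio in spots:
        grouped[clone_id].append(ratio)

    summary = {}
    for clone_id, ratios in grouped.items():
        sd = stdev(ratios) if len(ratios) > 1 else 0.0
        summary[clone_id] = (mean(ratios), sd)
    return summary


# Hypothetical clone names and signal ratios, for illustration only.
example_spots = [
    ("clone_A", 1.0),
    ("clone_A", 1.2),
    ("clone_A", 1.1),
    ("clone_B", 0.5),
]
replicate_summary = summarize_replicates(example_spots)
```

Each summary tuple could then be linked to the clone's chromosomal position for display.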
Conclusions
SeeGH represents a novel software tool used to view and analyze array CGH data. The software gives users the ability to view the data in an overall genomic view as well as magnify specific chromosomal regions facilitating the precise localization of genetic alterations. SeeGH is easily installed and runs on Microsoft Windows 2000 or later environments.
doi:10.1186/1471-2105-5-13
PMCID: PMC373529  PMID: 15040819
Array comparative genomic hybridization; aCGH
4.  aCGHViewer: A Generic Visualization Tool For aCGH data 
Cancer informatics  2006;2:36-43.
Array-Comparative Genomic Hybridization (aCGH) is a powerful high throughput technology for detecting chromosomal copy number aberrations (CNAs) in cancer, aiming at identifying related critical genes from the affected genomic regions. However, advancing from a dataset with thousands of tabular lines to a few candidate genes can be an onerous and time-consuming process. To expedite the aCGH data analysis process, we have developed a user-friendly aCGH data viewer (aCGHViewer) as a conduit between the aCGH data tables and a genome browser. The data from a given aCGH analysis are displayed in a genomic view comprised of individual chromosome panels which can be rapidly scanned for interesting features. A chromosome panel containing a feature of interest can be selected to launch a detail window for that single chromosome. Selecting a data point of interest in the detail window launches a query to the UCSC or NCBI genome browser to allow the user to explore the gene content in the chromosomal region. Additionally, aCGHViewer can display aCGH and expression array data concurrently to visually correlate the two. aCGHViewer is a stand alone Java visualization application that should be used in conjunction with separate statistical programs. It operates on all major computer platforms and is freely available at http://falcon.roswellpark.org/aCGHview/.
PMCID: PMC1847423  PMID: 17404607
array-CGH; CNA; gene expression; visualization
6.  Flexible and Accurate Detection of Genomic Copy-Number Changes from aCGH 
PLoS Computational Biology  2007;3(6):e122.
Genomic DNA copy-number alterations (CNAs) are associated with complex diseases, including cancer: CNAs are indeed related to tumoral grade, metastasis, and patient survival. CNAs discovered from array-based comparative genomic hybridization (aCGH) data have been instrumental in identifying disease-related genes and potential therapeutic targets. To be immediately useful in both clinical and basic research scenarios, aCGH data analysis requires accurate methods that do not impose unrealistic biological assumptions and that provide direct answers to the key question, “What is the probability that this gene/region has CNAs?” Current approaches fail, however, to meet these requirements. Here, we introduce reversible jump aCGH (RJaCGH), a new method for identifying CNAs from aCGH; we use a nonhomogeneous hidden Markov model fitted via reversible jump Markov chain Monte Carlo; and we incorporate model uncertainty through Bayesian model averaging. RJaCGH provides an estimate of the probability that a gene/region has CNAs while incorporating interprobe distance and the capability to analyze data on a chromosome or genome-wide basis. RJaCGH outperforms alternative methods, and the performance difference is even larger with noisy data and highly variable interprobe distance, both commonly found features in aCGH data. Furthermore, our probabilistic method allows us to identify minimal common regions of CNAs among samples and can be extended to incorporate expression data. In summary, we provide a rigorous statistical framework for locating genes and chromosomal regions with CNAs with potential applications to cancer and other complex human diseases.
Author Summary
As a consequence of problems during cell division, the number of copies of a gene in a chromosome can either increase or decrease. These copy-number alterations (CNAs) can play a crucial role in the emergence of complex multigenic diseases. For example, in cancer, amplification of oncogenes can drive tumor activation, and CNAs are associated with metastasis development and patient survival. Studies on the relationship between CNAs and disease have been recently fueled by the widespread use of array-based comparative genomic hybridization (aCGH), a technique with much finer resolution than previous experimental approaches. Detection of CNAs from these data depends on methods of analysis that do not impose biologically unrealistic assumptions and that provide direct answers to fundamental research questions. We have developed a statistical method, using a Bayesian approach, that returns estimates of the probabilities of CNAs from aCGH data, the most direct and valuable answer to the key biological question: “What is the probability that this gene/region has an altered copy number?” The output of the method can therefore be immediately used in different settings from clinical to basic research scenarios, and is applicable over a wide variety of aCGH technologies.
doi:10.1371/journal.pcbi.0030122
PMCID: PMC1894821  PMID: 17590078
7.  arrayCGHbase: an analysis platform for comparative genomic hybridization microarrays 
BMC Bioinformatics  2005;6:124.
Background
The availability of the human genome sequence as well as the large number of physically accessible oligonucleotides, cDNA, and BAC clones across the entire genome has triggered and accelerated the use of several platforms for analysis of DNA copy number changes, amongst others microarray comparative genomic hybridization (arrayCGH). One of the challenges inherent to this new technology is the management and analysis of large numbers of data points generated in each individual experiment.
Results
We have developed arrayCGHbase, a comprehensive analysis platform for arrayCGH experiments consisting of a MIAME (Minimal Information About a Microarray Experiment) supportive database using MySQL underlying a data mining web tool, to store, analyze, interpret, compare, and visualize arrayCGH results in a uniform and user-friendly format. Following its flexible design, arrayCGHbase is compatible with all existing and forthcoming arrayCGH platforms. Data can be exported in a multitude of formats, including BED files to map copy number information on the genome using the Ensembl or UCSC genome browser.
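As an illustration of the BED export mentioned above, the sketch below writes copy number segments as a four-column BED track. It is not arrayCGHbase code: the function name `segments_to_bed`, the 1-based inclusive input coordinates, and the use of the BED name column to carry the log2 ratio are assumptions made for the example.

```python
def segments_to_bed(segments, track_name="copy_number"):
    """Format (chrom, start, end, log2_ratio) segments as BED text.

    Assumption: input coordinates are 1-based and inclusive, so the start
    is shifted by one to match BED's 0-based, end-exclusive convention.
    The log2 ratio is stored in the fourth (name) column as a label.
    """
    lines = [f'track name="{track_name}"']
    for chrom, start, end, log2_ratio in segments:
        lines.append(f"{chrom}\t{start - 1}\t{end}\t{log2_ratio:.3f}")
    return "\n".join(lines) + "\n"


# Hypothetical segment: a gain on chromosome 5.
bed_text = segments_to_bed([("chr5", 1, 1000, 0.58)], track_name="demo")
```

A file written this way can be loaded as a custom track in a genome browser that accepts BED input.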
Conclusion
ArrayCGHbase is a web based and platform independent arrayCGH data analysis tool, that allows users to access the analysis suite through the internet or a local intranet after installation on a private server. ArrayCGHbase is available at .
doi:10.1186/1471-2105-6-124
PMCID: PMC1173083  PMID: 15910681
8.  CAPweb: a bioinformatics CGH array Analysis Platform 
Nucleic Acids Research  2006;34(Web Server issue):W477-W481.
Assessing variations in DNA copy number is crucial for understanding constitutional or somatic diseases, particularly cancers. The recently developed array-CGH (comparative genomic hybridization) technology allows this to be investigated at the genomic level. We report the availability of a web tool for analysing array-CGH data. CAPweb (CGH array Analysis Platform on the Web) is intended as a user-friendly tool enabling biologists to completely analyse CGH arrays from the raw data to the visualization and biological interpretation. The user typically performs the following bioinformatics steps of a CGH array project within CAPweb: the secure upload of the results of CGH array image analysis and of the array annotation (genomic position of the probes); first level analysis of each array, including automatic normalization of the data (for correcting experimental biases), breakpoint detection and status assignment (gain, loss or normal); validation or deletion of the analysis based on a summary report and quality criteria; visualization and biological analysis of the genomic profiles and results through a user-friendly interface. CAPweb is accessible at .
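The status assignment step (gain, loss or normal) can be pictured with a simple threshold rule on normalized log2 ratios. This is only a sketch under stated assumptions: the abstract does not describe how CAPweb assigns status, and the function name and the cut-offs of +0.2 and -0.2 are hypothetical.

```python
def assign_status(log2_ratio, gain_threshold=0.2, loss_threshold=-0.2):
    """Label a segment as 'gain', 'loss' or 'normal'.

    Assumption: fixed, symmetric thresholds on the log2 ratio. Real
    pipelines often derive thresholds from the noise level of each array.
    """
    if log2_ratio >= gain_threshold:
        return "gain"
    if log2_ratio <= loss_threshold:
        return "loss"
    return "normal"


# Hypothetical segment-level log2 ratios.
segment_ratios = [0.45, -0.6, 0.03]
statuses = [assign_status(r) for r in segment_ratios]
```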
doi:10.1093/nar/gkl215
PMCID: PMC1538852  PMID: 16845053
9.  SIGMA: A System for Integrative Genomic Microarray Analysis of Cancer Genomes 
BMC Genomics  2006;7:324.
Background
The prevalence of high resolution profiling of genomes has created a need for the integrative analysis of information generated from multiple methodologies and platforms. Although the majority of data in the public domain are gene expression profiles, and expression analysis software is available, the increase of array CGH studies has enabled integration of high throughput genomic and gene expression datasets. However, tools for direct mining and analysis of array CGH data are limited. Hence, there is a great need for analytical and display software tailored to cross platform integrative analysis of cancer genomes.
Results
We have created a user-friendly Java application to facilitate sophisticated visualization and analysis such as cross-tumor and cross-platform comparisons. To demonstrate the utility of this software, we assembled array CGH data representing Affymetrix SNP chip, Stanford cDNA arrays and whole genome tiling path array platforms for cross comparison. This cancer genome database contains 267 profiles from commonly used cancer cell lines representing 14 different tissue types.
Conclusion
In this study we have developed an application for the visualization and analysis of data from high resolution array CGH platforms that can be adapted for analysis of multiple types of high throughput genomic datasets. Furthermore, we invite researchers using array CGH technology to deposit both their raw and processed data, as this will be a continually expanding database of cancer genomes. This publicly available resource, the System for Integrative Genomic Microarray Analysis (SIGMA) of cancer genomes, can be accessed at .
doi:10.1186/1471-2164-7-324
PMCID: PMC1764892  PMID: 17192189
10.  Parsimonious Higher-Order Hidden Markov Models for Improved Array-CGH Analysis with Applications to Arabidopsis thaliana 
PLoS Computational Biology  2012;8(1):e1002286.
Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for the detection of DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for the analysis of Array-CGH data, but current methods are only based on first-order HMMs having constrained abilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. Here, we develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling spatial dependencies. We apply parsimonious higher-order HMMs to the analysis of Array-CGH data of the accessions C24 and Col-0 of the model plant Arabidopsis thaliana. We compare these models against first-order HMMs and other existing methods using a reference of known deletions and sequence deviations. We find that parsimonious higher-order HMMs clearly improve the identification of these polymorphisms. Moreover, we perform a functional analysis of identified polymorphisms revealing novel details of genomic differences between C24 and Col-0. Additional model evaluations are done on widely considered Array-CGH data of human cell lines indicating that parsimonious HMMs are also well-suited for the analysis of non-plant specific data. All these results indicate that parsimonious higher-order HMMs are useful for Array-CGH analyses. An implementation of parsimonious higher-order HMMs is available as part of the open source Java library Jstacs (www.jstacs.de/index.php/PHHMM).
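For orientation, the first-order HMM baseline that the abstract compares against can be sketched with a small Viterbi decoder. This is not the parsimonious higher-order model of the paper and not the Jstacs implementation: the three states, the Gaussian emission means and standard deviation, and the transition probabilities below are all assumed values chosen for illustration.

```python
import math

STATES = ("loss", "normal", "gain")
MEANS = {"loss": -0.5, "normal": 0.0, "gain": 0.5}  # assumed emission means
SIGMA = 0.2   # assumed emission standard deviation
STAY = 0.9    # assumed probability of remaining in the same state


def log_gauss(x, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_trans(prev, cur):
    """Log transition probability of a first-order chain."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(log2_ratios):
    """Return the most probable state path for a sequence of log2 ratios."""
    if not log2_ratios:
        return []
    score = {
        s: math.log(1 / len(STATES)) + log_gauss(log2_ratios[0], MEANS[s], SIGMA)
        for s in STATES
    }
    back = []
    for x in log2_ratios[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_trans(p, cur))
            new_score[cur] = (
                score[best_prev] + log_trans(best_prev, cur) + log_gauss(x, MEANS[cur], SIGMA)
            )
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical probe-level log2 ratios along one chromosome.
hmm_path = viterbi([0.0, 0.05, -0.02, 0.55, 0.48, 0.5, 0.01, -0.03])
```

In a first-order model each state depends only on the immediately preceding probe; the higher-order models discussed in the abstract condition on several preceding probes instead.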
Author Summary
Array-based comparative genomics is a standard approach for the identification of DNA copy number polymorphisms between closely related genomes. The huge amounts of data produced by these experiments require efficient and accurate bioinformatics tools for the identification of copy number polymorphisms. Hidden Markov Models (HMMs) are frequently used for analyzing such data sets, but current models are based on first-order HMMs only having limited capabilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. We develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling these dependencies to overcome this limitation. In an in-depth case study with Arabidopsis thaliana, we find that parsimonious higher-order HMMs clearly improve the identification of copy number polymorphisms in comparison to standard first-order HMMs and other frequently used methods. Functional analysis of identified polymorphisms revealed details of genomic differences between the accessions C24 and Col-0 of Arabidopsis thaliana. An additional study on human cell lines further indicates that parsimonious HMMs are well-suited for the analysis of Array-CGH data.
doi:10.1371/journal.pcbi.1002286
PMCID: PMC3257270  PMID: 22253580
11.  Evaluation of copy number variation detection for a SNP array platform 
BMC Bioinformatics  2014;15:50.
Background
Copy Number Variations (CNVs) are usually inferred from Single Nucleotide Polymorphism (SNP) arrays by use of some software packages based on given algorithms. However, there is no clear understanding of the performance of these software packages; it is therefore difficult to select one or several software packages for CNV detection based on the SNP array platform.
We selected four publicly available software packages designed for CNV calling from an Affymetrix SNP array: Birdsuite, dChip, Genotyping Console (GTC) and PennCNV. A publicly available dataset generated by array-based comparative genomic hybridization (CGH), with a resolution of 24 million probes per sample, was considered to be the “gold standard”. The success rate, average stability rate, sensitivity, consistency and reproducibility of these four software packages were assessed against this “gold standard”. In addition, we compared the efficiency of detecting CNVs simultaneously by two, three and all of the software packages with that of a single software package.
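One common way to score CNV calls against a reference set is to count the reference CNVs that are matched by a call with sufficient reciprocal overlap. The sketch below illustrates that idea only; the abstract does not state the matching rule used in the study, so the 50% reciprocal overlap criterion, the function names, and the coordinates are assumptions.

```python
def reciprocal_overlap(a, b):
    """Smallest fraction of either interval covered by their intersection.

    Intervals are (chrom, start, end) with half-open coordinates.
    """
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def sensitivity(calls, gold, min_overlap=0.5):
    """Fraction of gold-standard CNVs matched by at least one call."""
    if not gold:
        return 0.0
    matched = sum(
        1 for g in gold
        if any(reciprocal_overlap(g, c) >= min_overlap for c in calls)
    )
    return matched / len(gold)


# Hypothetical CNV coordinates.
gold_cnvs = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
called_cnvs = [("chr1", 120, 210), ("chr1", 590, 700), ("chr2", 100, 200)]
detected_fraction = sensitivity(called_cnvs, gold_cnvs)
```

Here two of the three reference CNVs are matched; the second call overlaps its reference by only 10%, so it does not count.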
Results
Judged simply by the number of detected CNVs, Birdsuite detected the most while GTC detected the fewest. We found that Birdsuite and dChip had an obvious detection bias, and GTC seemed to be inferior because it detected the fewest CNVs. We then investigated the consistency between the calls of each software package and those of the other three. We found that the consistency of dChip was the lowest while that of GTC was the highest. Compared with the CGH-based CNV detection results, in the matching group GTC called the most matching CNVs and PennCNV-Affy ranked second. In the non-overlapping group, GTC called the fewest CNVs. With regard to the reproducibility of CNV calling, larger CNVs were usually replicated better. PennCNV-Affy showed the best consistency while Birdsuite showed the poorest.
Conclusion
We found that PennCNV outperformed the other three packages in the sensitivity and specificity of CNV calling. Obviously, each calling method had its own limitations and advantages for different data analysis. Therefore, the optimized calling methods might be identified using multiple algorithms to evaluate the concordance and discordance of SNP array-based CNV calling.
doi:10.1186/1471-2105-15-50
PMCID: PMC4015297  PMID: 24555668
CNV; CGH; Evaluation; Comparison; Performance test; Reproducibility test; Success rate; Birdsuite; dChip; GTC; PennCNV
12.  CGHPRO – A comprehensive data analysis tool for array CGH 
BMC Bioinformatics  2005;6:85.
Background
Array CGH (Comparative Genomic Hybridisation) is a molecular cytogenetic technique for the genome wide detection of chromosomal imbalances. It is based on the co-hybridisation of differentially labelled test and reference DNA onto arrays of genomic BAC clones, cDNAs or oligonucleotides, and after correction for various intervening variables, loss or gain in the test DNA can be indicated from spots showing aberrant signal intensity ratios.
Now that this technique is no longer confined to highly specialized laboratories and is entering the realm of clinical application, there is a need for a user-friendly software package that facilitates estimates of DNA dosage from raw signal intensities obtained by array CGH experiments, and which does not depend on a sophisticated computational environment.
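The signal intensity ratios mentioned in the Background are usually handled on a log2 scale and centred so that an unchanged clone sits near zero. The sketch below shows one simple variant, median centring; it is not CGHPRO code, and the function name and intensity values are hypothetical.

```python
import math
from statistics import median


def normalized_log2_ratios(test, reference):
    """Compute log2(test / reference) per spot and subtract the median.

    Assumption: most spots are unaltered, so the median log2 ratio
    approximates the 'no change' level.
    """
    raw = [math.log2(t / r) for t, r in zip(test, reference)]
    centre = median(raw)
    return [x - centre for x in raw]


# Hypothetical spot intensities for test and reference channels.
test_signal = [200, 400, 800, 100, 200]
reference_signal = [100, 200, 400, 100, 50]
centred = normalized_log2_ratios(test_signal, reference_signal)
```

After centring, negative values suggest loss and positive values suggest gain in the test DNA.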
Results
We have developed a user-friendly and versatile tool for the normalization, visualization, breakpoint detection and comparative analysis of array-CGH data. CGHPRO is a stand-alone JAVA application that guides the user through the whole process of data analysis. The import option for image analysis data covers several data formats, but users can also customize their own data formats. Several graphical representation tools assist in the selection of the appropriate normalization method. Intensity ratios of each clone can be plotted in a size-dependent manner along the chromosome ideograms. The interactive graphical interface offers the chance to explore the characteristics of each clone, such as the involvement of the clone's sequence in segmental duplications. Circular Binary Segmentation and unsupervised Hidden Markov Model algorithms facilitate objective detection of chromosomal breakpoints. The storage of all essential data in a back-end database allows the simultaneous comparative analysis of different cases. The various display options also facilitate the definition of shortest regions of overlap and simplify the identification of odd clones.
Conclusion
CGHPRO is a comprehensive and easy-to-use data analysis tool for array CGH. Since all of its features are available offline, CGHPRO may be especially suitable in situations where protection of sensitive patient data is an issue. It is distributed under GNU GPL licence and runs on Linux and Windows.
doi:10.1186/1471-2105-6-85
PMCID: PMC1274268  PMID: 15807904
13.  SnoopCGH: software for visualizing comparative genomic hybridization data 
Bioinformatics  2009;25(20):2732-2733.
Summary: Array-based comparative genomic hybridization (CGH) technology is used to discover and validate genomic structural variation, including copy number variants, insertions, deletions and other structural variants (SVs). The visualization and summarization of the array CGH data outputs, potentially across many samples, is an important process in the identification and analysis of SVs. We have developed a software tool for SV analysis using data from array CGH technologies, which is also amenable to short-read sequence data.
Availability and implementation: SnoopCGH is written in Java and is available from http://snoopcgh.sourceforge.net/
Contact: jg10@sanger.ac.uk; tc5@sanger.ac.uk
doi:10.1093/bioinformatics/btp488
PMCID: PMC2759554  PMID: 19687029
14.  The use of ultra-dense array CGH analysis for the discovery of micro-copy number alterations and gene fusions in the cancer genome 
BMC Medical Genomics  2011;4:16.
Background
Molecular alterations critical to development of cancer include mutations, copy number alterations (amplifications and deletions) as well as genomic rearrangements resulting in gene fusions. Massively parallel next generation sequencing, which enables the discovery of such changes, uses considerable quantities of genomic DNA (> 5 ug), a serious limitation in ever smaller clinical samples. However, commonly available microarray platforms such as array comparative genomic hybridization (array CGH) allow the characterization of gene copy number at single-gene resolution using much smaller amounts of genomic DNA. In this study we evaluate the sensitivity of ultra-dense array CGH platforms developed by Agilent, especially that of the 1 million probe array (1 M array), and their application when whole genome amplification is required because of limited sample quantities.
Methods
We performed array CGH on whole genome amplified and not amplified genomic DNA from MCF-7 breast cancer cells, using 244 K and 1 M Agilent arrays. The ADM-2 algorithm was used to identify micro-copy number alterations that measured less than 1 Mb in genomic length.
Results
DNA from MCF-7 breast cancer cells was analyzed for micro-copy number alterations, defined as measuring less than 1 Mb in genomic length. The 4-fold extra resolution of the 1 M array platform relative to the less dense 244 K array platform, led to the improved detection of copy number variations (CNVs) and micro-CNAs. The identification of intra-genic breakpoints in areas of DNA copy number gain signaled the possible presence of gene fusion events. However, the ultra-dense platforms, especially the densest 1 M array, detect artifacts inherent to whole genome amplification and should be used only with non-amplified DNA samples.
Conclusions
This is a first report using 1 M array CGH for the discovery of cancer genes and biomarkers. We show the remarkable capacity of this technology to discover CNVs, micro-copy number alterations and even gene fusions. However, these platforms require excellent genomic DNA quality and do not tolerate relatively small imperfections related to the whole genome amplification.
doi:10.1186/1755-8794-4-16
PMCID: PMC3041991  PMID: 21272361
15.  Spatial normalization of array-CGH data 
BMC Bioinformatics  2006;7:264.
Background
Array-based comparative genomic hybridization (array-CGH) is a recently developed technique for analyzing changes in DNA copy number. As in all microarray analyses, normalization is required to correct for experimental artifacts while preserving the true biological signal. We investigated various sources of systematic variation in array-CGH data and identified two distinct types of spatial effect of no biological relevance as the predominant experimental artifacts: continuous spatial gradients and local spatial bias. Local spatial bias affects a large proportion of arrays, and has not previously been considered in array-CGH experiments.
Results
We show that existing normalization techniques do not correct these spatial effects properly. We therefore developed an automatic method for the spatial normalization of array-CGH data. This method makes it possible to delineate and to eliminate and/or correct areas affected by spatial bias. It is based on the combination of a spatial segmentation algorithm called NEM (Neighborhood Expectation Maximization) and spatial trend estimation. We defined quality criteria for array-CGH data, demonstrating significant improvements in data quality with our method for three data sets coming from two different platforms (198, 175 and 26 BAC-arrays).
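The idea of estimating and removing a spatial trend on the array surface can be illustrated with a local median filter. This is a deliberately simple stand-in and not the NEM-based method of the paper or the MANOR package: the function name, the window radius, and the example grid are assumptions.

```python
from statistics import median


def remove_spatial_trend(grid, radius=1):
    """Subtract the local median from each spot of a 2-D array of log ratios.

    Assumption: the spatial trend is approximated by the median of the
    spots within `radius` rows and columns of each position.
    """
    rows, cols = len(grid), len(grid[0])
    corrected = []
    for r in range(rows):
        row_out = []
        for c in range(cols):
            neighbours = [
                grid[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            row_out.append(grid[r][c] - median(neighbours))
        corrected.append(row_out)
    return corrected


# Hypothetical 3x3 block: a left-to-right gradient plus one outlying spot.
spot_grid = [
    [0.0, 0.1, 0.2],
    [0.0, 1.1, 0.2],
    [0.0, 0.1, 0.2],
]
detrended = remove_spatial_trend(spot_grid)
```

In this toy block the central spot keeps a value close to 1.0 after correction, because the local median reflects the background gradient rather than the single outlying spot.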
Conclusion
We have designed an automatic algorithm for the spatial normalization of BAC CGH-array data, preventing the misinterpretation of experimental artifacts as biologically relevant outliers in the genomic profile. This algorithm is implemented in the R package MANOR (Micro-Array NORmalization), which is described at and available from the Bioconductor site . It can also be tested on the CAPweb bioinformatics platform at .
doi:10.1186/1471-2105-7-264
PMCID: PMC1523216  PMID: 16716215
16.  ADaCGH: A Parallelized Web-Based Application and R Package for the Analysis of aCGH Data 
PLoS ONE  2007;2(8):e737.
Background
Copy number alterations (CNAs) in genomic DNA have been associated with complex human diseases, including cancer. One of the most common techniques to detect CNAs is array-based comparative genomic hybridization (aCGH). The availability of aCGH platforms and the need for identification of CNAs has resulted in a wealth of methodological studies.
Methodology/Principal Findings
ADaCGH is an R package and a web-based application for the analysis of aCGH data. It implements eight methods for detection of CNAs, gains and losses of genomic DNA, including all of the best performing ones from two recent reviews (CBS, GLAD, CGHseg, HMM). For improved speed, we use parallel computing (via MPI). Additional information (GO terms, PubMed citations, KEGG and Reactome pathways) is available for individual genes, and for sets of genes with altered copy numbers.
Conclusions/Significance
ADaCGH represents a qualitative increase in the standards of these types of applications: a) all of the best performing algorithms are included, not just one or two; b) we do not limit ourselves to providing a thin layer of CGI on top of existing BioConductor packages, but instead carefully use parallelization, examining different schemes, and are able to achieve significant decreases in user waiting time (factors up to 45×); c) we have added functionality not currently available in some methods, to adapt to recent recommendations (e.g., merging of segmentation results in wavelet-based and CGHseg algorithms); d) we incorporate redundancy, fault-tolerance and checkpointing, which are unique among web-based, parallelized applications; e) all of the code is available under open source licenses, allowing others to build upon, copy, and adapt our code for other software projects.
doi:10.1371/journal.pone.0000737
PMCID: PMC1940324  PMID: 17710137
17.  Reference-unbiased copy number variant analysis using CGH microarrays 
Nucleic Acids Research  2010;38(20):e190.
Comparative genomic hybridization (CGH) microarrays have been used to determine copy number variations (CNVs) and their effects on complex diseases. Detection of absolute CNVs independent of genomic variants of an arbitrary reference sample has been a critical issue in CGH array experiments. Whole genome analysis using massively parallel sequencing with multiple ultra-high resolution CGH arrays provides an opportunity to catalog highly accurate genomic variants of the reference DNA (NA10851). Using information on variants, we developed a new method, the CGH array reference-free algorithm (CARA), which can determine reference-unbiased absolute CNVs from any CGH array platform. The algorithm enables the removal and rescue of false positive and false negative CNVs, respectively, which appear due to the effects of genomic variants of the reference sample in raw CGH array experiments. We found that the CARA remarkably enhanced the accuracy of CGH array in determining absolute CNVs. Our method thus provides a new approach to interpret CGH array data for personalized medicine.
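The dependence of CGH ratios on the reference sample can be made concrete with a simple ratio model. The sketch below is not the CARA algorithm; it only assumes that the measured ratio equals the test copy number divided by the reference copy number, and the function name and example values are hypothetical.

```python
def absolute_copy_number(log2_ratio, reference_copies=2):
    """Convert a test/reference log2 ratio into an absolute copy number.

    Assumption: ratio = test copies / reference copies, so the test copy
    number is the reference copy number times 2 ** log2_ratio.
    """
    return reference_copies * 2 ** log2_ratio


# Region where the reference DNA itself carries a one-copy deletion.
# A log2 ratio of 1.0 looks like a gain but corresponds to 2 copies in the
# test sample (a false positive), while a log2 ratio of 0.0 looks normal
# but corresponds to 1 copy in the test sample (a false negative).
apparent_gain = absolute_copy_number(1.0, reference_copies=1)
apparent_normal = absolute_copy_number(0.0, reference_copies=1)
```

Knowing the reference copy number at each region is what allows both kinds of error to be corrected under this model.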
doi:10.1093/nar/gkq730
PMCID: PMC2978381  PMID: 20802225
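The reference bias described in entry 17 can be illustrated with a small arithmetic sketch. This is not the CARA algorithm itself, only the underlying relation that a two-colour log2 ratio measures the test sample relative to the reference, so the absolute copy number is the reference copy number times 2 raised to the log2 ratio. The function names, the tolerance of 0.5 copies, and the diploid baseline are assumptions made for illustration.

```python
def absolute_copy_number(log2_ratio, reference_copy_number=2):
    """Convert a test/reference log2 ratio into an absolute copy number estimate."""
    return reference_copy_number * 2.0 ** log2_ratio


def classify(log2_ratio, reference_copy_number=2, normal_copy_number=2, tolerance=0.5):
    """Label a region as gain, loss, or normal from its absolute copy number.

    tolerance is a hypothetical margin (in copies) around the normal state.
    """
    estimate = absolute_copy_number(log2_ratio, reference_copy_number)
    if estimate > normal_copy_number + tolerance:
        return "gain"
    if estimate < normal_copy_number - tolerance:
        return "loss"
    return "normal"


# Assuming a diploid reference, a log2 ratio of 1 looks like a gain.
naive_call = classify(1.0)
# If the reference is known to carry a one-copy deletion here, the same
# ratio corresponds to two copies in the test sample (a false positive removed).
corrected_call = classify(1.0, reference_copy_number=1)
# A ratio of 0 against that same reference hides a real loss (a false negative rescued).
rescued_call = classify(0.0, reference_copy_number=1)
```

In this toy setting, knowing the reference genotype changes the first call from a gain to normal and turns an apparently normal region into a loss, which is the kind of removal and rescue the abstract refers to.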
18.  Detection of copy number variation from array intensity and sequencing read depth using a stepwise Bayesian model 
BMC Bioinformatics  2010;11:539.
Background
Copy number variants (CNVs) have been demonstrated to occur at a high frequency and are now widely believed to make a significant contribution to the phenotypic variation in human populations. Array-based comparative genomic hybridization (array-CGH) and the newly developed read-depth approach through ultrahigh throughput genomic sequencing both provide rapid, robust, and comprehensive methods to identify CNVs on a whole-genome scale.
Results
We developed a Bayesian statistical analysis algorithm for the detection of CNVs from both types of genomic data. The algorithm can analyze such data obtained from PCR-based bacterial artificial chromosome arrays, high-density oligonucleotide arrays, and more recently developed high-throughput DNA sequencing. Treating parameters--e.g., the number of CNVs, the position of each CNV, and the data noise level--that define the underlying data generating process as random variables, our approach derives the posterior distribution of the genomic CNV structure given the observed data. Sampling from the posterior distribution using a Markov chain Monte Carlo method, we get not only best estimates for these unknown parameters but also Bayesian credible intervals for the estimates. We illustrate the characteristics of our algorithm by applying it to both synthetic and experimental data sets in comparison to other segmentation algorithms.
Conclusions
In particular, the synthetic data comparison shows that our method is more sensitive than other approaches at low false positive rates. Furthermore, given its Bayesian origin, our method can also be seen as a technique to refine CNVs identified by fast point-estimate methods and also as a framework to integrate array-CGH and sequencing data with other CNV-related biological knowledge, all through informative priors.
doi:10.1186/1471-2105-11-539
PMCID: PMC2992546  PMID: 21034510
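The posterior-sampling idea in entry 18 can be illustrated with a minimal Metropolis sketch. It is not the authors' algorithm: it assumes exactly one CNV segment, a known shift `delta`, a known noise level `sigma`, and a uniform prior over segment boundaries, whereas the abstract treats the number of CNVs and the noise level as unknown. All function names and the synthetic profile (with deterministic pseudo-noise so the run is reproducible) are hypothetical.

```python
import math
import random
from collections import Counter


def make_profile(n=60, start=20, end=35, delta=1.0):
    """Synthetic log-ratio profile: one shifted segment plus bounded pseudo-noise."""
    return [(delta if start <= i < end else 0.0) + 0.2 * math.sin(1.7 * i)
            for i in range(n)]


def log_likelihood(y, start, end, delta, sigma):
    """Gaussian log-likelihood (up to a constant) when probes in
    [start, end) have mean delta and all other probes have mean 0."""
    total = 0.0
    for i, value in enumerate(y):
        mean = delta if start <= i < end else 0.0
        total -= (value - mean) ** 2
    return total / (2.0 * sigma ** 2)


def sample_cnv_boundaries(y, delta, sigma, n_iter=5000, burn_in=1000, seed=0):
    """Metropolis sampler over (start, end) with a uniform prior on valid pairs."""
    rng = random.Random(seed)
    n = len(y)
    start, end = 0, n  # start from the whole profile as one segment
    current = log_likelihood(y, start, end, delta, sigma)
    samples = []
    for step in range(n_iter):
        # Symmetric proposal: move one boundary by one probe.
        new_start, new_end = start, end
        shift = rng.choice((-1, 1))
        if rng.random() < 0.5:
            new_start += shift
        else:
            new_end += shift
        # Proposals outside the prior's support are rejected outright.
        if 0 <= new_start < new_end <= n:
            proposed = log_likelihood(y, new_start, new_end, delta, sigma)
            # 1 - random() lies in (0, 1], so the logarithm is always defined.
            if math.log(1.0 - rng.random()) < proposed - current:
                start, end, current = new_start, new_end, proposed
        if step >= burn_in:
            samples.append((start, end))
    return samples


def credible_interval(values, mass=0.95):
    """Equal-tailed interval read off the sorted posterior samples."""
    ordered = sorted(values)
    low = ordered[int((1.0 - mass) / 2.0 * (len(ordered) - 1))]
    high = ordered[int((1.0 + mass) / 2.0 * (len(ordered) - 1))]
    return low, high


profile = make_profile()
draws = sample_cnv_boundaries(profile, delta=1.0, sigma=0.2)
most_common_segment = Counter(draws).most_common(1)[0][0]
start_interval = credible_interval([s for s, _ in draws])
```

The retained draws give both a point estimate (the most frequently visited segment) and an interval for each boundary, which mirrors the abstract's point about obtaining credible intervals alongside best estimates.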
19.  Estimating Genome-wide Copy Number using Allele Specific Mixture Models 
Genomic changes such as copy number alterations are one of the major underlying causes of human phenotypic variation among normal and disease subjects. Array comparative genomic hybridization (CGH) technology was developed to detect copy number changes in a high-throughput fashion. However, this technology provides only a >30 kb resolution, which limits the ability to detect copy number alterations spanning small regions. Higher resolution technologies such as single nucleotide polymorphism (SNP) microarrays allow detection of copy number alterations at least as small as several thousand base pairs. Unfortunately, strong probe effects and variation introduced by sample preparation procedures have made single-point copy number estimates too imprecise to be useful. Various groups have proposed statistical procedures that pool data from neighboring locations to successfully improve precision. However, these procedures need to average across relatively large regions to work effectively, thus greatly reducing resolution. Recently, regression-type models that account for probe effects have been proposed and appear to improve accuracy as well as precision. In this paper, we propose a mixture model solution, specifically designed for single-point estimation, that provides various advantages over the existing methodology. We use a 314-sample database to motivate and fit models for the conditional distribution of the observed intensities given allele-specific copy number. We can then compute posterior probabilities that provide a useful prediction rule as well as a confidence measure for each call. Software to implement this procedure will be available in the Bioconductor oligo package (http://www.bioconductor.org).
doi:10.1089/cmb.2007.0148
PMCID: PMC2612042  PMID: 18707534
20.  Estimating Genome-Wide Copy Number Using Allele-Specific Mixture Models 
Journal of Computational Biology  2008;15(7):857-866.
Abstract
Genomic changes such as copy number alterations are one of the major underlying causes of human phenotypic variation among normal and disease subjects. Array comparative genomic hybridization (CGH) technology was developed to detect copy number changes in a high-throughput fashion. However, this technology provides only a >30-kb resolution, which limits the ability to detect copy number alterations spanning small regions. Higher resolution technologies such as single nucleotide polymorphism (SNP) microarrays allow detection of copy number alterations at least as small as several thousand base pairs. Unfortunately, strong probe effects and variation introduced by sample preparation procedures have made single-point copy number estimates too imprecise to be useful. Various groups have proposed statistical procedures that pool data from neighboring locations to successfully improve precision. However, these procedures need to average across relatively large regions to work effectively, thus greatly reducing resolution. Recently, regression-type models that account for probe effects have been proposed and appear to improve accuracy as well as precision. In this paper, we propose a mixture model solution, specifically designed for single-point estimation, that provides various advantages over the existing methodology. We use a 314-sample database to motivate and fit models for the conditional distribution of the observed intensities given allele-specific copy number. We can then compute posterior probabilities that provide a useful prediction rule as well as a confidence measure for each call. Software to implement this procedure will be available in the Bioconductor oligo package (www.bioconductor.org).
doi:10.1089/cmb.2007.0148
PMCID: PMC2612042  PMID: 18707534
algorithms; computational molecular biology; DNA arrays
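The posterior-probability step described in entries 19 and 20 can be sketched as a direct application of Bayes' rule to a mixture model. This is a simplified, one-dimensional illustration rather than the allele-specific model in the paper: it assumes each total copy number state produces normally distributed intensities, and the weights, means, and standard deviations below are invented for the example, not fitted values. The function names are hypothetical and unrelated to the Bioconductor oligo package API.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_copy_number(intensity, components):
    """Posterior probability of each copy number given one observed intensity.

    components maps copy number -> (prior weight, mean, standard deviation).
    """
    joint = {cn: weight * normal_pdf(intensity, mean, sd)
             for cn, (weight, mean, sd) in components.items()}
    total = sum(joint.values())
    return {cn: value / total for cn, value in joint.items()}


def call_copy_number(intensity, components):
    """Return the most probable copy number and its posterior as a confidence."""
    posterior = posterior_copy_number(intensity, components)
    best = max(posterior, key=posterior.get)
    return best, posterior[best]


# Hypothetical mixture on a log2-ratio-like scale (illustrative values only).
COMPONENTS = {
    1: (0.1, -1.0, 0.25),
    2: (0.8, 0.0, 0.25),
    3: (0.1, 0.58, 0.25),
}

call, confidence = call_copy_number(0.6, COMPONENTS)
```

The same posterior serves as both the prediction rule (pick the largest) and the confidence measure (its value), so a probe sitting between two component means yields a call with visibly lower confidence than one at a component mean.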
21.  Analysis of Array-CGH Data Using the R and Bioconductor Software Suite 
Background. Array-based comparative genomic hybridization (array-CGH) is an emerging high-resolution and high-throughput molecular genetic technique that allows genome-wide screening for chromosome alterations. DNA copy number alterations (CNAs) are a hallmark of somatic mutations in tumor genomes and congenital abnormalities that lead to diseases such as mental retardation. However, accurate identification of amplified or deleted regions requires a sequence of different computational analysis steps of the microarray data. Results. We have developed a user-friendly and versatile tool for the normalization, visualization, breakpoint detection, and comparative analysis of array-CGH data which allows the accurate and sensitive detection of CNAs. Conclusion. The implemented option for the determination of minimal altered regions (MARs) from a series of tumor samples is a step forward in the identification of new tumor suppressor genes or oncogenes.
doi:10.1155/2009/201325
PMCID: PMC2728899  PMID: 19696946
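The determination of minimal altered regions mentioned in entry 21 can be sketched as an interval-overlap computation. The abstract does not define the procedure, so this assumes one common reading: a MAR is a maximal stretch of a chromosome that is altered in at least a chosen number of samples. The function name, the half-open `(start, end)` interval convention, and the example coordinates are all hypothetical, and intervals within one sample are assumed not to overlap each other.

```python
def minimal_altered_regions(sample_intervals, min_samples):
    """Return maximal regions covered by altered intervals in >= min_samples samples.

    sample_intervals holds one list of half-open (start, end) intervals per sample,
    all on the same chromosome and of the same alteration type (e.g. gains).
    """
    events = []
    for intervals in sample_intervals:
        for start, end in intervals:
            events.append((start, 1))   # coverage rises at an interval start
            events.append((end, -1))    # coverage falls at an interval end
    # At equal positions, process starts before ends so that touching
    # intervals are treated as continuous coverage.
    events.sort(key=lambda event: (event[0], -event[1]))

    regions = []
    depth = 0
    region_start = None
    for position, change in events:
        previous = depth
        depth += change
        if previous < min_samples <= depth:
            region_start = position
        elif depth < min_samples <= previous:
            # Skip zero-length regions created by intervals that only touch.
            if position > region_start:
                regions.append((region_start, position))
            region_start = None
    return regions


# Three hypothetical tumor samples with gained segments (arbitrary coordinates).
gains = [
    [(100, 500)],
    [(300, 700)],
    [(350, 450), (600, 650)],
]
shared_by_two = minimal_altered_regions(gains, min_samples=2)
shared_by_all = minimal_altered_regions(gains, min_samples=3)
```

Raising `min_samples` narrows the output toward the region common to every sample, which is why such regions are used to shortlist candidate tumor suppressor genes or oncogenes.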
22.  Detection limit of intragenic deletions with targeted array comparative genomic hybridization 
BMC Genetics  2013;14:116.
Background
Pathogenic mutations range from single nucleotide changes to deletions or duplications that encompass a single exon to several genes. The use of gene-centric high-density array comparative genomic hybridization (aCGH) has revolutionized the detection of intragenic copy number variations. We implemented an exon-centric design of high-resolution aCGH to detect single- and multi-exon deletions and duplications in a large set of genes using the OGT 60 K and 180 K arrays. Here we describe the molecular characterization and breakpoint mapping of deletions at the smaller end of the detectable range in several genes using aCGH.
Results
The method, initially implemented to detect single to multiple exon deletions, was able to detect deletions much smaller than anticipated. The selected deletions we describe vary in size, ranging from over 2 kb to as small as 12 base pairs. The smallest of these deletions are only detectable after careful manual review during data analysis. Suspected deletions smaller than the detection size for which the method was optimized were rigorously followed up and confirmed with PCR-based investigations to uncover the true detection size limit of intragenic deletions with this technology. False-positive deletion calls often demonstrated single nucleotide changes or an insertion causing lower hybridization of probes, demonstrating the sensitivity of aCGH.
Conclusions
With an optimized aCGH design and a careful review process, aCGH can uncover intragenic deletions as small as a dozen bases. These data provide insight that will help optimize probe coverage in array design and illustrate the true assay sensitivity. Mapping of the breakpoints confirms smaller deletions and contributes to the understanding of the mechanism behind these events. Our knowledge of the mutation spectra of several genes can be expected to change as previously unrecognized intragenic deletions are uncovered.
doi:10.1186/1471-2156-14-116
PMCID: PMC4235222  PMID: 24304607
aCGH; Intragenic deletions; Breakpoint analysis; Molecular characterization
23.  The BCM Microarray Core Facility: Closing the Next-Generation Gap 
CF-39
The Microarray Core Facility (MCF) at Baylor College of Medicine provides investigators with access to a variety of state-of-the-art technologies and approaches that will enhance discovery for their genomic research. We house instrumentation supporting Affymetrix, Agilent, NimbleGen, Luminex, and Illumina platforms. The MCF provides expertise in the following applications: gene expression, array comparative genomic hybridization (aCGH), SNP genotyping, and next-generation sequencing. In addition, our lab offers services for sample quality check and a cDNA clone repository, for those who are interested in verifying results from gene expression experiments or any other application requiring cDNA clones. The MCF specializes in RNA applications that enable researchers to monitor genome-wide expression profiles through Affymetrix, Agilent and NimbleGen expression arrays. Agilent's aCGH and Affymetrix SNP Arrays are also offered, providing detection of copy number variations across the genome. Other related services include: tiling arrays, ChIP-on-chip arrays, SuperArray, Promoter Arrays, and Panomics. Due to the increased demand for rapid DNA sequencing, the facility now provides massively parallel “next generation” sequencing on the Illumina Genome Analyzer II. Our core lab has established a workflow involving: project consultation, sample quality check, sample preparation and data generation for each sequencing project. Illumina's sequencing platform provides high-quality data in the following applications: gene expression and alternative splicing (mRNA-Seq), protein-nucleic acid association profiling and epigenetics (ChIP-Seq), sequencing targeted genomic regions, small RNA discovery (small RNA-Seq) and de novo sequencing. The MCF offers investigators access to an array of emerging technologies while assisting in experimental design and data analysis.
PMCID: PMC2918027
24.  Cross-Species Array Comparative Genomic Hybridization Identifies Novel Oncogenic Events in Zebrafish and Human Embryonal Rhabdomyosarcoma 
PLoS Genetics  2013;9(8):e1003727.
Human cancer genomes are highly complex, making it challenging to identify specific drivers of cancer growth, progression, and tumor maintenance. To bypass this obstacle, we have applied array comparative genomic hybridization (array CGH) to zebrafish embryonal rhabdomyosarcoma (ERMS) and utilized cross-species comparison to rapidly identify genomic copy number aberrations and novel candidate oncogenes in human disease. Zebrafish ERMS contain small, focal regions of low-copy amplification. These same regions were commonly amplified in human disease. For example, 16 of 19 chromosomal gains identified in zebrafish ERMS also exhibited focal, low-copy gains in human disease. Genes found in amplified genomic regions were assessed for functional roles in promoting continued tumor growth in human and zebrafish ERMS – identifying critical genes associated with tumor maintenance. Knockdown studies identified important roles for Cyclin D2 (CCND2), Homeobox Protein C6 (HOXC6) and PlexinA1 (PLXNA1) in human ERMS cell proliferation. PLXNA1 knockdown also enhanced differentiation, reduced migration, and altered anchorage-independent growth. By contrast, chemical inhibition of vascular endothelial growth factor (VEGF) signaling reduced angiogenesis and tumor size in ERMS-bearing zebrafish. Importantly, VEGFA expression correlated with poor clinical outcome in patients with ERMS, implicating inhibitors of the VEGF pathway as a promising therapy for improving patient survival. Our results demonstrate the utility of array CGH and cross-species comparisons to identify candidate oncogenes essential for the pathogenesis of human cancer.
Author Summary
Cancer is a complex genetic disease that is often associated with regional gains and losses of genomic DNA segments. These changes result in aberrant gene expression and drive continued tumor growth. Because amplified and deleted DNA segments tend to span large regions of chromosomes, it has been challenging to identify the genes that are required for continued tumor growth and progression. Array comparative genomic hybridization (array CGH) is an effective technology in identifying abnormal copy number variations in cancer genomes. In this study, array CGH was used in a zebrafish model of embryonal rhabdomyosarcoma - a pediatric muscle tumor. Our work shows that the zebrafish cancer genome contains a small number of recurrent DNA copy number changes, which are also commonly amplified in the human disease. Moreover, these chromosomal regions are small, facilitating rapid identification of candidate oncogenes. A subset of genes identified in zebrafish array CGH was prioritized for functional characterization in human ERMS, identifying evolutionarily conserved pathways that regulate proliferation, migration, differentiation, and neovascularization. Our results demonstrate the broad utility of cross-species array CGH comparisons of human and zebrafish cancer and provide a much needed discovery platform for identifying critical cancer-causing genes in a wide range of malignancies.
doi:10.1371/journal.pgen.1003727
PMCID: PMC3757044  PMID: 24009521
25.  Comprehensive copy number profiles of breast cancer cell model genomes 
Breast Cancer Research  2006;8(1):R9.
Introduction
Breast cancer is the most commonly diagnosed cancer in women worldwide and consequently has been extensively investigated in terms of histopathology, immunochemistry and familial history. Advances in genome-wide approaches have contributed to molecular classification with respect to genomic changes and their subsequent effects on gene expression. Cell lines have provided a renewable resource that is readily used as model systems for breast cancer cell biology. A thorough characterization of their genomes to identify regions of segmental DNA loss (potential tumor-suppressor-containing loci) and gain (potential oncogenic loci) would greatly facilitate the interpretation of biological data derived from such cells. In this study we characterized the genomes of seven of the most commonly used breast cancer model cell lines at unprecedented resolution using a newly developed whole-genome tiling path genomic DNA array.
Methods
Breast cancer model cell lines MCF-7, BT-474, MDA-MB-231, T47D, SK-BR-3, UACC-893 and ZR-75-30 were investigated for genomic alterations with the submegabase-resolution tiling array (SMRT) array comparative genomic hybridization (CGH) platform. SMRT array CGH provides tiling coverage of the human genome permitting break-point detection at about 80 kilobases resolution. Two novel discrete alterations identified by array CGH were verified by fluorescence in situ hybridization.
Results
Whole-genome tiling path array CGH analysis identified novel high-level alterations and fine-mapped previously reported regions yielding candidate genes. In brief, 75 high-level gains and 48 losses were observed and their respective boundaries were documented. Complex alterations involving multiple levels of change were observed on chromosome arms 1p, 8q, 9p, 11q, 15q, 17q and 20q. Furthermore, alignment of whole-genome profiles enabled simultaneous assessment of copy number status of multiple components of the same biological pathway. Investigation of about 60 loci containing genes associated with the epidermal growth factor family (epidermal growth factor receptor, HER2, HER3 and HER4) revealed that all seven cell lines harbor copy number changes to multiple genes in these pathways.
Conclusion
The intrinsic genetic differences between these cell lines will influence their biologic and pharmacologic response as an experimental model. Knowledge of segmental changes in these genomes deduced from our study will facilitate the interpretation of biological data derived from such cells.
doi:10.1186/bcr1370
PMCID: PMC1413994  PMID: 16417655
