summarizes currently available array CGH software programs and compares the algorithms used in the detection of segmental copy number changes and the types of visualization available.
Software for analysis and visualization of array CGH data.
Typically, software programs are developed to support the analysis and/or visualization of specific array platforms, especially the commercially available ones. For example, the Affymetrix Copy Number Analysis Tool and Nimblegen SignalMap were developed by the respective companies for their own arrays. In contrast, software applications developed by academic laboratories were generally designed to handle the primary array platform used by the research group and, through subsequent improvements, were extended to handle data from other commonly used array platforms. SeeGH, for example, was initially developed to visualize and analyze BAC array CGH data, but newer versions of the application also accommodate data from oligonucleotide and cDNA platforms. Other programs, such as ArrayCyGHt, CGH-Explorer, M-CGH and Normalise Suite v2.5, likewise demonstrate versatility by handling data generated by all three types of array platforms. The visualization capabilities of these applications are compared based on the ability to view single or multiple experiments, and on whether they offer simple static graphical representations or interactive displays. Here, we highlight three software examples to illustrate interactive display: CGHPro, CGHAnalyzer v2.2 and SeeGH v3.0.
CGHPro is Java-based software that runs on multiple operating systems. It requires the Java Runtime Environment version 1.4.2 or higher, the statistical package R version 1.9.1 (Ihaka and Gentleman, 1996) and the MySQL database server to store array CGH experiments (Chen et al. 2005). Its major functionalities include graphical data quality assessment, normalization of data using techniques commonly applied to microarray data, integration of existing algorithms for alteration detection, and multiple methods of visualization. In addition, CGHPro can import formatted data from a variety of array platforms.
Data quality assessment is performed using graphical methods such as scatter plots of the log2 spot intensities, box plots, histograms, M-A plots and QQ plots. Data filtering is based on user-defined parameters. Normalization routines include Global Median, Subgrid Median, LOWESS (locally weighted scatterplot smoothing), Subgrid LOWESS, and dye-swap normalization. Alteration detection algorithms include direct thresholding and thresholding after segmentation, incorporating the Bioconductor packages aCGH (hidden Markov model, HMM) and DNAcopy (circular binary segmentation, CBS) (Fridlyand et al. 2004; Olshen et al. 2004). Visualization is interactive, allowing sequential magnification and the viewing of multiple experiments.
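To make the preprocessing and detection steps above concrete, the following Python sketch computes M-A values from two-channel spot intensities, applies a global median normalization, and then calls gains and losses by direct thresholding. It is a minimal illustration under stated assumptions, not CGHPro's code: the function names and the ±0.3 log2 cut-offs are hypothetical choices made for this example.

```python
import math
from statistics import median


def ma_values(red, green):
    """Return per-spot M = log2(R/G) and A = 0.5 * log2(R*G)."""
    m_vals = [math.log2(r / g) for r, g in zip(red, green)]
    a_vals = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m_vals, a_vals


def global_median_normalize(log_ratios):
    """Shift every log2 ratio so that the median of the array becomes zero."""
    center = median(log_ratios)
    return [x - center for x in log_ratios]


def direct_threshold(log_ratios, gain=0.3, loss=-0.3):
    """Label each clone by fixed cut-offs (hypothetical default values)."""
    calls = []
    for x in log_ratios:
        if x >= gain:
            calls.append("gain")
        elif x <= loss:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls


if __name__ == "__main__":
    # Hypothetical test (red) and reference (green) intensities for five clones.
    red = [1100, 2200, 1100, 550, 1100]
    green = [1000, 1000, 1000, 1000, 1000]
    m_vals, a_vals = ma_values(red, green)
    print(direct_threshold(global_median_normalize(m_vals)))
```

Global median normalization only removes a constant offset; intensity-dependent dye bias, which is what an M-A plot reveals, is the motivation for the LOWESS variants listed above.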
CGHAnalyzer is also Java-based software, requiring the Java Runtime Environment version 1.4 or later (Margolin et al. 2005). This program allows querying of pre-loaded or custom gene sets for copy number status and integrates the clustering options of the TIGR Multi-Experiment Viewer (Saeed et al. 2003). CGHAnalyzer has no normalization functions and therefore requires pre-normalized data. However, mapping information for the UPenn BAC array and the Affymetrix P501 SNP array is pre-loaded.
Two visualization layouts are provided, giving the option of viewing the chromosomes as concentric circles or as traditional chromosome ideograms. Multiple experiments can be viewed using heatmap alignment of individual chromosomes. Alteration detection relies on direct thresholding or on deviation from a pre-defined distribution.
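The text does not specify which distribution CGHAnalyzer uses, so the sketch below shows only one plausible reading of "deviation from a pre-defined distribution": the mean and standard deviation of a reference set of log2 ratios (assumed here to come from copy-number-neutral clones) define the expected distribution, and clones whose z-score exceeds a hypothetical cut-off of 3 are flagged. The function name and parameters are illustrative, not CGHAnalyzer's API.

```python
from statistics import mean, stdev


def zscore_calls(log_ratios, reference, z_cut=3.0):
    """Flag clones whose log2 ratio lies more than z_cut SDs from the reference mean.

    `reference` is assumed to hold log2 ratios of copy-number-neutral clones;
    z_cut = 3.0 is a hypothetical default.
    """
    mu = mean(reference)
    sigma = stdev(reference)  # sample SD; needs at least two reference values
    calls = []
    for x in log_ratios:
        z = (x - mu) / sigma
        if z >= z_cut:
            calls.append("gain")
        elif z <= -z_cut:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls
```

Unlike a fixed log2 cut-off, this rule adapts to the noise level of the reference set: a noisier array yields a larger sigma and therefore a more conservative call.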
SeeGH was developed in C++ and runs on the Windows platform, using MySQL as its database. It accepts pre-normalized data and allows filtering of replicate data points based on standard deviation and signal-to-noise ratio cut-offs. SeeGH accommodates data from a variety of sources, for example copy number, gene expression, and global methylation profiles. Interactive display functions include sequential magnification and linking of clones to genes and, in turn, to biological databases (e.g. UCSC Genome Browser). Specific regions of interest can be located by querying identifiers such as gene name, clone name, and base pair position. Experimental parameters and user comments are stored within SeeGH, allowing convenient information retrieval.
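The replicate filtering described above can be sketched as follows. The exact rules SeeGH applies are not described here, so this is an assumption-laden illustration: replicate spots below a hypothetical minimum signal-to-noise ratio are dropped first, a clone is then discarded if the standard deviation of its remaining log2 ratios exceeds a hypothetical cut-off, and each surviving clone is summarized by its mean.

```python
from statistics import mean, stdev


def filter_replicates(clones, min_snr=3.0, max_sd=0.2):
    """Filter replicate spots and summarize each surviving clone by its mean.

    clones maps a clone ID to a list of (log2_ratio, snr) replicate spots.
    min_snr and max_sd are hypothetical cut-offs.
    """
    kept = {}
    for clone_id, spots in clones.items():
        # Drop replicate spots whose signal-to-noise ratio is too low.
        good = [ratio for ratio, snr in spots if snr >= min_snr]
        # Assumption: at least two replicates are needed to estimate an SD.
        if len(good) < 2:
            continue
        # Discard clones whose replicates disagree too much.
        if stdev(good) > max_sd:
            continue
        kept[clone_id] = mean(good)
    return kept
```

Applying the signal-to-noise filter before the standard deviation check keeps a single weak spot from causing an otherwise consistent clone to be rejected.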
In addition, users can add customized or preloaded tracks to display gene locations, CpG island positions, microRNA locations, etc. Multiple chromosome alignment, frequency summary plots, and heatmap displays are available for viewing multiple experiments (Figure 6). Direct thresholding and moving-average-based thresholding are built in for alteration detection. Alternatively, segmentation results from external software (e.g. aCGH-Smooth) can be imported for visualization.
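Moving-average-based thresholding smooths the log2 ratios of neighboring clones along a chromosome before applying cut-offs, which reduces the influence of single noisy clones. The sketch below assumes a centered window of hypothetical size that is truncated at chromosome ends, and hypothetical ±0.2 cut-offs; it illustrates the general idea rather than SeeGH's implementation.

```python
def moving_average(values, window=5):
    """Centered moving average; the window is truncated at the ends."""
    half = window // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        segment = values[max(0, i - half):min(n, i + half + 1)]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def moving_average_calls(positions, log_ratios, window=5, gain=0.2, loss=-0.2):
    """Smooth log2 ratios in genomic order, then threshold the smoothed values.

    positions are base-pair coordinates of clones on one chromosome;
    window, gain and loss are hypothetical defaults.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    smoothed = moving_average([log_ratios[i] for i in order], window)
    calls = [None] * len(positions)
    # Map each smoothed value back to the clone's original index.
    for rank, idx in enumerate(order):
        s = smoothed[rank]
        if s >= gain:
            calls[idx] = "gain"
        elif s <= loss:
            calls[idx] = "loss"
        else:
            calls[idx] = "normal"
    return calls
```

The window size is the main design trade-off: a wider window suppresses more noise but blurs the boundaries of small alterations.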
Figure 6. Examples of multiple experiment visualization methods in SeeGH. A: Multiple alignment of individual chromosome profiles. B: Frequency plot summarizing multiple experiments; red histograms represent the frequency of gains and green histograms the frequency of losses. C: Heatmap display.
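A frequency summary plot such as the one in panel B can be thought of as a per-clone tally across experiments. The following sketch assumes each experiment has already been reduced to per-clone calls ("gain", "loss", "normal") over the same ordered set of clones, and returns the fraction of experiments with a gain or a loss at each clone, which are the kinds of values such a plot would draw as red and green bars. The function name and input layout are hypothetical.

```python
def gain_loss_frequency(call_matrix):
    """Per-clone fraction of experiments called 'gain' and 'loss'.

    call_matrix is a list of experiments, each a list of calls over the
    same ordered set of clones.
    """
    n_exp = len(call_matrix)
    gain_freq, loss_freq = [], []
    for column in zip(*call_matrix):  # one tuple of calls per clone
        gain_freq.append(column.count("gain") / n_exp)
        loss_freq.append(column.count("loss") / n_exp)
    return gain_freq, loss_freq
```

Plotting the loss frequencies as negative bars below the axis is one common way to show gains and losses for the same clones side by side.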
Considerations for future software development
With the rapid accumulation of large-scale, high-throughput data describing cancer genomes, epigenomes, and transcriptomes, cross-platform meta-analysis will become prevalent. However, researchers with limited genomics and computational expertise will not be able to readily take advantage of such information. The development of facile, web-based software that integrates large-scale multidisciplinary databases will facilitate widespread mining of genomic data and its correlation with clinical features (Kingsley et al. 2006). These issues become more pronounced with the increasing emphasis on translational research as array CGH technology moves towards clinical application. Delivering these emerging technologies to a clinical setting will require additional attention to ease of use, information security, automation, and the incorporation of prior knowledge of disease to assist in interpretation.