Related Articles

1.  Analysis of Pharmacokinetics, Pharmacodynamics, and Pharmacogenomics Data Sets Using VizStruct, A Novel Multidimensional Visualization Technique 
Pharmaceutical research  2004;21(5):777-780.
Data visualization techniques for the pharmaceutical sciences have not been extensively investigated. The purpose of this study was to evaluate the usefulness of VizStruct, a multidimensional visualization tool, for applications in pharmacokinetics, pharmacodynamics, and pharmacogenomics.
The VizStruct tool uses the first harmonic of the discrete Fourier transform to map multidimensional data to two dimensions for visualization. The mapping was used to visualize several published pharmacokinetic, pharmacodynamic, and pharmacogenomic data sets. The VizStruct approach was evaluated using simulated population pharmacokinetics data sets, the data from Dalen and colleagues (Clin. Pharmacol. Ther. 63:444−452, 1998) on the kinetics of nortriptyline and its 10-hydroxynortriptyline metabolite in subjects with differing numbers of copies of the CYP2D6 gene, and the gene expression profiling data of Bohen and colleagues (Proc. Natl. Acad. Sci. USA 100:1926−1930, 2003) on follicular lymphoma patients responsive and nonresponsive to rituximab.
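The mapping described above can be illustrated with a minimal sketch. It assumes each record is a plain list of N numeric values and projects it onto the first harmonic of the discrete Fourier transform, using the real and imaginary parts as the two plot coordinates. The function name `first_harmonic_map` is hypothetical, and any normalization or scaling that VizStruct itself applies before the transform is not reproduced here.

```python
import cmath

def first_harmonic_map(rows):
    """Map each N-dimensional row to a 2-D point via the first DFT harmonic.

    Illustrative sketch only: no per-dimension scaling is applied, which a
    real tool would likely need before mixing heterogeneous variables.
    """
    points = []
    for row in rows:
        n = len(row)
        # First harmonic: sum_k x_k * exp(-2*pi*i*k / N)
        z = sum(x * cmath.exp(-2j * cmath.pi * k / n) for k, x in enumerate(row))
        points.append((z.real, z.imag))
    return points
```

In this sketch a constant profile lands at the origin, so points far from the origin correspond to profiles whose values vary across dimensions; records with similar shapes fall close together in the plane.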
The VizStruct mapping preserves the key characteristics of multidimensional data in two dimensions in a manner that facilitates visualization. The mapping is computationally efficient and can be used for cluster detection and class prediction in pharmaceutical data sets. The VizStruct visualization succinctly summarized the salient similarities and differences in the nortriptyline and 10-hydroxynortriptyline pharmacokinetic profiles in subjects with increasing number of CYP2D6 gene copies. In the simulated population pharmacokinetic data sets, it was capable of discriminating the subtle differences between pharmacokinetic profiles derived from 1- and 2-compartment models with the same area under the curve. The two-dimensional VizStruct mapping computed from a subset of 102 informative genes from the Bohen and colleagues data set effectively separated the rituximab responder, rituximab nonresponder, and control subject groups.
The VizStruct approach is a computationally efficient and effective approach for visualizing complex, multidimensional data sets. It could have many useful applications in the pharmaceutical sciences.
PMCID: PMC2607483  PMID: 15180333
microarray; pharmacodynamics; pharmacogenomic modeling; pharmacokinetics; visualization algorithms
2.  Information-theoretic identification of predictive SNPs and supervised visualization of genome-wide association studies 
Nucleic Acids Research  2006;34(14):e101.
The size, dimensionality and the limited range of the data values make visualization of single nucleotide polymorphism (SNP) datasets challenging. The purpose of this study is to evaluate the usefulness of 3D VizStruct, a novel multi-dimensional data visualization technique for SNP datasets capable of identifying informative SNPs in genome-wide association studies. VizStruct is an interactive visualization technique that reduces multi-dimensional data to three dimensions using a combination of the discrete Fourier transform and the Kullback–Leibler divergence. The performance of 3D VizStruct was challenged with several diverse, biologically relevant published datasets including the human lipoprotein lipase (LPL) gene locus, the human Y-chromosome in several populations and a multi-locus genotype dataset of coral samples from four populations. In every case, the SNPs and/or polymorphic markers identified by the 3D VizStruct mapping were predictive of the underlying biology.
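The abstract does not say how 3D VizStruct combines the Fourier mapping with the Kullback–Leibler divergence, so the sketch below only shows the divergence itself. It assumes genotypes are coded 0/1/2 and compares the genotype distribution of one SNP between two groups, which is one plausible way to score how informative a SNP is. The function names and the pseudocount smoothing are illustrative assumptions, not the published method.

```python
import math
from collections import Counter

def genotype_distribution(genotypes, states=(0, 1, 2), pseudocount=0.5):
    """Smoothed relative frequencies of genotype codes (assumed coding: 0/1/2)."""
    counts = Counter(genotypes)
    total = len(genotypes) + pseudocount * len(states)
    return [(counts[s] + pseudocount) / total for s in states]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; requires q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

A SNP whose genotype frequencies are the same in both groups scores zero; larger values indicate that the two groups differ more strongly at that SNP. The pseudocount keeps the divergence finite when a genotype is absent from one group.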
PMCID: PMC1557808  PMID: 16899448
3.  Modeling of environmental and genetic interactions with AMBROSIA, an information-theoretic model synthesis method 
Heredity  2011;107(4):320-327.
To develop a model synthesis method for parsimoniously modeling gene–environmental interactions (GEI) associated with clinical outcomes and phenotypes. The AMBROSIA model synthesis approach utilizes the k-way interaction information (KWII), an information-theoretic metric capable of identifying variable combinations associated with GEI. For model synthesis, AMBROSIA considers the relevance of combinations to the phenotype, precludes entry of combinations with redundant information, and penalizes for unjustifiable complexity; each step is KWII based. The performance and power of AMBROSIA were evaluated with simulations and Genetic Association Workshop 15 (GAW15) data sets of rheumatoid arthritis (RA). AMBROSIA identified parsimonious models in data sets containing multiple interactions with linkage disequilibrium present. For the GAW15 data set containing 9187 single-nucleotide polymorphisms, the parsimonious AMBROSIA model identified nine RA-associated combinations with power >90%. AMBROSIA was compared with multifactor dimensionality reduction across several diverse models and had satisfactory power. Software source code is available. AMBROSIA is a promising method for GEI model synthesis.
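As a rough illustration of the KWII metric mentioned above, the sketch below uses the common definition of interaction information as an alternating sum of joint entropies over all non-empty subsets of a variable set, with the sign convention under which positive values indicate synergy. That convention, the plug-in entropy estimate, and the function names are assumptions; the abstract does not give the formula, and the relevance, redundancy and complexity steps of AMBROSIA are not reproduced.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(columns):
    """Plug-in Shannon entropy (bits) of the joint distribution of the given columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum(c / n * math.log2(c / n) for c in Counter(joint).values())

def kwii(columns):
    """k-way interaction information of a set of discrete variables.

    Assumed definition: KWII = -sum over non-empty subsets T of
    (-1)^(k - |T|) * H(T). For two variables this is mutual information.
    """
    k = len(columns)
    total = 0.0
    for size in range(1, k + 1):
        sign = (-1) ** (k - size)
        for subset in combinations(columns, size):
            total += sign * entropy(subset)
    return -total
```

To score a combination against a phenotype, the phenotype column would be passed as one of the variables. For example, a phenotype that is the exclusive-or of two independent binary factors carries no pairwise information but a three-way KWII of one bit.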
PMCID: PMC3182499  PMID: 21427755
gene–environment interactions; gene–gene interactions; k-way interaction information
4.  WHIDE—a web tool for visual data mining colocation patterns in multivariate bioimages 
Bioinformatics  2012;28(8):1143-1150.
Motivation: Bioimaging techniques rapidly develop toward higher resolution and dimension. The increase in dimension is achieved by different techniques such as multitag fluorescence imaging, Matrix Assisted Laser Desorption / Ionization (MALDI) imaging or Raman imaging, which record for each pixel an N-dimensional intensity array, representing local abundances of molecules, residues or interaction patterns. The analysis of such multivariate bioimages (MBIs) calls for new approaches to support users in the analysis of both feature domains: space (i.e. sample morphology) and molecular colocation or interaction. In this article, we present our approach WHIDE (Web-based Hyperbolic Image Data Explorer) that combines principles from computational learning, dimension reduction and visualization in a free web application.
Results: We applied WHIDE to a set of MBIs recorded using the multitag fluorescence imaging Toponome Imaging System. The MBIs show fields of view in tissue sections from a colon cancer study and we compare tissue from normal/healthy colon with tissue classified as tumor. Our results show that WHIDE efficiently reduces the complexity of the data by mapping each of the pixels to a cluster, referred to as Molecular Co-Expression Phenotypes, and provides a structural basis for a sophisticated multimodal visualization, which combines topology preserving pseudocoloring with information visualization. The wide range of WHIDE's applicability is demonstrated with examples from toponome imaging, high content screens and MALDI imaging (shown in the Supplementary Material).
Availability and implementation: The WHIDE tool can be accessed via the BioIMAX website; Login: whidetestuser; Password: whidetest.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3324520  PMID: 22390938
5.  Fast extraction of neuron morphologies from large-scale SBFSEM image stacks 
Neuron morphology is frequently used to classify cell-types in the mammalian cortex. Apart from the shape of the soma and the axonal projections, morphological classification is largely defined by the dendrites of a neuron and their subcellular compartments, referred to as dendritic spines. The dimensions of a neuron’s dendritic compartment, including its spines, are also a major determinant of the passive and active electrical excitability of dendrites. Furthermore, the dimensions of dendritic branches and spines change during postnatal development and, possibly, following some types of neuronal activity patterns. Due to their small size, accurate quantitation of spine number and structure is difficult to achieve (Larkman, J Comp Neurol 306:332, 1991). Here we follow an analysis approach using high-resolution EM techniques. Serial block-face scanning electron microscopy (SBFSEM) enables automated imaging of large specimen volumes at high resolution. The large data sets generated by this technique make manual reconstruction of neuronal structure laborious. Here we present NeuroStruct, a reconstruction environment developed for fast and automated analysis of large SBFSEM data sets containing individual stained neurons using optimized algorithms for CPU and GPU hardware. NeuroStruct is based on 3D operators and integrates image information from image stacks of individual neurons filled with biocytin and stained with osmium tetroxide. The focus of the presented work is the reconstruction of dendritic branches with detailed representation of spines. NeuroStruct delivers both a 3D surface model of the reconstructed structures and a 1D geometrical model corresponding to the skeleton of the reconstructed structures. Both representations are a prerequisite for analysis of morphological characteristics and for simulation of signalling within a neuron that captures the influence of spines.
Electronic supplementary material  The online version of this article (doi:10.1007/s10827-011-0316-1) contains supplementary material, which is available to authorized users.
PMCID: PMC3232351  PMID: 21424815
SBFSEM; Segmentation; Reconstruction of neurons; Image processing; GPGPU computing
6.  Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes 
BMC Bioinformatics  2009;10:27.
The landscape of biological and biomedical research is being changed rapidly by the invention of microarrays, which enable a simultaneous view of the transcription levels of a huge number of genes across different experimental conditions or time points. Using microarray data sets, clustering algorithms have been actively utilized in order to identify groups of co-expressed genes. This article poses the problem of fuzzy clustering in microarray data as a multiobjective optimization problem which simultaneously optimizes two internal fuzzy cluster validity indices to yield a set of Pareto-optimal clustering solutions. Each of these clustering solutions possesses some amount of information regarding the clustering structure of the input data. Motivated by this fact, a novel fuzzy majority voting approach is proposed to combine the clustering information from all the solutions in the resultant Pareto-optimal set. This approach first identifies the genes which are assigned to some particular cluster with high membership degree by most of the Pareto-optimal solutions. Using this set of genes as the training set, the remaining genes are classified by a supervised learning algorithm. In this work, we have used a Support Vector Machine (SVM) classifier for this purpose.
The performance of the proposed clustering technique has been demonstrated on five publicly available benchmark microarray data sets, viz., Yeast Sporulation, Yeast Cell Cycle, Arabidopsis Thaliana, Human Fibroblasts Serum and Rat Central Nervous System. Comparative studies of the use of different SVM kernels and several widely used microarray clustering techniques are reported. Moreover, statistical significance tests have been carried out to establish the statistical superiority of the proposed clustering approach. Finally, biological significance tests have been carried out using a web based gene annotation tool to show that the proposed method is able to produce biologically relevant clusters of co-expressed genes.
The proposed clustering method has been shown to perform better than other well-known clustering algorithms in finding clusters of co-expressed genes efficiently. The clusters of genes produced by the proposed technique are also found to be biologically significant, i.e., consist of genes which belong to the same functional groups. This indicates that the proposed clustering method can be used efficiently to identify co-expressed genes in microarray gene expression data.
Supplementary Website: The pre-processed and normalized data sets, the MATLAB code and other related materials are available at the supplementary website.
PMCID: PMC2657792  PMID: 19154590
7.  A multi-class predictor based on a probabilistic model: application to gene expression profiling-based diagnosis of thyroid tumors 
BMC Genomics  2006;7:190.
Although microscopic diagnosis has been playing the decisive role in cancer diagnostics, there have been cases in which it does not satisfy the clinical need. Differential diagnosis of malignant and benign thyroid tissues is one such case, and supplementary diagnosis such as that by gene expression profile is expected.
With four thyroid tissue types, i.e., papillary carcinoma, follicular carcinoma, follicular adenoma, and normal thyroid, we performed gene expression profiling with adaptor-tagged competitive PCR, a high-throughput RT-PCR technique. For differential diagnosis, we applied a novel multi-class predictor, introducing probabilistic outputs. Multi-class predictors were constructed using various combinations of binary classifiers. The learning set included 119 samples, and the predictors were evaluated by strict leave-one-out cross validation. Trials included classical combinations, i.e., one-to-one and one-to-the-rest, but the predictor using more combinations exhibited better prediction accuracy. This characteristic was consistent with other gene expression data sets. The performance of the selected predictor was then tested with an independent set consisting of 49 samples. The resulting test prediction accuracy was 85.7%.
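The leave-one-out evaluation mentioned above can be sketched as follows. The nearest-centroid classifier is only a stand-in chosen to keep the example self-contained; it is not the probabilistic multi-class predictor of the paper, and both function names are hypothetical. Each sample is held out in turn, the classifier is fitted on the remaining samples, and the held-out prediction is compared with the true label.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier: return the label whose class mean is closest to x."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1

    def dist2(label):
        # Squared Euclidean distance from x to the class centroid
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sums, key=dist2)

def leave_one_out_accuracy(X, y, predict=nearest_centroid_predict):
    """Fraction of samples predicted correctly when each is held out in turn."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)
```

Any step that looks at the labels, such as gene selection, would need to be repeated inside the loop on the training portion only; otherwise the held-out sample leaks into the model and the accuracy estimate becomes optimistic.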
Molecular diagnosis of thyroid tissues is feasible by gene expression profiling, and the current level is promising towards the automatic diagnostic tool to complement the present medical procedures. A multi-class predictor with an exhaustive combination of binary classifiers could achieve a higher prediction accuracy than those with classical combinations and other predictors such as multi-class SVM. The probabilistic outputs of the predictor offer more detailed information for each sample, which enables visualization of each sample in low-dimensional classification spaces. These new concepts should help to improve the multi-class classification including that of cancer tissues.
PMCID: PMC1550728  PMID: 16872506
8.  Transcriptomic events associated with internal browning of apple during postharvest storage 
BMC Plant Biology  2014;14(1):328.
Postharvest ripening of apple (Malus x domestica) can be slowed down by low temperatures, and a combination of low O2 and high CO2 levels. While this maintains the quality of most fruit, occasionally storage disorders such as flesh browning can occur. This study aimed to explore changes in the apple transcriptome associated with a flesh browning disorder related to controlled atmosphere storage using RNA-sequencing techniques. Samples from a browning-susceptible cultivar (‘Braeburn’) were stored for four months under controlled atmosphere. Based on a visual browning index, the inner and outer cortex of the stored apples was classified as healthy or affected tissue.
Over 600 million short single-end reads were mapped onto the Malus consensus coding sequence set, and differences in the expression profiles between healthy and affected tissues were assessed to identify candidate genes associated with internal browning in a tissue-specific manner. Genes involved in lipid metabolism, secondary metabolism, and cell wall modifications were highly modified in the affected inner cortex, while energy-related and stress-related genes were mostly altered in the outer cortex. The expression levels of several of them were confirmed using qRT-PCR. Additionally, a set of novel browning-specific differentially expressed genes, including pyruvate dehydrogenase and 1-aminocyclopropane-1-carboxylate oxidase, was validated in apples stored for various periods at different controlled atmosphere conditions, giving rise to potential biomarkers associated with high risk of browning development.
The gene expression data presented in this study will help elucidate the molecular mechanism of browning development in apples at controlled atmosphere storage. A conceptual model, including energy-related (linked to the tricarboxylic acid cycle and the electron transport chain) and lipid-related genes (related to membrane alterations, and fatty acid oxidation), for browning development in apple is proposed, which may be relevant for future studies towards improving the postharvest life of apple.
Electronic supplementary material
The online version of this article (doi:10.1186/s12870-014-0328-x) contains supplementary material, which is available to authorized users.
PMCID: PMC4272543  PMID: 25430515
Apple fruit; Browning disorder; Metabolic pathways; Postharvest physiology; RNA sequencing; Transcriptomics
9.  Colon cancer prediction with genetic profiles using intelligent techniques 
Bioinformation  2008;3(3):130-133.
Microarray data provide information on the expression levels of thousands of genes in a cell in a single experiment. Numerous efforts have been made to use gene expression profiles to improve the precision of tumor classification. In our present study we have used the benchmark colon cancer data set for analysis. Feature selection is done using the t‐statistic. A comparative study of the class prediction accuracy of 3 different classifiers, viz., support vector machine (SVM), neural nets and logistic regression, was performed using the top 10 genes ranked by the t‐statistic. SVM turned out to be the best classifier for this dataset based on area under the receiver operating characteristic curve (AUC) and total accuracy. Logistic Regression ranks as the next best classifier followed by Multi Layer Perceptron (MLP). The top 10 genes selected by us for classification are all well documented for their variable expression in colon cancer. We conclude that SVM together with t-statistic based feature selection is an efficient and viable alternative to popular techniques.
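The feature-selection step described above can be sketched as ranking genes by the absolute two-sample t-statistic between the two classes and keeping the top k. The abstract does not state which variant of the t-statistic was used, so the unequal-variance (Welch) form here is an assumption, as are the function names and the dictionary input layout.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Welch two-sample t-statistic; needs >= 2 values per group and nonzero variance."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def top_genes(expr, labels, k=10):
    """Rank genes by |t| between class 1 and class 0 and return the top k names.

    expr:   dict mapping gene name -> list of expression values (one per sample)
    labels: list of 0/1 class labels aligned with the expression values
    """
    scores = {}
    for gene, values in expr.items():
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = abs(t_statistic(group1, group0))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The selected gene names would then define the reduced feature matrix passed to whichever classifier is being compared.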
PMCID: PMC2639687  PMID: 19238250
gene expression; tumor classification; t-statistic; feature selection; SVM; neural network; logistic regression
10.  InCroMAP: integrated analysis of cross-platform microarray and pathway data 
Bioinformatics  2012;29(4):506-508.
Summary: Microarrays are commonly used to detect changes in gene expression between different biological samples. For this purpose, many analysis tools have been developed that offer visualization, statistical analysis and more sophisticated analysis methods. Most of these tools are designed specifically for messenger RNA microarrays. However, today, more and more different microarray platforms are available. Changes in DNA methylation, microRNA expression or even protein phosphorylation states can be detected with specialized arrays. For these microarray technologies, the number of available tools is small compared with mRNA analysis tools. Especially, a joint analysis of different microarray platforms that have been used on the same set of biological samples is hardly supported by most microarray analysis tools. Here, we present InCroMAP, a tool for the analysis and visualization of high-level microarray data from individual or multiple different platforms. Currently, InCroMAP supports mRNA, microRNA, DNA methylation and protein modification datasets. Several methods are offered that allow for an integrated analysis of data from those platforms. The available features of InCroMAP range from visualization of DNA methylation data over annotation of microRNA targets and integrated gene set enrichment analysis to a joint visualization of data from all platforms in the context of metabolic or signalling pathways.
Availability: InCroMAP is freely available as a Java™ application, including a comprehensive user’s guide and example files.
PMCID: PMC3570209  PMID: 23257199
11.  Trustworthiness and metrics in visualizing similarity of gene expression 
BMC Bioinformatics  2003;4:48.
Conventionally, the first step in analyzing the large and high-dimensional data sets measured by microarrays is visual exploration. Dendrograms of hierarchical clustering, self-organizing maps (SOMs), and multidimensional scaling have been used to visualize similarity relationships of data samples. We address two central properties of the methods: (i) Are the visualizations trustworthy, i.e., if two samples are visualized to be similar, are they really similar? (ii) The metric. The measure of similarity determines the result; we propose using a new learning metrics principle to derive a metric from interrelationships among data sets.
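Question (i) above can be made concrete with a small sketch. It uses a rank-based trustworthiness score in its commonly used form: for each sample, points that appear among its k nearest neighbours in the visualization but not in the original space are penalized by how far down the original neighbour ranking they sit. The exact formulation in the paper may differ in detail; the normalization constant, the Euclidean distance and the function names are assumptions, and the formula is meant for k smaller than half the number of samples.

```python
def _ranked_neighbors(points, i):
    """Indices of all other points, sorted by squared Euclidean distance to point i."""
    def d2(j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    return sorted((j for j in range(len(points)) if j != i), key=d2)

def trustworthiness(high, low, k):
    """Score in which 1.0 means every low-dimensional k-neighbourhood is genuine.

    high: coordinates in the original space; low: coordinates in the visualization.
    """
    n = len(high)
    penalty = 0
    for i in range(n):
        rank_high = {j: r + 1 for r, j in enumerate(_ranked_neighbors(high, i))}
        for j in _ranked_neighbors(low, i)[:k]:
            if rank_high[j] > k:  # j looks close in the plot but is not close in the data
                penalty += rank_high[j] - k
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))
```

A visualization that preserves every neighbourhood scores 1.0, and the score falls as dissimilar samples are drawn next to each other.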
The trustworthiness of hierarchical clustering, multidimensional scaling, and the self-organizing map were compared in visualizing similarity relationships among gene expression profiles. The self-organizing map was the best except that hierarchical clustering was the most trustworthy for the most similar profiles. Trustworthiness can be further increased by treating separately those genes for which the visualization is least trustworthy. We then proceed to improve the metric. The distance measure between the expression profiles is adjusted to measure differences relevant to functional classes of the genes. The genes for which the new metric is the most different from the usual correlation metric are listed and visualized with one of the visualization methods, the self-organizing map, computed in the new metric.
The conjecture from the methodological results is that the self-organizing map can be recommended to complement the usual hierarchical clustering for visualizing and exploring gene expression data. Discarding the least trustworthy samples and improving the metric still improves it.
PMCID: PMC272927  PMID: 14552657
12.  Clustering phenotype populations by genome-wide RNAi and multiparametric imaging 
How to predict gene function from phenotypic cues is a longstanding question in biology. Using quantitative multiparametric imaging, RNAi-mediated cell phenotypes were measured on a genome-wide scale. On the basis of phenotypic ‘neighbourhoods', we identified previously uncharacterized human genes as mediators of the DNA damage response pathway and the maintenance of genomic integrity. The phenotypic map is provided as an online resource for discovering further functional relationships for a broad spectrum of biological modules.
Genetic screens for phenotypic similarity have made key contributions for associating genes with biological processes. Aggregating genes by similarity of their loss-of-function phenotype has provided insights into signalling pathways that have a conserved function from Drosophila to human (Nusslein-Volhard and Wieschaus, 1980; Bier, 2005). Complex visual phenotypes, such as defects in pattern formation during development, greatly facilitated the classification of genes into pathways, and phenotypic similarities in many cases predicted molecular relationships. With RNA interference (RNAi), highly parallel phenotyping of loss-of-function effects in cultured cells has become feasible in many organisms whose genomes have been sequenced (Boutros and Ahringer, 2008). One of the current challenges is the computational categorization of visual phenotypes and the prediction of gene function and associated biological processes. With large parts of the genome still being in uncharted territory, deriving functional information from large-scale phenotype analysis promises to uncover novel gene–gene relationships and to generate functional maps to explore cellular processes.
In this study, we developed an automated approach using RNAi-mediated cell phenotypes, multiparametric imaging and computational modelling to obtain functional information on previously uncharacterized genes. To generate broad, computer-readable phenotypic signatures, we measured the effect of RNAi-mediated knockdowns on changes of cell morphology in human cells on a genome-wide scale. First, the several million cells were stained for nuclear and cytoskeletal markers and then imaged using automated microscopy. On the basis of fluorescent markers, we established an automated image analysis to classify individual cells (Figure 1A). After cell segmentation for determining nuclei and cell boundaries (Figure 1C), we computed 51 cell descriptors that quantified intensities, shape characteristics and texture (Figure 1F). Individual cells were categorized into 1 of 10 classes, which included cells showing protrusion/elongation, cells in metaphase, large cells, condensed cells, cells with lamellipodia and cellular debris (Figure 1D and E). Each siRNA knockdown was summarized by a phenotypic profile and differences between RNAi knockdowns were quantified by the similarity between phenotypic profiles. We termed the vector of scores a phenoprint (Figure 3C) and defined the phenotypic distance between a pair of perturbations as the distance between their corresponding phenoprints.
To visualize the distribution of all phenoprints, we plotted them in a genome-wide map as a two-dimensional representation of the phenotypic similarity relationships (Figure 3A). The complete data set and an interactive version of the phenotypic map are available online. The map identified phenotypic ‘neighbourhoods', which are characterized by cells with lamellipodia (WNK3, ANXA4), cells with prominent actin fibres (ODF2, SOD3), abundance of large cells (CA14), many elongated cells (SH2B2, ELMO2), decrease in cell number (TPX2, COPB1, COPA), increase in number of cells in metaphase (BLR1, CIB2) and combinations of phenotypes such as presence of large cells with protrusions and bright nuclei (PTPRZ1, RRM1; Figure 3B).
To test whether phenotypic similarity might serve as a predictor of gene function, we focused our further analysis on two clusters that contained genes associated with the DNA damage response (DDR) and genomic integrity (Figure 3A and C). The first phenotypic cluster included proteins with kinetochore-associated functions such as NUF2 (Figure 3B) and SGOL1. It also contained the centrosomal protein CEP164 that has been described as an important mediator of the DNA damage-activated signalling cascade (Sivasubramaniam et al, 2008) and the largely uncharacterized genes DONSON and SON. A second phenotypically distinct cluster included previously described components of the DDR pathway such as RRM1 (Figure 3A–C), CLSPN, PRIM2 and SETD8. Furthermore, this cluster contained the poorly characterized genes CADM1 and CD3EAP.
Cells activate a signalling cascade in response to DNA damage induced by exogenous and endogenous factors. Central are the kinases ATM and ATR as they serve as sensors of DNA damage and activators of further downstream kinases (Harper and Elledge, 2007; Cimprich and Cortez, 2008). To investigate whether DONSON, SON, CADM1 and CD3EAP, which were found in phenotypic ‘neighbourhoods' to known DDR components, have a role in the DNA damage signalling pathway, we tested the effect of their depletion on the DDR on γ irradiation. As indicated by reduced CHEK1 phosphorylation, siRNA knockdown of DONSON, SON, CD3EAP or CADM1 resulted in impaired DDR signalling on γ irradiation. Furthermore, knockdown of DONSON or SON reduced phosphorylation of downstream effectors such as NBS1, CHEK1 and the histone variant H2AX on UVC irradiation. DONSON depletion also impaired recruitment of RPA2 onto chromatin and SON knockdown reduced RPA2 phosphorylation, indicating that DONSON and SON presumably act downstream of the activation of ATM. In agreement with their phenotypic profile, these results suggest that DONSON, SON, CADM1 and CD3EAP are important mediators of the DDR. Further experiments demonstrated that they are also required for the maintenance of genomic integrity.
In summary, we show that genes with similar phenotypic profiles tend to share similar functions. The power of our computational and experimental approach is demonstrated by the identification of novel signalling regulators whose phenotypic profiles were found in proximity to known biological modules. Therefore, we believe that such phenotypic maps can serve as a resource for functional discovery and characterization of unknown genes. Furthermore, such approaches are also applicable for other perturbation reagents, such as small molecules in drug discovery and development. One could also envision combined maps that contain both siRNAs and small molecules to predict target–small molecule relationships and potential side effects.
Genetic screens for phenotypic similarity have made key contributions to associating genes with biological processes. With RNA interference (RNAi), highly parallel phenotyping of loss-of-function effects in cells has become feasible. One of the current challenges however is the computational categorization of visual phenotypes and the prediction of biological function and processes. In this study, we describe a combined computational and experimental approach to discover novel gene functions and explore functional relationships. We performed a genome-wide RNAi screen in human cells and used quantitative descriptors derived from high-throughput imaging to generate multiparametric phenotypic profiles. We show that profiles predicted functions of genes by phenotypic similarity. Specifically, we examined several candidates including the largely uncharacterized gene DONSON, which shared phenotype similarity with known factors of DNA damage response (DDR) and genomic integrity. Experimental evidence supports that DONSON is a novel centrosomal protein required for DDR signalling and genomic integrity. Multiparametric phenotyping by automated imaging and computational annotation is a powerful method for functional discovery and mapping the landscape of phenotypic responses to cellular perturbations.
PMCID: PMC2913390  PMID: 20531400
DNA damage response signalling; massively parallel phenotyping; phenotype networks; RNAi screening
13.  CGI: Java Software for Mapping and Visualizing Data from Array-based Comparative Genomic Hybridization and Expression Profiling 
With the increasing application of various genomic technologies in biomedical research, there is a need to integrate these data to correlate candidate genes/regions that are identified by different genomic platforms. Although there are tools that can analyze data from individual platforms, essential software for integration of genomic data is still lacking. Here, we present a novel Java-based program called CGI (Cytogenetics-Genomics Integrator) that matches the BAC clones from array-based comparative genomic hybridization (aCGH) to genes from RNA expression profiling datasets. The matching is computed via a fast, backend MySQL database containing UCSC Genome Browser annotations. This program also provides an easy-to-use graphical user interface for visualizing and summarizing the correlation of DNA copy number changes and RNA expression patterns from a set of experiments. In addition, CGI uses a Java applet to display the copy number values of a specific BAC clone in aCGH experiments side by side with the expression levels of genes that are mapped back to that BAC clone from the microarray experiments. The CGI program is built on top of extensible, reusable graphic components specifically designed for biologists. It is cross-platform compatible and the source code is freely available under the General Public License.
PMCID: PMC2759124  PMID: 19936083
aCGH; expression profiling; visualization; correlation; and data integration
14.  InCHlib – interactive cluster heatmap for web applications 
Hierarchical clustering is an exploratory data analysis method that reveals the groups (clusters) of similar objects. The result of the hierarchical clustering is a tree structure called dendrogram that shows the arrangement of individual clusters. To investigate the row/column hierarchical cluster structure of a data matrix, a visualization tool called ‘cluster heatmap’ is commonly employed. In the cluster heatmap, the data matrix is displayed as a heatmap, a 2-dimensional array in which the colour of each element corresponds to its value. The rows/columns of the matrix are ordered such that similar rows/columns are near each other. The ordering is given by the dendrogram which is displayed on the side of the heatmap.
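The row ordering described above can be sketched with a small agglomerative clustering routine that returns the leaf order of the resulting dendrogram. Average linkage and Euclidean distance are assumptions made for the example, the function names are hypothetical, and this naive version is far slower than a library implementation; it is not the algorithm used by InCHlib or inchlib_clust.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def dendrogram_row_order(rows):
    """Row indices in the leaf order of an average-linkage dendrogram."""
    # Each cluster is stored as the ordered list of its leaves (row indices).
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters
        total = sum(euclidean(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find and merge the closest pair of clusters
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return clusters[0]
```

Reordering the matrix rows by the returned indices places similar rows next to each other, which is what makes block patterns visible in the heatmap; the same routine applied to the transposed matrix orders the columns.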
We developed InCHlib (Interactive Cluster Heatmap Library), a highly interactive and lightweight JavaScript library for cluster heatmap visualization and exploration. InCHlib enables the user to select individual or clustered heatmap rows, to zoom in and out of clusters or to flexibly modify heatmap appearance. The cluster heatmap can be augmented with additional metadata displayed in a different colour scale. In addition, to further enhance the visualization, the cluster heatmap can be interconnected with external data sources or analysis tools. Data clustering and the preparation of the input file for InCHlib is facilitated by the Python utility script inchlib_clust.
The cluster heatmap is one of the most popular visualizations of large chemical and biomedical data sets originating, e.g., in high-throughput screening, genomics or transcriptomics experiments. The presented JavaScript library InCHlib is a client-side solution for cluster heatmap exploration. InCHlib can be easily deployed into any modern web application and configured to cooperate with external tools and data sources. Though InCHlib is primarily intended for the analysis of chemical or biological data, it is a versatile tool which application domain is not limited to the life sciences only.
Electronic supplementary material
The online version of this article (doi:10.1186/s13321-014-0044-4) contains supplementary material, which is available to authorized users.
PMCID: PMC4173117  PMID: 25264459
Data clustering; Cluster heatmap; Scientific visualization; Web integration; Client-side scripting; JavaScript library; Big data; Exploration
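The row ordering described in the InCHlib entry above (similar rows placed next to each other according to the dendrogram) can be sketched with a small agglomerative clustering routine. This is an illustrative pure-Python sketch, not the inchlib_clust implementation; single linkage, Euclidean distance, and the `dendrogram_order` name are assumptions chosen for brevity.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dendrogram_order(rows):
    """Single-linkage agglomerative clustering; returns row indices in leaf order."""
    # Each cluster is the list of row indices it contains, kept in leaf order.
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(rows[a], rows[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical 4 x 2 data matrix: rows 0 and 2 are similar, as are rows 1 and 3.
matrix = [
    [1.0, 1.1],
    [9.0, 9.2],
    [1.2, 0.9],
    [8.8, 9.1],
]
order = dendrogram_order(matrix)
```

Drawing the heatmap rows in the returned order places each row beside its nearest cluster mate. The brute-force search is only suitable for small matrices.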
15.  Gene Expression Browser: large-scale and cross-experiment microarray data integration, management, search & visualization 
BMC Bioinformatics  2010;11:433.
In the last decade, a large amount of microarray gene expression data has been accumulated in public repositories. Integrating and analyzing high-throughput gene expression data have become key activities for exploring gene functions, gene networks and biological pathways. Effectively utilizing these invaluable microarray data remains challenging due to a lack of powerful tools to integrate large-scale gene-expression information across diverse experiments and to search and visualize a large number of gene-expression data points.
Gene Expression Browser is a microarray data integration, management and processing system with web-based search and visualization functions. An innovative method has been developed to define a treatment over a control for every microarray experiment to standardize and make microarray data from different experiments homogeneous. In the browser, data are pre-processed offline and the resulting data points are visualized online with a 2-layer dynamic web display. Users can view all treatments over control that affect the expression of a selected gene via Gene View, and view all genes that change in a selected treatment over control via treatment over control View. Users can also check the changes of expression profiles of a set of either the treatments over control or genes via Slide View. In addition, the relationships between genes and treatments over control are computed according to gene expression ratio and are shown as co-responsive genes and co-regulation treatments over control.
Gene Expression Browser is composed of a set of software tools, including a data extraction tool, a microarray data-management system, a data-annotation tool, a microarray data-processing pipeline, and a data search & visualization tool. The browser is deployed as a free public web service that integrates 301 ATH1 gene microarray experiments from public data repositories (viz. the Gene Expression Omnibus repository at the National Center for Biotechnology Information and Nottingham Arabidopsis Stock Center). The set of Gene Expression Browser software tools can be easily applied to the large-scale expression data generated by other platforms and in other species.
PMCID: PMC2941691  PMID: 20727159
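The "treatment over control" standardization described in the entry above can be illustrated as a per-gene expression ratio. The sketch below assumes a log2 ratio of mean replicate signals, which the abstract does not specify; the `log2_ratio` helper and the intensities are hypothetical.

```python
import math
from statistics import mean


def log2_ratio(treatment, control):
    """log2 of the mean treatment signal over the mean control signal for one gene."""
    return math.log2(mean(treatment) / mean(control))


# Hypothetical replicate signal intensities for one gene, for illustration only.
treated = [400.0, 440.0, 360.0]   # mean 400
untreated = [100.0, 90.0, 110.0]  # mean 100
ratio = log2_ratio(treated, untreated)  # four-fold induction gives a log2 ratio of 2
```

Expressing every experiment as such a ratio is one way to make values from different experiments comparable, since each is measured relative to its own control.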
16.  arrayMap: A Reference Resource for Genomic Copy Number Imbalances in Human Malignancies 
PLoS ONE  2012;7(5):e36944.
The delineation of genomic copy number abnormalities (CNAs) from cancer samples has been instrumental for identification of tumor suppressor genes and oncogenes and proven useful for clinical marker detection. An increasing number of projects have mapped CNAs using high-resolution microarray based techniques. So far, no single resource provides a global collection of readily accessible oncogenomic array data.
Methodology/Principal Findings
We here present arrayMap, a curated reference database and bioinformatics resource targeting copy number profiling data in human cancer. The arrayMap database provides a platform for meta-analysis and systems level data integration of high-resolution oncogenomic CNA data. To date, the resource incorporates more than 40,000 arrays in 224 cancer types extracted from several resources, including the NCBI’s Gene Expression Omnibus (GEO), EBI’s ArrayExpress (AE), The Cancer Genome Atlas (TCGA), publication supplements and direct submissions. For the majority of the included datasets, probe level and integrated visualization facilitate gene level and genome wide data review. Results from multi-case selections can be connected to downstream data analysis and visualization tools.
To our knowledge, currently no data source provides an extensive collection of high resolution oncogenomic CNA data which readily could be used for genomic feature mining, across a representative range of cancer entities. arrayMap represents our effort for providing a long term platform for oncogenomic CNA data independent of specific platform considerations or specific project dependence. The online database can be accessed at http//
PMCID: PMC3356349  PMID: 22629346
17.  What google maps can do for biomedical data dissemination: examples and a design study 
BMC Research Notes  2013;6:179.
Biologists often need to assess whether unfamiliar datasets warrant the time investment required for more detailed exploration. Basing such assessments on brief descriptions provided by data publishers is unwieldy for large datasets that contain insights dependent on specific scientific questions. Alternatively, using complex software systems for a preliminary analysis may be deemed too time-consuming in itself, especially for unfamiliar data types and formats. This may lead to wasted analysis time and discarding of potentially useful data.
We present an exploration of design opportunities that the Google Maps interface offers to biomedical data visualization. In particular, we focus on synergies between visualization techniques and Google Maps that facilitate the development of biological visualizations which have both low-overhead and sufficient expressivity to support the exploration of data at multiple scales. The methods we explore rely on displaying pre-rendered visualizations of biological data in browsers, with sparse yet powerful interactions, by using the Google Maps API. We structure our discussion around five visualizations: a gene co-regulation visualization, a heatmap viewer, a genome browser, a protein interaction network, and a planar visualization of white matter in the brain. Feedback from collaborative work with domain experts suggests that our Google Maps visualizations offer multiple, scale-dependent perspectives and can be particularly helpful for unfamiliar datasets due to their accessibility. We also find that users, particularly those less experienced with computer use, are attracted by the familiarity of the Google Maps API. Our five implementations introduce design elements that can benefit visualization developers.
We describe a low-overhead approach that lets biologists access readily analyzed views of unfamiliar scientific datasets. We rely on pre-computed visualizations prepared by data experts, accompanied by sparse and intuitive interactions, and distributed via the familiar Google Maps framework. Our contributions are an evaluation demonstrating the validity and opportunities of this approach, a set of design guidelines benefiting those wanting to create such visualizations, and five concrete example visualizations.
PMCID: PMC3658873  PMID: 23642009
Bioinformatics; Biological visualization; Data dissemination; Regulation networks; Design guidelines
18.  ObStruct: A Method to Objectively Analyse Factors Driving Population Structure Using Bayesian Ancestry Profiles 
PLoS ONE  2014;9(1):e85196.
Bayesian inference methods are extensively used to detect the presence of population structure given genetic data. The primary outputs of software implementing these methods are ancestry profiles of sampled individuals. While these profiles robustly partition the data into subgroups, currently there is no objective method to determine whether the fixed factor of interest (e.g. geographic origin) correlates with inferred subgroups or not, and if so, which populations are driving this correlation. We present ObStruct, a novel tool to objectively analyse the nature of structure revealed in Bayesian ancestry profiles using established statistical methods. ObStruct evaluates the extent of structural similarity between sampled and inferred populations, tests the significance of population differentiation, provides information on the contribution of sampled and inferred populations to the observed structure and crucially determines whether the predetermined factor of interest correlates with inferred population structure. Analyses of simulated and experimental data highlight ObStruct's ability to objectively assess the nature of structure in populations. We show the method is capable of capturing an increase in the level of structure with increasing time since divergence between simulated populations. Further, we applied the method to a highly structured dataset of 1,484 humans from seven continents and a less structured dataset of 179 Saccharomyces cerevisiae from three regions in New Zealand. Our results show that ObStruct provides an objective metric to classify the degree, drivers and significance of inferred structure, as well as providing novel insights into the relationships between sampled populations, and adds a final step to the pipeline for population structure analyses.
PMCID: PMC3887034  PMID: 24416362
19.  GeneclusterViz: a tool for conserved gene cluster visualization, exploration and analysis 
Bioinformatics  2012;28(11):1527-1529.
Motivation: Gene clusters are arrangements of functionally related genes on a chromosome. In bacteria, it is expected that evolutionary pressures would conserve these arrangements due to the functional advantages they provide. Visualization of conserved gene clusters across multiple genomes provides key insights into their evolutionary histories. Therefore, a software tool that enables visualization and functional analyses of gene clusters would be a great asset to the biological research community.
Results: We have developed GeneclusterViz, a Java-based tool that allows for the visualization, exploration and downstream analyses of conserved gene clusters across multiple genomes. GeneclusterViz combines an easy-to-use exploration interface for gene clusters with a host of other analysis features such as multiple sequence alignments, phylogenetic analyses and integration with the KEGG pathway database.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3356842  PMID: 22495752
20.  An intuitive graphical visualization technique for the interrogation of transcriptome data 
Nucleic Acids Research  2011;39(17):7380-7389.
The complexity of gene expression data generated from microarrays and high-throughput sequencing makes their analysis challenging. One goal of these analyses is to define sets of co-regulated genes and identify patterns of gene expression. To date, however, there is a lack of easily implemented methods that allow an investigator to visualize and interact with the data in an intuitive and flexible manner. Here, we show that combining a nonlinear dimensionality reduction method, t-distributed Stochastic Neighbor Embedding (t-SNE), with a novel visualization technique provides a graphical mapping that allows the intuitive investigation of transcriptome data. This approach performs better than commonly used methods, offering insight into underlying patterns of gene expression at both global and local scales and identifying clusters of similarly expressed genes. A freely available MATLAB-implemented graphical user interface to perform t-SNE and nearest neighbour plots on genomic data sets is available at
PMCID: PMC3177207  PMID: 21690098
21.  Investigating the Efficacy of Nonlinear Dimensionality Reduction Schemes in Classifying Gene- and Protein-Expression Studies 
The recent explosion in procurement and availability of high-dimensional gene- and protein-expression profile datasets for cancer diagnostics has necessitated the development of sophisticated machine learning tools with which to analyze them. While some investigators are focused on identifying informative genes and proteins that play a role in specific diseases, other researchers have attempted instead to classify patients based on their expression profiles to prognosticate disease status. A major limitation in the ability to accurately classify these high-dimensional datasets stems from the ‘curse of dimensionality’, occurring in situations where the number of genes or peptides significantly exceeds the total number of patient samples. Previous attempts at dealing with this issue have mostly centered on the use of a dimensionality reduction (DR) scheme, Principal Component Analysis (PCA), to obtain a low-dimensional projection of the high-dimensional data. However, linear PCA and other linear DR methods, which rely on Euclidean distances to estimate object similarity, do not account for the inherent underlying nonlinear structure associated with most biomedical data. While some researchers have begun to explore nonlinear DR methods for computer vision problems such as face detection and recognition, to the best of our knowledge, few such attempts have been made for classification and visualization of high-dimensional biomedical data. The motivation behind this work is to identify the appropriate DR methods for analysis of high-dimensional gene- and protein-expression studies. Towards this end, we empirically and rigorously compare three nonlinear (Isomap, Locally Linear Embedding, Laplacian Eigenmaps) and three linear DR schemes (PCA, Linear Discriminant Analysis, Multidimensional Scaling) with the intent of determining a reduced subspace representation in which the individual object classes are more easily discriminable.
Owing to the inherent nonlinear structure of gene- and protein-expression studies, our claim is that the nonlinear DR methods provide a more truthful low-dimensional representation of the data compared to the linear DR schemes. Evaluation of the DR schemes was done by (i) assessing the discriminability of two supervised classifiers (Support Vector Machine and C4.5 Decision Trees) in the different low-dimensional data embeddings and (ii) 5 cluster validity measures to evaluate the size, distance and tightness of object aggregates in the low-dimensional space. For each of the 7 evaluation measures considered, statistically significant improvement in the quality of the embeddings across 10 cancer datasets via the use of 3 nonlinear DR schemes over 3 linear DR techniques was observed. Similar trends were observed when linear and nonlinear DR was applied to the high-dimensional data following feature pruning to isolate the most informative features. Qualitative evaluation of the low-dimensional data embedding obtained via the 6 DR methods further suggests that the nonlinear schemes are better able to identify potential novel classes (e.g. cancer subtypes) within the data.
PMCID: PMC2562675  PMID: 18670041
Dimensionality reduction; bioinformatics; data clustering; data visualization; machine learning; manifold learning; nonlinear dimensionality reduction; gene expression; proteomics; prostate cancer; lung cancer; ovarian cancer; principal component analysis; linear discriminant analysis; multidimensional scaling; Isomap; locally linear embedding; laplacian eigenmaps; classification; support vector machine; decision trees; LLE; PCA
22.  AVIS: AJAX viewer of interactive signaling networks 
Bioinformatics (Oxford, England)  2007;23(20):2803-2805.
Increasing complexity of cell signaling network maps requires sophisticated visualization technologies. Simple web-based visualization tools can allow for improved data presentation and collaboration. Researchers studying cell signaling would benefit from having the ability to embed dynamic cell signaling maps in web pages.
AVIS is a Google gadget compatible web-based viewer of interactive cell signaling networks. AVIS is an implementation of AJAX (Asynchronous JavaScript with XML) with the usage of the libraries GraphViz, ImageMagick (PerlMagick) and overLib. AVIS provides web-based visualization of text-based signaling networks with dynamical zooming, panning and linking capabilities. AVIS is a cross-platform web-based tool that can be used to visualize network maps as embedded objects in any web page. AVIS was implemented for visualization of PathwayGenerator, a tool that displays over 4000 automatically generated mammalian cell signaling maps; NodeNeighborhood, a tool to visualize first and second interacting neighbors of yeast and mammalian proteins; and for Genes2Networks, a tool to connect lists of genes and proteins using background protein interaction networks.
A demo page of AVIS and links to applications and distributions can be found at Detailed instructions for using and configuring AVIS can be found in the user manual at
PMCID: PMC2724864  PMID: 17855420
23.  gViz, a novel tool for the visualization of co-expression networks 
BMC Research Notes  2011;4:452.
The quantity of microarray data available on the Internet has grown dramatically over the past years and now represents millions of Euros worth of underused information. One way to use this data is through co-expression analysis. To avoid a certain amount of bias, such data must often be analyzed at the genome scale, for example by network representation. The identification of co-expression networks is an important means to unravel gene to gene interactions and the underlying functional relationship between them. However, it is very difficult to explore and analyze a network of such dimensions. Several programs (Cytoscape, yEd) have already been developed for network analysis; however, to our knowledge, there are no available GraphML compatible programs.
We designed and developed gViz, a GraphML network visualization and exploration tool. gViz is built on clustering coefficient-based algorithms and is a novel tool to visualize and manipulate networks of co-expression interactions among a selection of probesets (each representing a single gene or transcript), based on a set of microarray co-expression data stored as an adjacency matrix.
We present here gViz, a software tool designed to visualize and explore large GraphML networks, combining network theory, biological annotation data, microarray data analysis and advanced graphical features.
PMCID: PMC3214194  PMID: 22032859
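The gViz entry above mentions clustering coefficient-based algorithms operating on co-expression data stored as an adjacency matrix. As a hedged illustration of the underlying quantity only (not gViz's own algorithm), the sketch below computes the standard local clustering coefficient of each node in an undirected network; the `local_clustering` helper and the example matrix are hypothetical.

```python
def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = [j for j, linked in enumerate(adj[node]) if linked and j != node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; 0.0 by convention here
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if adj[neighbours[a]][neighbours[b]]
    )
    return 2.0 * links / (k * (k - 1))


# Hypothetical undirected co-expression network over four probesets.
adjacency = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
coefficients = [local_clustering(adjacency, n) for n in range(len(adjacency))]
```

In this example, probesets 1 and 2 sit in a fully connected triangle with probeset 0, while probeset 3 hangs off probeset 0 alone, so probeset 0 has a lower coefficient than its triangle partners.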
24.  Inverse Current Source Density Method in Two Dimensions: Inferring Neural Activation from Multielectrode Recordings 
Neuroinformatics  2011;9(4):401-425.
The recent development of large multielectrode recording arrays has made it affordable for an increasing number of laboratories to record from multiple brain regions simultaneously. The development of analytical tools for array data, however, lags behind these technological advances in hardware. In this paper, we present a method based on forward modeling for estimating current source density from electrophysiological signals recorded on a two-dimensional grid using multi-electrode rectangular arrays. This new method, which we call two-dimensional inverse Current Source Density (iCSD 2D), is based upon and extends our previous one- and three-dimensional techniques. We test several variants of our method, both on surrogate data generated from a collection of Gaussian sources, and on model data from a population of layer 5 neocortical pyramidal neurons. We also apply the method to experimental data from the rat subiculum. The main advantages of the proposed method are the explicit specification of its assumptions, the possibility to include system-specific information as it becomes available, the ability to estimate CSD at the grid boundaries, and lower reconstruction errors when compared to the traditional approach. These features make iCSD 2D a substantial improvement over the approaches used so far and a powerful new tool for the analysis of multielectrode array data. We also provide a free GUI-based MATLAB toolbox to analyze and visualize our test data as well as user datasets.
Electronic supplementary material
The online version of this article (doi:10.1007/s12021-011-9111-4) contains supplementary material, which is available to authorized users.
PMCID: PMC3214268  PMID: 21409556
Current source density; Local field potentials; Evoked potentials; Inverse problems; Rat; Hippocampus; Subiculum; Cortical model
25.  VizBin - an application for reference-independent visualization and human-augmented binning of metagenomic data 
Microbiome  2015;3:1.
Metagenomics is limited in its ability to link distinct microbial populations to genetic potential due to a current lack of representative isolate genome sequences. Reference-independent approaches, which exploit for example inherent genomic signatures for the clustering of metagenomic fragments (binning), offer the prospect to resolve and reconstruct population-level genomic complements without the need for prior knowledge.
We present VizBin, a Java™-based application which offers efficient and intuitive reference-independent visualization of metagenomic datasets from single samples for subsequent human-in-the-loop inspection and binning. The method is based on nonlinear dimension reduction of genomic signatures and exploits the superior pattern recognition capabilities of the human eye-brain system for cluster identification and delineation. We demonstrate the general applicability of VizBin for the analysis of metagenomic sequence data by presenting results from two cellulolytic microbial communities and one human-borne microbial consortium. The superior performance of our application compared to other analogous metagenomic visualization and binning methods is also presented.
VizBin can be applied de novo for the visualization and subsequent binning of metagenomic datasets from single samples, and it can be used for the post hoc inspection and refinement of automatically generated bins. Due to its computational efficiency, it can be run on common desktop machines and enables the analysis of complex metagenomic datasets in a matter of minutes. The software implementation is available at under the BSD License (four-clause) and runs under Microsoft Windows™, Apple Mac OS X™ (10.7 to 10.10), and Linux.
Electronic supplementary material
The online version of this article (doi:10.1186/s40168-014-0066-1) contains supplementary material, which is available to authorized users.
PMCID: PMC4305225  PMID: 25621171
Metagenomics; Machine learning; Visualization; Binning
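The VizBin entry above refers to genomic signatures that are then passed to nonlinear dimension reduction. The abstract does not say how the signatures are computed; the sketch below assumes a common choice, relative k-mer frequencies, and covers only that first step. The `kmer_signature` helper, the default `k`, and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_signature(sequence, k=2):
    """Relative frequency of every DNA k-mer in a sequence, in a fixed order."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)  # k-mers with ambiguous bases are ignored
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical fragment; real metagenomic contigs are far longer.
signature = kmer_signature("ACGTACGT", k=2)
```

Each fragment becomes a fixed-length vector (16 values for k=2), so fragments of different lengths can be compared and embedded in the same space.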
