Results 1-4 (4)

1.  Use of autocorrelation scanning in DNA copy number analysis 
Bioinformatics  2013;29(21):2678-2682.
Motivation: Data quality is a critical issue in the analyses of DNA copy number alterations obtained from microarrays. It is commonly assumed that copy number alteration data can be modeled as piecewise constant and the measurement errors of different probes are independent. However, these assumptions do not always hold in practice. In some published datasets, we find that measurement errors are highly correlated between probes that interrogate nearby genomic loci, and the piecewise-constant model does not fit the data well. The correlated errors cause problems in downstream analysis, leading to a large number of DNA segments falsely identified as having copy number gains and losses.
Method: We developed a simple tool, called autocorrelation scanning profile, to assess the dependence of measurement error between neighboring probes.
Results: Autocorrelation scanning profile can be used to check data quality and refine the analysis of DNA copy number data, which we demonstrate in some typical datasets.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3799475  PMID: 24045776
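
The idea of checking for correlated measurement errors between neighboring probes can be illustrated with a sliding-window lag-1 autocorrelation. This is only a minimal sketch under stated assumptions, not the authors' published tool: the function names, the window and step sizes, and the choice to center each window on its own mean (rather than on a fitted segment mean) are all hypothetical.

```python
from statistics import mean


def lag1_autocorr(values):
    """Lag-1 sample autocorrelation of a sequence of numbers."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = mean(values)
    denom = sum((v - m) ** 2 for v in values)
    if denom == 0:
        # A constant window carries no information about correlation.
        return 0.0
    num = sum((values[i] - m) * (values[i + 1] - m) for i in range(n - 1))
    return num / denom


def autocorr_scan(log_ratios, window=50, step=10):
    """Return a list of (start_index, lag-1 autocorrelation) per window.

    Assumption: log_ratios is ordered by genomic position, and the window
    and step sizes are illustrative defaults, not published values.
    """
    profile = []
    for start in range(0, len(log_ratios) - window + 1, step):
        chunk = log_ratios[start:start + window]
        profile.append((start, lag1_autocorr(chunk)))
    return profile
```

Under the independence assumption, the profile should hover near zero within a segment; windows with persistently high values would suggest correlated errors between neighboring probes.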
2.  PurityEst: estimating purity of human tumor samples using next-generation sequencing data 
Bioinformatics  2012;28(17):2265-2266.
Summary: We developed a novel algorithm, PurityEst, to infer the tumor purity level from the allelic differential representation of heterozygous loci with somatic mutations in a human tumor sample with a matched normal tissue using next-generation sequencing data. We applied our tool to whole cancer genome sequencing datasets and demonstrated the accuracy of PurityEst compared with DNA copy number-based estimation.
Availability: PurityEst has been implemented in PERL and is available at
PMCID: PMC3426843  PMID: 22743227
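
The link between allelic representation and purity can be sketched with a deliberately simplified model. Assume every site is a heterozygous somatic mutation present in all tumor cells, lies in a copy-neutral diploid region, and is absent from the matched normal; the expected variant allele fraction is then purity / 2. The function below is a hypothetical illustration of that arithmetic only, not the PurityEst algorithm, and the input format is an assumption.

```python
def estimate_purity(sites):
    """Rough purity estimate from pooled variant allele fractions.

    sites: iterable of (variant_reads, total_reads) pairs, one per
    heterozygous somatic mutation (hypothetical input format).
    Assumes diploid, copy-neutral loci with clonal mutations, so that
    expected variant allele fraction = purity / 2.
    """
    sites = list(sites)
    variant = sum(v for v, _ in sites)
    total = sum(t for _, t in sites)
    if total == 0:
        raise ValueError("no reads supplied")
    # Cap at 1.0 because sampling noise can push 2 * VAF above 1.
    return min(1.0, 2.0 * variant / total)
```

For example, two sites with 30 and 35 variant reads out of 100 reads each give a pooled fraction of 0.325 and an estimate of 0.65.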
3.  Serial dilution curve: a new method for analysis of reverse phase protein array data 
Bioinformatics  2009;25(5):650-654.
Reverse phase protein arrays (RPPAs) are a powerful high-throughput tool for measuring protein concentrations in a large number of samples. In RPPA technology, the original samples are often diluted successively multiple times, forming dilution series to extend the dynamic range of the measurements and to increase confidence in quantitation. An RPPA experiment is equivalent to running multiple ELISA assays concurrently except that there is usually no known protein concentration from which one can construct a standard response curve. Here, we describe a new method called ‘serial dilution curve for RPPA data analysis’. Compared with the existing methods, the new method has the advantage of using fewer parameters and offering a simple way of visualizing the raw data. We showed how the method can be used to examine data quality and to obtain robust quantification of protein concentrations.
Availability: A computer program in R for using serial dilution curve for RPPA data analysis is freely available at
PMCID: PMC2647837  PMID: 19176552
4.  VizStruct: exploratory visualization for gene expression profiling 
DNA arrays provide a broad snapshot of the state of the cell by measuring the expression levels of thousands of genes simultaneously. Visualization techniques can enable the exploration and detection of patterns and relationships in a complex data set by presenting the data in a graphical format in which the key characteristics become more apparent. The dimensionality and size of array data sets, however, present significant challenges to visualization. The purpose of this study is to present an interactive approach for visualizing variations in gene expression profiles and to assess its usefulness for classifying samples.
The first Fourier harmonic projection was used to map multi-dimensional gene expression data to two dimensions in an implementation called VizStruct. The visualization method was tested using the differentially expressed genes identified in eight separate gene expression data sets. The samples were classified using the oblique decision tree (OC1) algorithm to provide a procedure for visualization-driven classification. The classifiers were evaluated by the holdout and the cross-validation techniques. The proposed method was found to achieve high accuracy.
Detailed mathematical derivation of all mapping properties, as well as figures in color, can be found as supplementary material on the web page. All programs were written in Java and Matlab, and software code is available by request from the first author.
PMCID: PMC2607484  PMID: 14693813
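
The first Fourier harmonic projection mentioned in the VizStruct abstract can be sketched as follows: an N-dimensional profile x is mapped to the complex number F1(x) = sum over k of x[k] * exp(-2*pi*i*k/N), whose real and imaginary parts give the two plotting coordinates. This is a minimal sketch in Python rather than the authors' Java/Matlab code; the function name and the sign convention in the exponent are assumptions.

```python
import cmath
import math


def first_harmonic_projection(profile):
    """Map an N-dimensional profile to a 2-D point (real, imag).

    Computes the first harmonic of the discrete Fourier transform:
    F1 = sum_k profile[k] * exp(-2*pi*i*k/N).
    The negative sign in the exponent is an assumed convention.
    """
    n = len(profile)
    if n == 0:
        raise ValueError("profile must not be empty")
    f1 = sum(
        x * cmath.exp(-2j * math.pi * k / n)
        for k, x in enumerate(profile)
    )
    return (f1.real, f1.imag)
```

A profile with equal values in every dimension maps to the origin, so the distance from the origin reflects how unevenly expression is distributed across the dimensions.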
