
Results 1-7 (7)

1.  An R package suite for microarray meta-analysis in quality control, differentially expressed gene analysis and pathway enrichment detection 
Bioinformatics  2012;28(19):2534-2536.
Summary: With the rapid advances and prevalence of high-throughput genomic technologies, integrating information of multiple relevant genomic studies has brought new challenges. Microarray meta-analysis has become a frequently used tool in biomedical research. Little effort, however, has been made to develop a systematic pipeline and user-friendly software. In this article, we present MetaOmics, a suite of three R packages (MetaQC, MetaDE and MetaPath) for quality control, differentially expressed gene identification and enriched pathway detection in microarray meta-analysis. MetaQC provides a quantitative and objective tool to assist in setting study inclusion/exclusion criteria for meta-analysis. MetaDE and MetaPath were developed for candidate marker and pathway detection, and provide a choice of marker detection, meta-analysis and pathway analysis methods. The system allows flexible input of experimental data, clinical outcome (case–control, multi-class, continuous or survival) and pathway databases. It allows missing values in experimental data and utilizes multi-core parallel computing for fast implementation. It generates informative summary output and visualization plots, operates on different operating systems and can be expanded to include new algorithms or combine different types of genomic data. This software suite provides a comprehensive tool to conveniently implement and compare various genomic meta-analysis pipelines.
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3463115  PMID: 22863766
2.  Biological impact of missing-value imputation on downstream analyses of gene expression profiles 
Bioinformatics  2010;27(1):78-86.
Motivation: Microarray experiments frequently produce multiple missing values (MVs) due to flaws such as dust, scratches, insufficient resolution or hybridization errors on the chips. Unfortunately, many downstream algorithms require a complete data matrix. The motivation of this work is to determine the impact of MV imputation on downstream analysis, and whether ranking of imputation methods by imputation accuracy correlates well with the biological impact of the imputation.
Methods: Using eight datasets for differential expression (DE) and classification analysis and eight datasets for gene clustering, we demonstrate the biological impact of missing-value imputation on statistical downstream analyses, including three commonly employed DE methods, four classifiers and three gene-clustering methods. Correlation between the rankings of imputation methods based on three root-mean squared error (RMSE) measures and the rankings based on the downstream analysis methods was used to investigate which RMSE measure was most consistent with the biological impact measures, and which downstream analysis methods were the most sensitive to the choice of imputation procedure.
Results: DE was the most sensitive to the choice of imputation procedure, while classification was the least sensitive and clustering was intermediate between the two. The logged RMSE (LRMSE) measure had the highest correlation with the imputation rankings based on the DE results, indicating that the LRMSE is the best representative surrogate among the three RMSE-based measures. Bayesian principal component analysis and least squares adaptive appeared to be the best performing methods in the empirical downstream evaluation.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3008641  PMID: 21045072
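As an illustration of the imputation-accuracy measure mentioned in entry 2, the following minimal Python sketch computes a plain root-mean squared error between the true values that were masked and the values an imputation method filled in. The abstract does not define its three RMSE variants (including the logged RMSE), so this shows only the basic quantity; the function name is hypothetical.

```python
import math

def imputation_rmse(true_values, imputed_values):
    """Plain RMSE between masked true values and their imputed estimates.

    Assumption: both sequences list the same masked entries in the same
    order. This is the basic RMSE only, not the LRMSE from the abstract.
    """
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(t - i) ** 2 for t, i in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Three masked entries; only the last one was imputed incorrectly.
rmse = imputation_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Ranking several imputation methods by this value, and comparing that ranking with one based on downstream results, is the kind of comparison the abstract describes.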
3.  Module-based prediction approach for robust inter-study predictions in microarray data 
Bioinformatics  2010;26(20):2586-2593.
Motivation: Traditional genomic prediction models based on individual genes suffer from low reproducibility across microarray studies due to the lack of robustness to expression measurement noise and gene missingness when they are matched across platforms. It is common that some of the genes in the prediction model established in a training study cannot be matched to another test study because a different platform is applied. The failure of inter-study predictions has severely hindered the clinical applications of microarray. To overcome the drawbacks of traditional gene-based prediction (GBP) models, we propose a module-based prediction (MBP) strategy via unsupervised gene clustering.
Results: K-means clustering is used to group genes sharing similar expression profiles into gene modules, and small modules are merged into their nearest neighbors. A conventional univariate or multivariate feature selection procedure is applied, and a representative gene from each selected module is identified to construct the final prediction model. As a result, the prediction model is portable to any test study as long as some of the genes in each module exist in the test study. We demonstrate that K-means cluster sizes generally follow a multinomial distribution and that the failure probability of inter-study prediction due to missing genes is diminished by merging small clusters into their nearest neighbors. By simulation and applications to real datasets in inter-study predictions, we show that the proposed MBP provides slightly improved accuracy while being considerably more robust than traditional GBP.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2951088  PMID: 20719761
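The merging step described in entry 3 can be sketched as follows. This is a minimal Python illustration, assuming gene modules have already been produced by a clustering step, that "nearest neighbor" means the module with the closest centroid in Euclidean distance, and that every module is non-empty. The function names and the min_size parameter are hypothetical and are not taken from the authors' software.

```python
import math

def centroid(profiles):
    """Coordinate-wise mean of a list of equal-length expression profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def merge_small_modules(modules, min_size):
    """Merge modules with fewer than min_size genes into the nearest module.

    modules: dict mapping module id -> list of (gene_name, profile) pairs.
    Returns a new dict; the input dict and its lists are left unchanged.
    """
    merged = {key: list(genes) for key, genes in modules.items()}
    while len(merged) > 1:
        smallest = min(merged, key=lambda key: len(merged[key]))
        if len(merged[smallest]) >= min_size:
            break  # every remaining module is large enough
        center = centroid([profile for _, profile in merged[smallest]])
        nearest = min(
            (key for key in merged if key != smallest),
            key=lambda key: math.dist(
                center, centroid([profile for _, profile in merged[key]])
            ),
        )
        merged[nearest].extend(merged.pop(smallest))
    return merged

modules = {
    "a": [("g1", [0.0, 0.0]), ("g2", [0.2, 0.0]), ("g3", [0.0, 0.2])],
    "b": [("g4", [10.0, 10.0]), ("g5", [10.2, 10.0]), ("g6", [10.0, 10.2])],
    "c": [("g7", [9.0, 9.0])],  # singleton module, closest to "b"
}
result = merge_small_modules(modules, min_size=2)
```

Larger modules make it less likely that a test study on another platform is missing every gene of a module, which is the robustness argument made in the abstract.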
4.  Meta-analysis for pathway enrichment analysis when combining multiple genomic studies 
Bioinformatics  2010;26(10):1316-1323.
Motivation: Many pathway analysis (or gene set enrichment analysis) methods have been developed to identify enriched pathways under different biological states within a genomic study. As more and more microarray datasets accumulate, meta-analysis methods have also been developed to integrate information among multiple studies. Currently, most meta-analysis methods for combining genomic studies focus on biomarker detection and meta-analysis for pathway analysis has not been systematically pursued.
Results: We investigated two approaches of meta-analysis for pathway enrichment (MAPE) by combining statistical significance across studies at the gene level (MAPE_G) or at the pathway level (MAPE_P). Simulation results showed increased statistical power of meta-analysis approaches compared to a single study analysis and showed complementary advantages of MAPE_G and MAPE_P under different scenarios. We also developed an integrated method (MAPE_I) that incorporates advantages of both approaches. Comprehensive simulations and applications to real data on drug response of breast cancer cell lines and lung cancer tissues were evaluated to compare the performance of three MAPE variations. MAPE_P has the advantage of not requiring gene matching across studies. When MAPE_G and MAPE_P show complementary advantages, the hybrid version of MAPE_I is generally recommended.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2865865  PMID: 20410053
5.  Biomarker detection in the integration of multiple multi-class genomic studies 
Bioinformatics  2009;26(3):333-340.
Motivation: Systematic information integration of multiple related microarray studies has become an important issue as the technology has become mature and prevalent in the past decade. The aggregated information provides more robust and accurate biomarker detection. So far, published meta-analysis methods for this purpose mostly consider two-class comparison. Methods for combining multi-class studies and considering expression pattern concordance are rarely explored.
Results: In this article, we develop three integration methods for biomarker detection in multiple multi-class microarray studies: ANOVA-maxP, min-MCC and OW-min-MCC. We first consider a natural extension of combining P-values from the traditional ANOVA model. Since P-values from ANOVA are not guaranteed to reflect the concordant expression pattern information across studies, we propose a multi-class correlation (MCC) measure to specifically seek biomarkers of concordant inter-class patterns across a pair of studies. For both ANOVA and MCC approaches, we use extreme order statistics to identify biomarkers differentially expressed (DE) in all studies (i.e. ANOVA-maxP and min-MCC). The min-MCC method is further extended to identify biomarkers DE in partial studies by incorporating a recently developed optimally weighted (OW) technique (OW-min-MCC). All methods are evaluated by simulation studies and by three meta-analysis applications to multi-tissue mouse metabolism datasets, multi-condition mouse trauma datasets and multi-malignant-condition human prostate cancer datasets. The results show complementary strength of the three methods for different biological purposes.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2815659  PMID: 19965884
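To illustrate the extreme-order-statistic idea in entry 5, here is a minimal Python sketch of a maximum-P combination for a single gene. It assumes the per-study P-values are independent and uniformly distributed under the null hypothesis, in which case the maximum of K P-values has null distribution function x^K. The function name is hypothetical, and this is not the authors' ANOVA-maxP implementation, which may assess significance differently (for example by permutation).

```python
def max_p_combine(p_values):
    """Combine per-study P-values for one gene using the maximum-P statistic.

    Returns (statistic, combined_p). Assumption: the K studies are
    independent and each P-value is Uniform(0, 1) under the null, so
    P(max <= x) = x ** K.
    """
    if not p_values:
        raise ValueError("at least one P-value is required")
    if any(p < 0.0 or p > 1.0 for p in p_values):
        raise ValueError("P-values must lie in [0, 1]")
    statistic = max(p_values)
    combined_p = statistic ** len(p_values)
    return statistic, combined_p

# A gene that is significant in all three studies keeps a small combined P.
stat_all, p_all = max_p_combine([0.01, 0.02, 0.04])

# One non-significant study is enough to make the combined P large.
stat_partial, p_partial = max_p_combine([0.01, 0.02, 0.90])
```

Because the statistic is driven by the weakest study, it targets genes that are differentially expressed in all studies, which matches the stated purpose of ANOVA-maxP and min-MCC.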
6.  Ratio adjustment and calibration scheme for gene-wise normalization to enhance microarray inter-study prediction 
Bioinformatics  2009;25(13):1655-1661.
Motivation: Reproducibility analyses of biologically relevant microarray studies have mostly focused on overlap of detected biomarkers or correlation of differential expression evidence across studies. For clinical utility, direct inter-study prediction (i.e. to establish a prediction model in one study and apply it to another) for disease diagnosis or prognosis prediction is more important. Normalization plays a key role in such a task. Traditionally, sample-wise normalization has been a standard for inter-array and inter-study normalization. Gene-wise normalization has been implemented for intra-study or inter-study predictions in a few papers, but its rationale, strategy and effect remain unexplored.
Results: In this article, we investigate the effect of gene-wise normalization in microarray inter-study prediction. Gene-specific intensity discrepancies across studies are commonly found even after proper sample-wise normalization. We explore the rationale and necessity of gene-wise normalization. We also show that the ratio of sample sizes in normal versus diseased groups can greatly affect the performance of gene-wise normalization, and an analytical method is developed to adjust for the imbalanced ratio effect. Both simulation results and applications to three lung cancer and two prostate cancer data sets, considering both binary classification and survival risk predictions, showed significant and robust improvement from the new adjustment. A calibration scheme is developed to apply the ratio-adjusted gene-wise normalization to prospective clinical trials. The number of calibration samples needed is estimated from existing studies and suggested for future applications. The result has important implications for the translational research of microarray as a practical disease diagnosis and prognosis prediction tool.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2732320  PMID: 19414534
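For readers unfamiliar with the term in entry 6, the following minimal Python sketch shows one common form of gene-wise normalization: standardizing each gene to mean 0 and standard deviation 1 across the samples of a single study. This is an assumption about the general technique, not the authors' procedure; in particular, the ratio adjustment for imbalanced normal versus diseased groups is not described in the abstract and is not reproduced here. The function name is hypothetical.

```python
from statistics import mean, pstdev

def genewise_standardize(expression):
    """Standardize each gene across the samples of one study.

    expression: dict mapping gene name -> list of expression values,
    one value per sample. Genes with zero variance are mapped to zeros.
    """
    normalized = {}
    for gene, values in expression.items():
        mu = mean(values)
        sd = pstdev(values)  # population standard deviation
        if sd == 0:
            normalized[gene] = [0.0] * len(values)
        else:
            normalized[gene] = [(v - mu) / sd for v in values]
    return normalized

study = {"g1": [1.0, 2.0, 3.0], "g2": [5.0, 5.0, 5.0]}
scaled = genewise_standardize(study)
```

Since each gene is centered on its own study mean, a study with mostly normal samples and a study with mostly diseased samples are centered at different points of the biological range, which suggests why the group ratio matters for this kind of normalization.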
7.  Smarter clustering methods for SNP genotype calling 
Bioinformatics  2008;24(23):2665-2671.
Motivation: Most genotyping technologies for single nucleotide polymorphism (SNP) markers use standard clustering methods to ‘call’ the SNP genotypes. These methods are not always optimal in distinguishing the genotype clusters of a SNP because they do not take advantage of specific features of the genotype calling problem. In particular, when family data are available, pedigree information is ignored. Furthermore, prior information about the distribution of the measurements for each cluster can be used to choose an appropriate model-based clustering method and can significantly improve the genotype calls. One special genotyping problem that has never been discussed in the literature is that of genotyping of trisomic individuals, such as individuals with Down syndrome. Calling trisomic genotypes is a more complicated problem, and the addition of external information becomes very important.
Results: In this article, we discuss the impact of incorporating external information into clustering algorithms to call the genotypes for both disomic and trisomic data. We also propose two new methods to call genotypes using family data. One is a modification of the K-means method and uses the pedigree information by updating all members of a family together. The other is a likelihood-based method that combines the Gaussian or beta-mixture model with pedigree information. We compare the performance of these two methods and some other existing methods using simulation studies. We also compare the performance of these methods on a real dataset generated by the Illumina platform (
Availability: The R code for the family-based genotype calling methods (SNPCaller) is available to be downloaded from the following website:
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2732271  PMID: 18826959