Results 1-12 (12)
1.  A pan-cancer proteomic perspective on The Cancer Genome Atlas 
Nature communications  2014;5:3887.
Protein levels and function are poorly predicted by genomic and transcriptomic analysis of patient tumors. Therefore, direct study of the functional proteome has the potential to provide a wealth of information that complements and extends genomic, epigenomic and transcriptomic analysis in The Cancer Genome Atlas (TCGA) projects. Here we use reverse-phase protein arrays to analyze 3,467 patient samples from 11 TCGA “Pan-Cancer” diseases, using 181 high-quality antibodies that target 128 total proteins and 53 post-translationally modified proteins. The resultant proteomic data is integrated with genomic and transcriptomic analyses of the same samples to identify commonalities, differences, emergent pathways and network biology within and across tumor lineages. In addition, tissue-specific signals are reduced computationally to enhance biomarker and target discovery spanning multiple tumor lineages. This integrative analysis, with an emphasis on pathways and potentially actionable proteins, provides a framework for determining the prognostic, predictive and therapeutic relevance of the functional proteome.
PMCID: PMC4109726  PMID: 24871328
Proteomics; TCGA; Pan-Cancer; protein expression; protein networks
3.  Bagged gene shaving for the robust clustering of high-throughput data 
The analysis of high-throughput data sets, such as microarray data, often requires that individual variables (genes, for example) be grouped into clusters of variables with highly correlated values across all samples. Gene shaving is an established method for generating such clusters, but is overly sensitive to the input data: changing just one sample can determine whether or not an entire cluster is found. This paper describes a clustering method based on the bootstrap aggregation of gene shaving clusters, which overcomes this and other problems, and applies the new method to a large gene expression microarray dataset from brain tumour samples.
PMCID: PMC3879957  PMID: 20940121
bootstrap aggregation; clustering; gene shaving; glioblastoma
5.  Modulating microtubule stability enhances the cytotoxic response of cancer cells to paclitaxel 
Cancer research  2011;71(17):5806-5817.
The extracellular matrix protein TGFBI enhances the cytotoxic response of cancer cells to paclitaxel by affecting integrin signals that stabilize microtubules. Extending the implications of this knowledge, we tested the more general hypothesis that cancer cell signals which increase microtubule stability before exposure to paclitaxel may increase its ability to stabilize microtubules and thereby enhance its cytotoxicity. Toward this end, we performed an siRNA screen to evaluate how genetic depletion affected microtubule stabilization, cell viability and apoptosis. High content microscopical analysis was performed in the absence or presence of paclitaxel. Kinase knockdowns that stabilized microtubules strongly enhanced the effects of paclitaxel treatment. Conversely, kinase knockdowns that enhanced paclitaxel-mediated cytotoxicity sensitized cells to microtubule stabilization by paclitaxel. The siRNA screen identified several genes that have not been linked previously to microtubule regulation or paclitaxel response. Gene shaving and Bayesian resampling used to classify these genes suggested three pathways of paclitaxel-induced cell death related to apoptosis and microtubule stability, apoptosis alone, or neither process. Our results offer a functional classification of the genetic basis for paclitaxel sensitivity and they support the hypothesis that stabilizing microtubules prior to therapy could enhance antitumor responses to paclitaxel treatment.
PMCID: PMC3679477  PMID: 21775522
Microtubule stability; paclitaxel; ovarian cancer; targeted therapy; antimitotic therapy
6.  Gene Expression Profiling of Ampullary Carcinomas Classifies Ampullary Carcinomas into Biliary-Like and Intestinal-Like Subtypes That Are Prognostic of Outcome 
PLoS ONE  2013;8(6):e65144.
Adenocarcinomas of the ampulla of Vater are classified as biliary cancers, though the exact epithelium of origin for these cancers is not known. We sought to molecularly classify ampullary adenocarcinomas in comparison to known adenocarcinomas of the pancreas, bile duct, and duodenum by gene expression analysis.
We analyzed 32 fresh-frozen resected, untreated periampullary adenocarcinomas (8 pancreatic, 2 extrahepatic biliary, 8 duodenal, and 14 ampullary) using the Affymetrix U133 Plus 2.0 genome array. Unsupervised and supervised hierarchical clustering identified two subtypes of ampullary carcinomas that were molecularly and histologically characterized.
Hierarchical clustering of periampullary carcinomas segregated ampullary carcinomas into two subgroups, which were distinctly different from pancreatic carcinomas. Non-pancreatic periampullary adenocarcinomas were segregated into two subgroups with differing prognoses: 5 year RFS (77% vs. 0%, p = 0.007) and 5 year OS (100% vs. 35%, p = 0.005). Unsupervised clustering analysis of the 14 ampullary samples also identified two subgroups: a good prognosis intestinal-like subgroup and a poor prognosis biliary-like subgroup with 5 year OS of 70% vs. 28%, P = 0.09. Expression of CK7+/CK20- but not CDX-2 correlated with these two subgroups. Activation of both the AKT and MAPK pathways was increased in the poor prognostic biliary-like subgroup. In an independent 80 patient ampullary validation dataset only histological subtype (intestinal vs. pancreaticobiliary) was significantly associated with OS in both univariate (p = 0.006) and multivariate analysis (P = 0.04).
Gene expression analysis discriminated pancreatic adenocarcinomas from other periampullary carcinomas and identified two prognostically relevant subgroups of ampullary adenocarcinomas. Histological subtype was an independent prognostic factor in ampullary adenocarcinomas.
PMCID: PMC3679143  PMID: 23776447
7.  iBAG: integrative Bayesian analysis of high-dimensional multiplatform genomics data 
Bioinformatics  2012;29(2):149-159.
Motivation: Analyzing data from multi-platform genomics experiments combined with patients’ clinical outcomes helps us understand the complex biological processes that characterize a disease, as well as how these processes relate to the development of the disease. Current data integration approaches are limited in that they do not consider the fundamental biological relationships that exist among the data obtained from different platforms.
Statistical Model: We propose an integrative Bayesian analysis of genomics data (iBAG) framework for identifying important genes/biomarkers that are associated with clinical outcome. This framework uses hierarchical modeling to combine the data obtained from multiple platforms into one model.
Results: We assess the performance of our methods using several synthetic and real examples. Simulations show our integrative methods to have higher power to detect disease-related genes than non-integrative methods. Using the Cancer Genome Atlas glioblastoma dataset, we apply the iBAG model to integrate gene expression and methylation data to study their associations with patient survival. Our proposed method discovers multiple methylation-regulated genes that are related to patient survival, most of which have important biological functions in other diseases but have not been previously studied in glioblastoma.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3546799  PMID: 23142963
8.  Model averaging strategies for structure learning in Bayesian networks with limited data 
BMC Bioinformatics  2012;13(Suppl 13):S10.
Considerable progress has been made on algorithms for learning the structure of Bayesian networks from data. Model averaging by using bootstrap replicates with feature selection by thresholding is a widely used solution for learning features with high confidence. Yet, in the context of limited data many questions remain unanswered. What scoring functions are most effective for model averaging? Does the bias arising from the discreteness of the bootstrap significantly affect learning performance? Is it better to pick the single best network or to average multiple networks learnt from each bootstrap resample? How should thresholds for learning statistically significant features be selected?
The best scoring functions are Dirichlet Prior Scoring Metric with small λ and the Bayesian Dirichlet metric. Correcting the bias arising from the discreteness of the bootstrap worsens learning performance. It is better to pick the single best network learnt from each bootstrap resample. We describe a permutation based method for determining significance thresholds for feature selection in bagged models. We show that in contexts with limited data, Bayesian bagging using the Dirichlet Prior Scoring Metric (DPSM) is the most effective learning strategy, and that modifying the scoring function to penalize complex networks hampers model averaging. We establish these results using a systematic study of two well-known benchmarks, specifically ALARM and INSURANCE. We also apply our network construction method to gene expression data from the Cancer Genome Atlas Glioblastoma multiforme dataset and show that survival is related to clinical covariates age and gender and clusters for interferon induced genes and growth inhibition genes.
For small data sets, our approach performs significantly better than previously published methods.
PMCID: PMC3426799  PMID: 23320818
9.  Bayesian ensemble methods for survival prediction in gene expression data 
Bioinformatics  2010;27(3):359-367.
Motivation: We propose a Bayesian ensemble method for survival prediction in high-dimensional gene expression data. We specify a fully Bayesian hierarchical approach based on an ensemble ‘sum-of-trees’ model and illustrate our method using three popular survival models. Our non-parametric method incorporates both additive and interaction effects between genes, which results in high predictive accuracy compared with other methods. In addition, our method provides model-free variable selection of important prognostic markers based on controlling the false discovery rates; thus providing a unified procedure to select relevant genes and predict survivor functions.
Results: We assess the performance of our method on several simulated and real microarray datasets. We show that our method selects genes potentially related to the development of the disease and yields predictive performance that is very competitive with many other existing methods.
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3031034  PMID: 21148161
10.  Selective Genomic Copy Number Imbalances and Probability of Recurrence in Early-Stage Breast Cancer 
PLoS ONE  2011;6(8):e23543.
A number of studies of copy number imbalances (CNIs) in breast tumors support associations between individual CNIs and patient outcomes. However, no pattern or signature of CNIs has emerged for clinical use. We determined copy number (CN) gains and losses using high-density molecular inversion probe (MIP) arrays for 971 stage I/II breast tumors and applied a boosting strategy to fit hazards models for CN and recurrence, treating chromosomal segments in a dose-specific fashion (-1 [loss], 0 [no change] and +1 [gain]). The concordance index (C-Index) was used to compare prognostic accuracy between a training (n = 728) and test (n = 243) set and across models. Twelve novel prognostic CNIs were identified: losses at 1p12, 12q13.13, 13q12.3, 22q11, and Xp21, and gains at 2p11.1, 3q13.12, 10p11.21, 10q23.1, 11p15, 14q13.2-q13.3, and 17q21.33. In addition, seven CNIs previously implicated as prognostic markers were selected: losses at 8p22 and 16p11.2 and gains at 10p13, 11q13.5, 12p13, 20q13, and Xq28. For all breast cancers combined, the final full model including 19 CNIs, clinical covariates, and tumor marker-approximated subtypes (estrogen receptor [ER], progesterone receptor, ERBB2 amplification, and Ki67) significantly outperformed a model containing only clinical covariates and tumor subtypes (C-Index full model, train[test]  =  0.72[0.71] ± 0.02 vs. C-Index clinical + subtype model, train[test]  =  0.62[0.62] ± 0.02; p<10⁻⁶). In addition, the full model containing 19 CNIs significantly improved prognostication separately for ER–, HER2+, luminal B, and triple negative tumors over clinical variables alone. In summary, we show that a set of 19 CNIs discriminates risk of recurrence among early-stage breast tumors, independent of ER status. Further, our data suggest the presence of specific CNIs that promote and, in some cases, limit tumor spread.
PMCID: PMC3155554  PMID: 21858162
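The abstract above compares models by their concordance index. As a rough illustration of what that statistic measures, the sketch below computes a C-index under the assumption that Harrell's pairwise definition is meant; the abstract does not say which variant the authors used. The function name `concordance_index` and its arguments are hypothetical, and pairs with tied follow-up times are simply skipped.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell-style C-index (assumed variant, not taken from the paper).

    times       -- follow-up time per subject
    events      -- truthy if recurrence was observed, falsy if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times, and pairs where the earlier subject was censored:
        # their true ordering is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier recurrence had the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.72 and 0.62 should be read.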
11.  Gene expression meta-analysis supports existence of molecular apocrine breast cancer with a role for androgen receptor and implies interactions with ErbB family 
BMC Medical Genomics  2009;2:59.
Pathway discovery from gene expression data can provide important insight into the relationship between signaling networks and cancer biology. Oncogenic signaling pathways are commonly inferred by comparison with signatures derived from cell lines. We use the Molecular Apocrine subtype of breast cancer to demonstrate our ability to infer pathways directly from patients' gene expression data with pattern analysis algorithms.
We combine data from two studies that propose the existence of the Molecular Apocrine phenotype. We use quantile normalization and XPN to minimize institutional bias in the data. We use hierarchical clustering, principal components analysis, and comparison of gene signatures derived from Significance Analysis of Microarrays to establish the existence of the Molecular Apocrine subtype and the equivalence of its molecular phenotype across both institutions. Statistical significance was computed using the Fasano & Franceschini test for separation of principal components and the hypergeometric probability formula for significance of overlap in gene signatures. We perform pathway analysis using LeFEminer and Backward Chaining Rule Induction to identify a signaling network that differentiates the subset. We identify a larger cohort of samples in the public domain, and use Gene Shaving and Robust Bayesian Network Analysis to detect pathways that interact with the defining signal.
We demonstrate that the two separately introduced ER- breast cancer subsets represent the same tumor type, called Molecular Apocrine breast cancer. LeFEminer and Backward Chaining Rule Induction support a role for AR signaling as a pathway that differentiates this subset from others. Gene Shaving and Robust Bayesian Network Analysis detect interactions between the AR pathway, EGFR trafficking signals, and ErbB2.
We propose criteria for meta-analysis that are able to demonstrate statistical significance in establishing molecular equivalence of subsets across institutions. Data mining strategies used here provide an alternative method to comparison with cell lines for discovering seminal pathways and interactions between signaling networks. Analysis of Molecular Apocrine breast cancer implies that therapies targeting AR might be hampered if interactions with ErbB family members are not addressed.
PMCID: PMC2753593  PMID: 19747394
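The abstract above mentions the hypergeometric probability formula for the significance of overlap between gene signatures. A minimal sketch of that calculation follows, assuming a one-sided upper-tail test; the function name `overlap_p_value` and its parameters are hypothetical, and the abstract does not give the gene counts the authors used.

```python
from math import comb


def overlap_p_value(population, size_a, size_b, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population -- number of genes measured on both platforms (assumed background)
    size_a     -- number of genes in signature A
    size_b     -- number of genes in signature B
    overlap    -- number of genes shared by the two signatures
    """
    upper = min(size_a, size_b)
    total = comb(population, size_b)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(size_a, k) * comb(population - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

For example, `overlap_p_value(10, 5, 5, 5)` is the chance that two random 5-gene signatures from a 10-gene background coincide exactly, which is 1/252.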
12.  A Semantic Web Management Model for Integrative Biomedical Informatics 
PLoS ONE  2008;3(8):e2946.
Data, data everywhere. The diversity and magnitude of the data generated in the Life Sciences defies automated articulation among complementary efforts. The additional need in this field for managing property and access permissions compounds the difficulty very significantly. This is particularly the case when the integration involves multiple domains and disciplines, even more so when it includes clinical and high throughput molecular data.
Methodology/Principal Findings
The emergence of Semantic Web technologies brings the promise of meaningful interoperation between data and analysis resources. In this report we identify a core model for biomedical Knowledge Engineering applications and demonstrate how this new technology can be used to weave a management model where multiple intertwined data structures can be hosted and managed by multiple authorities in a distributed management infrastructure. Specifically, the demonstration is performed by linking data sources associated with the Lung Cancer SPORE awarded to The University of Texas MDAnderson Cancer Center at Houston and the Southwestern Medical Center at Dallas. A software prototype, available with open source at, was developed and its proposed design has been made publicly available as an open source instrument for shared, distributed data management.
The Semantic Web technologies have the potential to address the need for distributed and evolvable representations that are critical for systems Biology and translational biomedical research. As this technology is incorporated into application development we can expect that both general purpose productivity software and domain specific software installed on our personal computers will become increasingly integrated with the relevant remote resources. In this scenario, the acquisition of a new dataset should automatically trigger the delegation of its analysis.
PMCID: PMC2491554  PMID: 18698353