The potential role of the cell-of-origin in determining the tumor phenotype has been raised, but not adequately examined. We hypothesized that distinct cells-of-origin may play a role in determining ovarian tumor phenotype and outcome. Here we describe a new cell culture medium for in vitro culture of paired normal human ovarian (OV) and fallopian tube (FT) epithelial cells from donors without cancer. While these cells have been cultured individually for short periods of time, to our knowledge this is the first long-term culture of both cell types from the same donors. Through analysis of the gene expression profiles of the cultured OV/FT cells we identified a normal cell-of-origin gene signature that classified primary ovarian cancers into OV-like and FT-like subgroups; this classification correlated with significant differences in clinical outcomes. The identification of a prognostically significant gene expression signature derived solely from normal untransformed cells is consistent with the hypothesis that the normal cell-of-origin may be a source of ovarian tumor heterogeneity and the associated differences in tumor outcome.
Over 20 million archival tissue samples are stored annually in the United States as formalin-fixed, paraffin-embedded (FFPE) blocks, but RNA degradation during fixation and storage has prevented their use for transcriptional profiling. New and highly sensitive assays for whole-transcriptome microarray analysis of FFPE tissues are now available, but resulting data include noise and variability for which previous expression array methods are inadequate.
We present the two largest whole-genome expression studies from FFPE tissues to date, comprising 1,003 colorectal cancer (CRC) and 168 breast cancer samples, combined with a meta-analysis of 14 new and published FFPE microarray datasets. We develop and validate quality control (QC) methods through technical replication, independent samples, comparison to results from fresh-frozen tissue, and recovery of expected associations between gene expression and protein abundance.
Archival tissues from large, multi-center studies demonstrated a much wider range of transcriptional data quality relative to smaller or frozen tissue studies and required stringent QC for subsequent analysis. We developed novel methods for such QC of archival tissue expression profiles based on sample dynamic range and per-study median profile. This enabled validated identification of gene signatures of microsatellite instability and additional features of CRC, and improved recovery of associations between gene expression and protein abundance of MLH1, FASN, CDX2, MGMT and SIRT1 in CRC tumors.
These methods for large-scale QC of FFPE expression profiles enable study of the cancer transcriptome in relation to extensive clinicopathological information, tumor molecular biomarkers, and long-term lifestyle and outcome data.
FFPE; formalin; gene expression profiling
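The QC idea described above (filtering on sample dynamic range and on agreement with a per-study median profile) can be illustrated with a minimal Python sketch. The abstract does not give exact definitions, so the following are assumptions made for illustration only: dynamic range is taken as the interquartile range (IQR) of a sample's log2 expression values, agreement is taken as the Spearman correlation with the per-gene median across the study, and the thresholds `min_iqr` and `min_rho` are hypothetical.

```python
from statistics import median, quantiles


def iqr(values):
    """Interquartile range of one sample's expression values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def qc_flags(samples, min_iqr=1.0, min_rho=0.5):
    """Return {sample name: True if the sample passes both QC checks}.

    samples maps a sample name to a list of log2 expression values,
    with genes in the same order for every sample (an assumption).
    """
    n_genes = len(next(iter(samples.values())))
    # Per-study median profile: the median of each gene across samples.
    med = [median(s[g] for s in samples.values()) for g in range(n_genes)]
    return {
        name: bool(iqr(v) >= min_iqr and spearman(v, med) >= min_rho)
        for name, v in samples.items()
    }
```

In this sketch a sample with a nearly flat profile fails the dynamic-range check even if its ranks agree with the study median, which is the behavior one would want from a filter for degraded RNA.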
Chronic obstructive pulmonary disease (COPD) involves aberrant airway inflammatory responses to cigarette smoke (CS) that are associated with epithelial cell dysfunction, cilia shortening, and mucociliary clearance disruption. Exposure to CS reduced cilia length and induced autophagy in vivo and in differentiated mouse tracheal epithelial cells (MTECs). Autophagy-impaired (Becn1+/– or Map1lc3B–/–) mice and MTECs resisted CS-induced cilia shortening. Furthermore, CS increased the autophagic turnover of ciliary proteins, indicating that autophagy may regulate cilia homeostasis. We identified cytosolic deacetylase HDAC6 as a critical regulator of autophagy-mediated cilia shortening during CS exposure. Mice bearing an X chromosome deletion of Hdac6 (Hdac6–/Y) and MTECs from these mice had reduced autophagy and were protected from CS-induced cilia shortening. Autophagy-impaired Becn1+/–, Map1lc3B–/–, and Hdac6–/Y mice or mice injected with an HDAC6 inhibitor were protected from CS-induced mucociliary clearance (MCC) disruption. MCC was preserved in mice given the chemical chaperone 4-phenylbutyric acid, but was disrupted in mice lacking the transcription factor NRF2, suggesting that oxidative stress and altered proteostasis contribute to the disruption of MCC. Analysis of human COPD specimens revealed epigenetic deregulation of HDAC6 by hypomethylation and increased protein expression in the airways. We conclude that an autophagy-dependent pathway regulates cilia length during CS exposure and has potential as a therapeutic target for COPD.
Diminished ovarian reserve (DOR) is a challenging diagnosis of infertility, as there are currently no tests to predict who may become affected with this condition, or at what age. We designed the present study to compare the gene expression profile of membrana granulosa cells from young women affected with DOR with those from egg donors of similar age and to determine if distinct genetic patterns could be identified to provide insight into the etiology of DOR. Young women with DOR were identified based on FSH level in conjunction with poor follicular development during an IVF cycle (n = 13). Egg donors with normal ovarian reserve (NOR) comprised the control group (n = 13). Granulosa cells were collected following retrieval, RNA was extracted and microarray analysis was conducted to evaluate genetic differences between the groups. Confirmatory studies were undertaken with quantitative RT–PCR (qRT–PCR). Multiple significant differences in gene expression were observed between the DOR patients and egg donors. Two genes linked with ovarian function, anti-Mullerian hormone (AMH) and luteinizing hormone receptor (LHCGR), were further analyzed with qRT–PCR in all patients. The average expression of AMH was significantly higher in egg donors (adjusted P-value = 0.01), and the average expression of LHCGR was significantly higher in DOR patients (adjusted P-value = 0.005). Expression levels for four additional genes, progesterone receptor membrane component 2 (PGRMC2), prostaglandin E receptor 3 (subtype EP3) (PTGER3), steroidogenic acute regulatory protein (StAR), and StAR-related lipid transfer domain containing 4 (StarD4), were validated in a group consisting of five NOR and five DOR patients. We conclude that gene expression analysis has substantial potential to determine which young women may be affected with DOR. 
More importantly, our analysis suggests that DOR patients fall into two distinct subgroups based on gene expression profiles, indicating that different mechanisms may be involved during development of this pathology.
granulosa cells; oocyte quality; diminished ovarian reserve; IVF; microarray analysis
Regulatory network reconstruction is a fundamental problem in computational biology. There are significant limitations to such reconstruction using individual datasets, and increasingly people attempt to construct networks using multiple, independent datasets obtained from complementary sources, but methods for this integration are lacking. We developed PANDA (Passing Attributes between Networks for Data Assimilation), a message-passing model using multiple sources of information to predict regulatory relationships, and used it to integrate protein-protein interaction, gene expression, and sequence motif data to reconstruct genome-wide, condition-specific regulatory networks in yeast as a model. The resulting networks were not only more accurate than those produced using individual data sets and other existing methods, but they also captured information regarding specific biological mechanisms and pathways that were missed using other methodologies. PANDA is scalable to higher eukaryotes, applicable to specific tissue or cell type data and conceptually generalizable to include a variety of regulatory, interaction, expression, and other genome-scale data. An implementation of the PANDA algorithm is available at www.sourceforge.net/projects/panda-net.
Although ovarian cancer is often initially chemotherapy-sensitive, the vast majority of tumors eventually relapse and patients die of increasingly aggressive disease. Cancer stem cells are believed to have properties that allow them to survive therapy and may drive recurrent tumor growth. Cancer stem cells or cancer-initiating cells are a rare cell population and difficult to isolate experimentally. Genes that are expressed by stem cells may characterize a subset of less differentiated tumors and aid in prognostic classification of ovarian cancer. The purpose of this study was the genomic identification and characterization of a subtype of ovarian cancer that has stem cell-like gene expression. Using human and mouse gene signatures of embryonic, adult, or cancer stem cells, we performed an unsupervised bipartition class discovery on expression profiles from 145 serous ovarian tumors to identify a stem-like and a more differentiated subgroup. Subtypes were reproducible and were further characterized in four independent, heterogeneous ovarian cancer datasets. We identified a stem-like subtype characterized by a 51-gene signature, which is significantly enriched in tumors with properties of Type II ovarian cancer: high grade, serous histology, and poor survival. Conversely, the differentiated tumors share properties with Type I, including lower grade and mixed histological subtypes. The stem cell-like signature was prognostic within high-stage serous ovarian cancer, classifying a small subset of high-stage tumors with better prognosis in the differentiated subtype. In multivariate models that adjusted for common clinical factors (including grade, stage, age), the subtype classification was still a significant predictor of relapse.
The prognostic stem-like gene signature yields new insights into prognostic differences in ovarian cancer, provides a genomic context for defining Type I/II subtypes, and potential gene targets which following further validation may be valuable in the clinical management or treatment of ovarian cancer.
Despite widespread interest in the application of next-generation sequencing (NGS) to the mutation profiling of individual cancer specimens, the onset of personalized clinical genomics is currently stalled due in part to technical hurdles. As tumors are genetically heterogeneous and often mixed with normal/stromal cells, the resulting low-abundance somatic DNA mutations often produce ambiguous results or fall below the current NGS detection limit, thus hindering mutation calling that abides by clinical sensitivity/specificity standards. Here we examine the feasibility of applying COLD-PCR, a form of PCR that selectively magnifies mutations, to boost the detection of unknown rare somatic mutations prior to applying NGS-based amplicon re-sequencing to clinical samples. We amplified DNA from mutation-containing human cell lines serially diluted into wild-type (WT) DNA, as well as from lung adenocarcinoma and colorectal cancer specimens, using COLD-PCR or conventional PCR for comparison. Following individual amplification of TP53, KRAS, IDH1, and EGFR regions, PCR products were barcoded, pooled for library preparation and sequenced on the Illumina HiSeq 2000 platform. Regardless of sequencing depth, sequencing errors dictated a mutation-detection limit of ~1–2% mutation abundance in conventional PCR amplicons analyzed by NGS. In contrast, COLD-PCR amplicons enabled genuine mutations to exceed the sequence noise levels, thus allowing reliable identification of mutation abundances of ~0.04%. Sequencing depth was not a significant factor in the identification of COLD-PCR-magnified mutations. The analyzed clinical specimens revealed several TP53 and KRAS missense mutations that could not be called following NGS of conventional amplicons, yet were clearly detectable in COLD-PCR amplicons. Extensive tumor heterogeneity in the TP53 gene was revealed in some samples.
As cancer care shifts toward personalized intervention, based on the unique genetic abnormalities in each patient’s tumor genome, we anticipate that COLD-PCR-NGS will elucidate the role of rare mutations in tumors, enable NGS-based analysis of diverse clinical specimens and the broad interfacing of NGS with clinical practice.
COLD-PCR; mutation enrichment; low-abundance mutations; next generation sequencing; cancer
Single sample predictors (SSPs) and subtype classification models (SCMs) are gene expression–based classifiers used to identify the four primary molecular subtypes of breast cancer (basal-like, HER2-enriched, luminal A, and luminal B). SSPs use hierarchical clustering, followed by nearest centroid classification, based on large sets of tumor-intrinsic genes. SCMs use a mixture of Gaussian distributions based on sets of genes with expression specifically correlated with three key breast cancer genes (estrogen receptor [ER], HER2, and aurora kinase A [AURKA]). The aim of this study was to compare the robustness, classification concordance, and prognostic value of these classifiers with those of a simplified three-gene SCM in a large compendium of microarray datasets.
Thirty-six publicly available breast cancer datasets (n = 5715) were subjected to molecular subtyping using five published classifiers (three SSPs and two SCMs) and SCMGENE, the new three-gene (ER, HER2, and AURKA) SCM. We used the prediction strength statistic to estimate robustness of the classification models, defined as the capacity of a classifier to assign the same tumors to the same subtypes independently of the dataset used to fit it. We used Cohen κ and Cramer V coefficients to assess concordance between the subtype classifiers and association with clinical variables, respectively. We used Kaplan–Meier survival curves and cross-validated partial likelihood to compare prognostic value of the resulting classifications. All statistical tests were two-sided.
SCMs were statistically significantly more robust than SSPs, with SCMGENE being the most robust because of its simplicity. SCMGENE was statistically significantly concordant with published SCMs (κ = 0.65–0.70) and SSPs (κ = 0.34–0.59), statistically significantly associated with ER (V = 0.64), HER2 (V = 0.52) status, and histological grade (V = 0.55), and yielded similar strong prognostic value.
Our results suggest that adequate classification of the major and clinically relevant molecular subtypes of breast cancer can be robustly achieved with quantitative measurements of three key genes.
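The two agreement statistics named in the methods above, Cohen κ (concordance between two classifiers) and Cramer V (association between a classification and a clinical variable), can be computed as in the following minimal Python sketch. It uses the standard textbook definitions and is not the study's code; the subtype and status labels in the usage note are illustrative only.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(a, b):
    """Cohen's kappa for two label sequences over the same tumors.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement).
    Undefined (division by zero) if chance agreement equals 1.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / n ** 2
    return (observed - expected) / (1 - expected)


def cramers_v(x, y):
    """Cramer's V from the chi-squared statistic of the contingency table.

    V = sqrt(chi2 / (n * (min(rows, cols) - 1))), without bias correction.
    Requires at least two distinct categories in both x and y.
    """
    n = len(x)
    joint = Counter(zip(x, y))
    count_x, count_y = Counter(x), Counter(y)
    chi2 = 0.0
    for i in count_x:
        for j in count_y:
            expected = count_x[i] * count_y[j] / n
            chi2 += (joint[(i, j)] - expected) ** 2 / expected
    k = min(len(count_x), len(count_y)) - 1
    return sqrt(chi2 / (n * k))
```

For example, `cohen_kappa(["basal", "basal", "lumA", "lumA"], ["basal", "lumA", "basal", "lumA"])` is 0, because the observed agreement (0.5) equals the agreement expected by chance.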
A major goal in translational cancer research is to identify biological signatures driving cancer progression and metastasis. A common technique applied in genomics research is to cluster patients using gene expression data from a candidate prognostic gene set, and if the resulting clusters show statistically significant outcome stratification, to associate the gene set with prognosis, suggesting its biological and clinical importance. Recent work has questioned the validity of this approach by showing in several breast cancer data sets that “random” gene sets tend to cluster patients into prognostically variable subgroups. This work suggests that new rigorous statistical methods are needed to identify biologically informative prognostic gene sets. To address this problem, we developed Significance Analysis of Prognostic Signatures (SAPS) which integrates standard prognostic tests with a new prognostic significance test based on stratifying patients into prognostic subtypes with random gene sets. SAPS ensures that a significant gene set is not only able to stratify patients into prognostically variable groups, but is also enriched for genes showing strong univariate associations with patient prognosis, and performs significantly better than random gene sets. We use SAPS to perform a large meta-analysis (the largest completed to date) of prognostic pathways in breast and ovarian cancer and their molecular subtypes. Our analyses show that only a small subset of the gene sets found statistically significant using standard measures achieve significance by SAPS. We identify new prognostic signatures in breast and ovarian cancer and their corresponding molecular subtypes, and we show that prognostic signatures in ER negative breast cancer are more similar to prognostic signatures in ovarian cancer than to prognostic signatures in ER positive breast cancer. 
SAPS is a powerful new method for deriving robust prognostic biological signatures from clinically annotated genomic datasets.
A major goal in biomedical research is to identify sets of genes (or “biological signatures”) associated with patient survival, as these genes could be targeted to aid in diagnosing and treating disease. A major challenge in using prognostic associations to identify biologically informative signatures is that in some diseases, “random” gene sets are associated with prognosis. To address this problem, we developed a new method called “Significance Analysis of Prognostic Signatures” (or “SAPS”) for the identification of biologically informative gene sets associated with patient survival. To test the effectiveness of SAPS, we use SAPS to perform a subtype-specific meta-analysis of prognostic signatures in large breast and ovarian cancer meta-data sets. This analysis represents the largest of its kind ever performed. Our analyses show that only a small subset of the gene sets found statistically significant using standard measures achieve significance by SAPS. We identify new prognostic signatures in breast and ovarian cancer and their corresponding molecular subtypes, and we demonstrate a striking similarity between prognostic pathways in ER negative breast cancer and ovarian cancer, suggesting new shared therapeutic targets for these aggressive malignancies. SAPS is a powerful new method for deriving robust prognostic biological pathways from clinically annotated genomic datasets.
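The comparison against random gene sets described above can be sketched as an empirical p-value. This is a simplified illustration of that one component only, not the SAPS implementation: the scoring function `score_fn` is a hypothetical stand-in for whatever prognostic statistic is computed after stratifying patients with a gene set, and the add-one correction is a common convention assumed here.

```python
import random


def empirical_p(observed, null_scores):
    """Fraction of null scores at least as large as the observed score.

    The add-one correction keeps the estimate strictly above zero.
    """
    hits = sum(score >= observed for score in null_scores)
    return (1 + hits) / (1 + len(null_scores))


def random_set_p(score_fn, gene_set, all_genes, n_random=1000, seed=0):
    """Compare a candidate gene set with random sets of the same size.

    score_fn: hypothetical callable mapping a list of genes to a number,
              where larger means stronger prognostic stratification.
    all_genes: list of genes from which random sets are drawn.
    """
    rng = random.Random(seed)
    observed = score_fn(gene_set)
    null_scores = [
        score_fn(rng.sample(all_genes, len(gene_set)))
        for _ in range(n_random)
    ]
    return empirical_p(observed, null_scores)
```

A gene set is called significant under this check only if few size-matched random sets stratify patients as well as it does, which guards against the "random gene sets are prognostic" problem raised above.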
Although microRNAs (miRNAs) are implicated in osteosarcoma biology and chemoresponse, miRNA prognostic models are still needed, particularly because prognosis is imperfectly correlated with chemoresponse. Formalin-fixed, paraffin-embedded tissue is a necessary resource for biomarker studies in this malignancy with limited frozen tissue availability.
We performed miRNA and mRNA microarray formalin-fixed, paraffin-embedded assays in 65 osteosarcoma biopsy and 26 paired post-chemotherapy resection specimens and used the only publicly available miRNA dataset, generated independently by another group, to externally validate our strongest findings (n = 29). We used supervised principal components analysis and logistic regression for survival and chemoresponse, and miRNA activity and target gene set analysis to study miRNA regulatory activity.
Several miRNA-based models with as few as five miRNAs were prognostic independently of pathologically assessed chemoresponse (median recurrence-free survival: 59 months versus not-yet-reached; adjusted hazard ratio = 2.90; P = 0.036). The independent dataset supported the reproducibility of recurrence and survival findings. The prognostic value of the profile was independent of confounding by known prognostic variables, including chemoresponse, tumor location and metastasis at diagnosis. Model performance improved when chemoresponse was added as a covariate (median recurrence-free survival: 59 months versus not-yet-reached; hazard ratio = 3.91; P = 0.002). Most prognostic miRNAs were located at 14q32 - a locus already linked to osteosarcoma - and their gene targets display deregulation patterns associated with outcome. We also identified miRNA profiles predictive of chemoresponse (75% to 80% accuracy), which did not overlap with prognostic profiles.
Formalin-fixed, paraffin-embedded tissue-derived miRNA patterns are a powerful prognostic tool for risk-stratified osteosarcoma management strategies. Combined miRNA and mRNA analysis supports a possible role of the 14q32 locus in osteosarcoma progression and outcome. Our study creates a paradigm for formalin-fixed, paraffin-embedded-based miRNA biomarker studies in cancer.
Summary: The survcomp package provides functions to assess and statistically compare the performance of survival/risk prediction models. It implements state-of-the-art statistics to (i) measure the performance of risk prediction models; (ii) combine these statistical estimates from multiple datasets using a meta-analytical framework; and (iii) statistically compare the performance of competitive models.
Availability: The R/Bioconductor package survcomp is provided open source under the Artistic-2.0 License with a user manual containing installation, operating instructions and use case scenarios on real datasets. survcomp requires R version 2.13.0 or higher. http://bioconductor.org/packages/release/bioc/html/survcomp.html
Supplementary Information: Supplementary data are available at Bioinformatics online.
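One widely used performance measure for risk prediction models of the kind described above is the concordance index. The sketch below is a plain Python illustration of Harrell's definition for right-censored data; it is not the survcomp implementation (which is an R package), and the function name and argument layout are assumptions.

```python
def concordance_index(risk, time, event):
    """Harrell's concordance index for right-censored survival data.

    risk:  predicted risk scores (higher means earlier expected event)
    time:  observed follow-up times
    event: 1 if the event was observed, 0 if the observation is censored

    A pair (i, j) is usable when subject i had an observed event before
    subject j's follow-up time. Ties in risk count as half concordant.
    """
    concordant = 0.0
    usable = 0
    n = len(risk)
    for i in range(n):
        if not event[i]:
            continue
        for j in range(n):
            if time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.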
Motivation: The ability to detect copy-number variation (CNV) and loss of heterozygosity (LOH) from exome sequencing data extends the utility of this powerful approach that has mainly been used for point or small insertion/deletion detection.
Results: We present ExomeCNV, a statistical method to detect CNV and LOH using depth-of-coverage and B-allele frequencies, from mapped short sequence reads, and we assess both the method's power and the effects of confounding variables. We apply our method to a cancer exome resequencing dataset. As expected, accuracy and resolution are dependent on depth-of-coverage and capture probe design.
Availability: CRAN package ‘ExomeCNV’.
Supplementary information: Supplementary data are available at Bioinformatics online.
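The two signals mentioned above, depth-of-coverage and B-allele frequency, can be illustrated with a minimal Python sketch. This is not the ExomeCNV statistical model: it only computes a library-size-normalized log2 tumor/normal depth ratio per exon and a raw B-allele frequency, and the call thresholds `gain` and `loss` are hypothetical values chosen for illustration.

```python
from math import log2


def log2_ratios(tumor_depth, normal_depth):
    """Per-exon log2 ratio of tumor to normal depth.

    Depths are first divided by the total depth of each sample so that
    differences in sequencing yield do not look like copy-number changes.
    Assumes every exon has positive depth in both samples.
    """
    tumor_total = sum(tumor_depth)
    normal_total = sum(normal_depth)
    return [
        log2((t / tumor_total) / (n / normal_total))
        for t, n in zip(tumor_depth, normal_depth)
    ]


def call_exons(ratios, gain=0.4, loss=-0.4):
    """Label each exon using hypothetical fixed thresholds."""
    calls = []
    for r in ratios:
        if r >= gain:
            calls.append("gain")
        elif r <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


def b_allele_frequency(alt_reads, total_reads):
    """Fraction of reads supporting the non-reference (B) allele.

    At a heterozygous site, a value far from 0.5 in the tumor is
    consistent with allelic imbalance such as LOH.
    """
    return alt_reads / total_reads
```

The sketch also makes the dependence on coverage visible: with few reads per exon, both the depth ratio and the B-allele frequency become noisy, which is consistent with the abstract's remark that accuracy depends on depth-of-coverage.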
MicroRNAs (miRNAs) are nucleic acid regulators of many human mRNAs, and are associated with many tumorigenic processes. miRNA expression levels have been used in profiling studies, but some evidence suggests that expression levels do not fully capture miRNA regulatory activity. In this study we integrate multiple gene expression datasets to determine miRNA activity patterns associated with cancer phenotypes and oncogenic pathways in mesenchymal tumors – a very heterogeneous class of malignancies.
Using a computational method, we identified differentially activated miRNAs between 77 normal tissue specimens and 135 sarcomas and we validated many of these findings with microarray interrogation of an independent, paraffin-based cohort of 18 tumors. We also showed that miRNA activity is imperfectly correlated with miRNA expression levels. Using next-generation miRNA sequencing we identified potential base sequence alterations which may explain differential activity. We then analyzed miRNA activity changes related to the RAS-pathway and found 21 miRNAs that switch from silenced to activated status in parallel with RAS activation. Importantly, nearly half of these 21 miRNAs were predicted to regulate integral parts of the miRNA processing machinery, and our gene expression analysis revealed significant reductions of these transcripts in RAS-active tumors. These results suggest an association between RAS signaling and miRNA processing in which miRNAs may attenuate their own biogenesis.
Our study represents the first gene expression-based investigation of miRNA regulatory activity in human sarcomas, and our findings indicate that miRNA activity patterns derived from integrated transcriptomic data are reproducible and biologically informative in cancer. We identified an association between RAS signaling and miRNA processing, and demonstrated sequence alterations as plausible causes for differential miRNA activity. Finally, our study highlights the value of systems level integrative miRNA/mRNA assessment with high-throughput genomic data, and the applicability of paraffin-tissue-derived RNA for validation of novel findings.
MicroRNA; Microarray; RAS; Mesenchymal tumors; MicroRNA biogenesis
Motivation: Meta-analysis of genomics data seeks to identify genes associated with a biological phenotype across multiple datasets; however, merging data from different platforms by their features (genes) is challenging. Meta-analysis using functionally or biologically characterized gene sets simplifies data integration, is biologically intuitive and is seen as having great potential, but is an emerging field with few established statistical methods.
Results: We transform gene expression profiles into binary gene set profiles by discretizing results of gene set enrichment analyses and apply a new iterative bi-clustering algorithm (iBBiG) to identify groups of gene sets that are coordinately associated with groups of phenotypes across multiple studies. iBBiG is optimized for meta-analysis of large numbers of diverse genomics data that may have unmatched samples. It does not require prior knowledge of the number or size of clusters. When applied to simulated data, it outperforms commonly used clustering methods, discovers overlapping clusters of diverse sizes and is robust in the presence of noise. We apply it to meta-analysis of breast cancer studies, where iBBiG extracted novel gene set-phenotype associations that predicted tumor metastases within tumor subtypes.
Availability: Implemented in the Bioconductor package iBBiG
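The discretization step described above, which turns enrichment results into binary gene set profiles, can be sketched in a few lines of Python. This covers only the input preparation, not the iBBiG bi-clustering algorithm itself; the significance cutoff `alpha` and the matrix layout (rows are gene sets, columns are phenotype comparisons) are assumptions for illustration.

```python
def binarize(pvalues, alpha=0.05):
    """Convert a matrix of enrichment p-values into a 0/1 matrix.

    pvalues: list of rows, one row per gene set and one column per
             phenotype comparison (hypothetical layout).
    An entry is 1 when the gene set is called enriched (p < alpha).
    """
    return [[1 if p < alpha else 0 for p in row] for row in pvalues]


def column_support(binary):
    """Number of gene sets switched on in each phenotype comparison."""
    return [sum(column) for column in zip(*binary)]
```

Because every study is reduced to the same binary gene-set-by-phenotype representation, studies from different platforms can be stacked side by side without matching individual probes or genes.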
Many human diseases, arising from mutations of disease susceptibility genes (genetic diseases), are also associated with viral infections (virally implicated diseases), either in a directly causal manner or by indirect associations. Here we examine whether viral perturbations of host interactome may underlie such virally implicated disease relationships. Using as models two different human viruses, Epstein-Barr virus (EBV) and human papillomavirus (HPV), we find that host targets of viral proteins reside in network proximity to products of disease susceptibility genes. Expression changes in virally implicated disease tissues and comorbidity patterns cluster significantly in the network vicinity of viral targets. The topological proximity found between cellular targets of viral proteins and disease genes was exploited to uncover a novel pathway linking HPV to Fanconi anemia.
Many “virally implicated human diseases” - diseases for which there is scientific consensus of viral involvement - are associated with genetic alterations in particular disease susceptibility genes. We proposed and demonstrated that for two human viruses, Epstein-Barr virus and human papillomavirus, topological proximity should exist between host targets of viruses and genes associated with virally implicated diseases on host interactome networks (local impact hypothesis). For representative EBV- and HPV16-implicated diseases, genes in the neighborhood of viral targets in the host interactome have significantly shifted expression levels in virally implicated disease tissues, in line with the local impact hypothesis. The viral neighborhoods in the host interactome, along with their disease associations, defined as “viral disease networks”, contain connections known to be informative about disease mechanisms as well as diseases whose associations with viruses are not yet known. We prioritized these diseases for their candidacy as potential virally implicated diseases based on network topology, and benchmarked this prioritization of candidate diseases using a relative risk measurement that captures population-based clinical associations between candidate diseases and viral infection. Exogenous expression of HPV viral proteins in a human cell line offered evidence for a novel disease pathway that links HPV to Fanconi anemia.
Ovarian cancer is the fifth leading cause of cancer death for women in the U.S. and the seventh most fatal worldwide. Although ovarian cancer is notable for its initial sensitivity to platinum-based therapies, the vast majority of patients eventually develop recurrent cancer and succumb to increasingly platinum-resistant disease. Modern, targeted cancer drugs intervene in cell signaling, and identifying key disease mechanisms and pathways would greatly advance our treatment abilities. In order to shed light on the molecular diversity of ovarian cancer, we performed comprehensive transcriptional profiling on 129 advanced stage, high grade serous ovarian cancers. We implemented a re-sampling-based version of the ISIS class discovery algorithm (rISIS: robust ISIS) and applied it to the entire set of ovarian cancer transcriptional profiles. rISIS identified a previously undescribed patient stratification, further supported by micro-RNA expression profiles, and gene set enrichment analysis found strong biological support for the stratification by extracellular matrix, cell adhesion, and angiogenesis genes. The corresponding “angiogenesis signature” was validated in ten published independent ovarian cancer gene expression datasets and is significantly associated with overall survival. The subtypes we have defined are of potential translational interest as they may be relevant for identifying patients who may benefit from the addition of anti-angiogenic therapies that are now being tested in clinical trials.
Epstein-Barr virus (EBV) latent membrane protein 1 (LMP1) transforms rodent fibroblasts and is expressed in most EBV-associated malignancies. LMP1 (transformation effector site 2 [TES2]/C-terminal activation region 2 [CTAR2]) activates NF-κB, p38, Jun N-terminal protein kinase (JNK), extracellular signal-regulated kinase (ERK), and interferon regulatory factor 7 (IRF7) pathways. We have investigated LMP1 TES2 genome-wide RNA effects at 4 time points after LMP1 TES2 expression in HEK-293 cells. By using a false discovery rate (FDR) of <0.001 after correction for multiple hypotheses, LMP1 TES2 caused >2-fold changes in 1,916 mRNAs; 1,479 RNAs were upregulated and 437 were downregulated. In contrast to tumor necrosis factor alpha (TNF-α) stimulation, which transiently upregulates many target genes, LMP1 TES2 maintained most RNA effects through the time course, despite robust and sustained induction of negative feedback regulators, such as IκBα and A20. LMP1 TES2-regulated RNAs encode many NF-κB signaling proteins and secondary interacting proteins. Consequently, many LMP1 TES2-regulated RNAs encode proteins that form an extensive interactome. Gene set enrichment analyses found LMP1 TES2-upregulated genes to be significantly enriched for pathways in cancer, B- and T-cell receptor signaling, and Toll-like receptor signaling. Surprisingly, LMP1 TES2 and IκBα superrepressor coexpression decreased LMP1 TES2 RNA effects to only 5 RNAs with FDRs of <0.001 and >2-fold changes. Thus, canonical NF-κB activation is critical for almost all LMP1 TES2 RNA effects in HEK-293 cells and a more significant therapeutic target than previously appreciated.
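The selection rule used above (FDR < 0.001 after multiple-testing correction, combined with a >2-fold change) can be sketched in Python. The abstract does not name the correction procedure, so the Benjamini-Hochberg method is assumed here purely for illustration, and fold changes are assumed to be supplied on the log2 scale, where a 2-fold change corresponds to an absolute value of 1.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_rnas(log2_fold_change, pvalues, fdr=0.001, min_abs_log2fc=1.0):
    """Indices of RNAs passing both the FDR and the fold-change cutoff."""
    q = benjamini_hochberg(pvalues)
    return [
        i
        for i in range(len(q))
        if q[i] < fdr and abs(log2_fold_change[i]) > min_abs_log2fc
    ]
```

Splitting the selected indices by the sign of the log2 fold change would give the upregulated and downregulated groups reported above.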
The purpose of the online resource presented here, POPcorn (Project Portal for corn), is to enhance accessibility of maize genetic and genomic resources for plant biologists. Currently, many online locations are difficult to find, some are best searched independently, and individual project websites often degrade over time, sometimes disappearing entirely. The POPcorn site makes available (1) a centralized, web-accessible resource to search and browse descriptions of ongoing maize genomics projects, (2) a single, stand-alone tool that uses web services and minimal data warehousing to search for sequence matches in online resources of diverse offsite projects, and (3) a set of tools that enables researchers to migrate their data to the long-term model organism database for maize genetic and genomic information: MaizeGDB. Examples demonstrating POPcorn's utility are provided herein.
Cyclin D1 is a component of the core cell cycle machinery [1]. Abnormally high levels of cyclin D1 are detected in many human cancer types [2]. To elucidate the molecular functions of cyclin D1 in human cancers, here we performed a proteomic screen for cyclin D1 protein partners in several types of human tumors. Analyses of cyclin D1-interactors revealed a network of DNA repair proteins, including RAD51, a recombinase that drives the homologous recombination process [3]. We found that cyclin D1 directly binds RAD51, and that cyclin D1-RAD51 interaction is induced by radiation. Like RAD51, cyclin D1 is recruited to DNA damage sites in a BRCA2-dependent fashion. Reduction of cyclin D1 levels in human cancer cells impaired recruitment of RAD51 to damaged DNA, impeded the homologous recombination-mediated DNA repair, and increased sensitivity of cells to radiation in vitro and in vivo. This effect was seen in cancer cells lacking the retinoblastoma protein, which do not require D-cyclins for proliferation [4, 5]. These findings reveal an unexpected function of a core cell cycle protein in DNA repair and suggest that targeting cyclin D1 may be beneficial also in retinoblastoma-negative cancers which are currently thought to be oblivious to cyclin D1 inhibition.
Traditional strategies for selecting variables in high-dimensional classification problems aim to find sets of maximally relevant variables able to explain the target variations. Although these techniques may be effective in terms of generalization accuracy, they often do not reveal direct causes, essentially because high correlation (or relevance) does not imply causation. In this study, we show how to efficiently incorporate causal information into gene selection by moving from a single-input single-output to a multiple-input multiple-output setting.
We show in a synthetic case study that a better prioritization of causal variables can be obtained by considering a relevance score that incorporates a causal term. In addition, we show in a meta-analysis of six publicly available breast cancer microarray datasets that the improvement also occurs in terms of accuracy. The biological interpretation of the results confirms the potential of a causal approach to gene selection.
Integrating causal information into gene selection algorithms is effective both in terms of prediction accuracy and biological interpretation.
GeneSigDB (http://www.genesigdb.org or http://compbio.dfci.harvard.edu/genesigdb/) is a database of gene signatures that have been extracted and manually curated from the published literature. It provides the community with a standardized resource of published prognostic, diagnostic and other gene signatures of cancer and related diseases, so that users can compare the predictive power of gene signatures or use them in gene set enrichment analysis. Since GeneSigDB release 1.0, we have expanded from 575 to 3515 gene signatures, which were collected and transcribed from 1604 published articles largely focused on gene expression in cancer, stem cells, immune cells, development and lung disease. We have made substantial upgrades to the GeneSigDB website to improve accessibility and usability, including adding a tag cloud browse function, faceted navigation and a ‘basket’ feature to store genes or gene signatures of interest. Users can analyze GeneSigDB gene signatures, or upload their own gene list, to identify gene signatures with significant gene overlap, and results can be viewed on a dynamic, editable heatmap that can be downloaded as a publication-quality image. All data in GeneSigDB can be downloaded in numerous formats, including the .gmt file format for gene set enrichment analysis or as an R/Bioconductor data file. GeneSigDB is available from http://www.genesigdb.org.
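As a rough illustration of how a downloaded .gmt file might be used for an overlap analysis, the following minimal Python sketch parses gene sets in the standard tab-separated GMT layout (set name, description, then gene symbols) and scores the overlap between a query gene list and one signature with a hypergeometric upper-tail probability. The function names, the example signature names, and the universe size of 20 genes are hypothetical and chosen only for demonstration. The hypergeometric test is a common choice for gene-overlap significance and is assumed here; it is not necessarily the statistic used by the GeneSigDB website.

```python
import io
from math import comb


def parse_gmt(handle):
    """Parse GMT text: one gene set per line, tab-separated as
    name, description, gene1, gene2, ... Returns {name: set_of_genes}."""
    signatures = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        signatures[name] = {g for g in genes if g}
    return signatures


def overlap_pvalue(query, signature, universe_size):
    """Return (overlap size k, hypergeometric P(X >= k)) for drawing
    len(query) genes from a universe containing len(signature) hits.
    Assumes universe_size >= len(signature) and >= len(query)."""
    k = len(query & signature)
    n, K, N = len(query), len(signature), universe_size
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)


# Hypothetical two-signature GMT content, standing in for a downloaded file.
EXAMPLE_GMT = (
    "sigA\tdemo signature\tTP53\tBRCA1\tBRCA2\tRAD51\n"
    "sigB\tdemo signature\tCCND1\tRB1\n"
)

signatures = parse_gmt(io.StringIO(EXAMPLE_GMT))
query = {"RAD51", "BRCA2", "CCND1"}
k, p = overlap_pvalue(query, signatures["sigA"], universe_size=20)
print(f"overlap={k}, p={p:.4f}")
```

In practice the universe size would be the number of genes that could have appeared in either list (for example, all genes on the measurement platform), and p-values computed across many signatures would need correction for multiple testing.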
Genomics has provided us with an unprecedented quantity of data on the genes that are activated or repressed in a wide range of phenotypes. We have increasingly come to recognize that defining the networks and pathways underlying these phenotypes requires both the integration of multiple data types and the development of advanced computational methods to infer relationships between the genes and to estimate the predictive power of the networks through which they interact. To address these issues we have developed Predictive Networks (PN), a flexible, open-source, web-based application and data services framework that enables the integration, navigation, visualization and analysis of gene interaction networks. The primary goal of PN is to allow biomedical researchers to evaluate experimentally derived gene lists in the context of large-scale gene interaction networks. The PN analytical pipeline involves two key steps. The first is the collection of a comprehensive set of known gene interactions derived from a variety of publicly available sources. The second is the use of these ‘known’ interactions together with gene expression data to infer robust gene networks. The PN web application is accessible from http://predictivenetworks.org. The PN code base is freely available at https://sourceforge.net/projects/predictivenets/.