The nucleic acid-binding protein YB-1, a member of the cold-shock domain protein family, has been implicated in the progression of breast cancer and is associated with poor patient survival. YB-1 has sequence similarity to LIN28, another cold-shock protein family member, which has a role in the regulation of small noncoding RNAs (sncRNAs) including microRNAs (miRNAs). Therefore, to test for an association between YB-1 and sncRNAs in breast cancer, we examined whether sncRNAs were bound by YB-1 in two breast cancer cell lines (luminal A-like and basal cell-like), and whether the abundance of sncRNAs and mRNAs changed in response to experimental reduction of YB-1 expression.
RNA-immunoprecipitation with an anti-YB-1 antibody showed that several sncRNAs are bound by YB-1. Some of these were bound by YB-1 in both breast cancer cell lines; others were cell-line specific. The small RNAs bound by YB-1 were derived from various sncRNA families including miRNAs such as let-7 and miR-320, transfer RNAs, ribosomal RNAs and small nucleolar RNAs (snoRNA). Reducing YB-1 expression altered the abundance of a number of transcripts encoding miRNA biogenesis and processing proteins but did not alter the abundance of mature or precursor miRNAs.
YB-1 binds to specific miRNAs, snoRNAs and tRNA-derived fragments and appears to regulate the expression of miRNA biogenesis and processing machinery. We propose that some of the oncogenic effects of YB-1 in breast cancer may be mediated through its interactions with sncRNAs.
We develop a new regression algorithm, cMIKANA, for inference of gene regulatory networks from combinations of steady-state and time-series gene expression data. Using simulated gene expression datasets to assess the accuracy of reconstructing gene regulatory networks, we show that steady-state and time-series data sets can successfully be combined to identify gene regulatory interactions using the new algorithm. Inferring gene networks from combined data sets was found to be advantageous when using noisy measurements collected with either lower sampling rates or a limited number of experimental replicates. We illustrate our method by applying it to a microarray gene expression dataset from human umbilical vein endothelial cells (HUVECs) which combines time series data from treatment with growth factor TNF and steady state data from siRNA knockdown treatments. Our results suggest that the combination of steady-state and time-series datasets may provide better prediction of RNA-to-RNA interactions, and may also reveal biological features that cannot be identified from dynamic or steady state information alone. Finally, we consider the experimental design of genomics experiments for gene regulatory network inference and show that network inference can be improved by incorporating steady-state measurements with time-series data.
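The idea of pooling steady-state and time-series measurements in one regression can be sketched as follows. This is a minimal illustration and not the cMIKANA algorithm itself: it assumes a linear model dx/dt = A x + p, where p is a known constant perturbation, fits each gene's row of A by ordinary least squares with a tiny ridge term for numerical stability, and uses noise-free simulated data. All function names, the matrix TRUE_A and the perturbation design are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(rows, targets, ridge=1e-9):
    """Fit targets ~ rows . w via the normal equations (tiny ridge term assumed)."""
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve_linear(xtx, xty)


def combined_rows(time_series, dt, steady_states, perturbations, gene):
    """Stack both data types into one regression problem for a single target gene."""
    rows, targets = [], []
    # Time series: the finite-difference derivative approximates (A x)[gene].
    for x_now, x_next in zip(time_series, time_series[1:]):
        rows.append(list(x_now))
        targets.append((x_next[gene] - x_now[gene]) / dt)
    # Steady state: 0 = (A x_ss)[gene] + p[gene], so (A x_ss)[gene] = -p[gene].
    for x_ss, p in zip(steady_states, perturbations):
        rows.append(list(x_ss))
        targets.append(-p[gene])
    return rows, targets


# Simulated two-gene example (hypothetical network, no measurement noise).
TRUE_A = [[-1.0, 0.5], [0.3, -0.8]]
dt = 0.1
series = [[1.0, 0.2]]
for _ in range(5):
    x = series[-1]
    series.append([x[i] + dt * sum(TRUE_A[i][j] * x[j] for j in range(2))
                   for i in range(2)])
perturbations = [[1.0, 0.0], [0.0, 1.0]]
steady = [solve_linear(TRUE_A, [-v for v in p]) for p in perturbations]

estimated = [least_squares(*combined_rows(series, dt, steady, perturbations, g))
             for g in range(2)]
print(estimated)
```

With noisy or sparsely sampled time series, the steady-state rows add extra constraints on the same coefficients, which is the intuition behind combining the two data types.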
Many studies have revealed correlations between breast tumour phenotypes, variations in gene expression, and patient survival outcomes. The molecular heterogeneity between breast tumours revealed by these studies has allowed prediction of prognosis and has underpinned stratified therapy, where groups of patients with particular tumour types receive specific treatments. The molecular tests used to predict prognosis and stratify treatment usually utilise fixed sets of genomic biomarkers, with the same biomarker sets being used to test all patients. In this paper we suggest that instead of fixed sets of genomic biomarkers, it may be more effective to use a stratified biomarker approach, where optimal biomarker sets are automatically chosen for particular patient groups, analogous to the choice of optimal treatments for groups of similar patients in stratified therapy. We illustrate the effectiveness of a biclustering approach to select optimal gene sets for determining the prognosis of specific strata of patients, based on potentially overlapping, non-discrete molecular characteristics of tumours.
Biclustering identified tightly co-expressed gene sets in the tumours of restricted subgroups of breast cancer patients. The co-expressed genes in these biclusters were significantly enriched for particular biological annotations and gene regulatory modules associated with breast cancer biology. Tumours identified within the same bicluster were more likely to present with similar clinical features. Bicluster membership combined with clinical information could predict patient prognosis in conditional inference tree and ridge regression class prediction models.
The increasing clinical use of genomic profiling demands identification of more effective methods to segregate patients into prognostic and treatment groups. We have shown that biclustering can be used to select optimal gene sets for determining the prognosis of specific strata of patients.
Biclustering; Gene expression profiles; Tumour classification; Survival prediction; Breast cancer
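To make the notion of a "tightly co-expressed" bicluster concrete, one widely used score is the mean squared residue of Cheng and Church, which is zero when the selected genes vary in parallel across the selected tumours. The abstract above does not state which biclustering algorithm or score was used, so the sketch below is an illustration under that assumption; the function name and the toy expression matrix are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix given by row and column indices."""
    sub = [[matrix[r][c] for c in cols] for r in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(row) / n_c for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = sum((sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
                for i in range(n_r) for j in range(n_c))
    return total / (n_r * n_c)


# Toy matrix: rows are genes, columns are tumours (made-up values).
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [3.0, 4.0, 5.0, 7.0],
    [8.0, 1.0, 6.0, 2.0],
]

tight = mean_squared_residue(expression, [0, 1, 2], [0, 1, 2])
whole = mean_squared_residue(expression, [0, 1, 2, 3], [0, 1, 2, 3])
print(tight, whole)
```

A low score on a gene and tumour subset, compared with the score of the whole matrix, indicates that the subset behaves coherently even when the full dataset does not.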
Despite on-going research, metastatic melanoma survival rates remain low and treatment options are limited. Researchers can now access a rapidly growing amount of molecular and clinical information about melanoma. This information is becoming difficult to assemble and interpret due to its dispersed nature, yet as it grows it becomes increasingly valuable for understanding melanoma. Integration of this information into a comprehensive resource to aid rational experimental design and patient stratification is needed. As an initial step in this direction, we have assembled a web-accessible melanoma database, MelanomaDB, which incorporates clinical and molecular data from publicly available sources and will be regularly updated as new information becomes available. This database allows complex links to be drawn between many different aspects of melanoma biology: genetic changes (e.g., mutations) in individual melanomas revealed by DNA sequencing, associations between gene expression and patient survival, data concerning drug targets, biomarkers, druggability, and clinical trials, as well as our own statistical analysis of relationships between molecular pathways and clinical parameters that have been produced using these data sets. The database is freely available at http://genesetdb.auckland.ac.nz/melanomadb/about.html. A subset of the information in the database can also be accessed through a freely available web application in the Illumina genomic cloud computing platform BaseSpace at http://www.biomatters.com/apps/melanoma-profiler-for-research. The MelanomaDB database illustrates dysregulation of specific signaling pathways across 310 exome-sequenced melanomas and in individual tumors and identifies the distribution of somatic variants in melanoma. We suggest that MelanomaDB can provide a context in which to interpret the tumor molecular profiles of individual melanoma patients relative to biological information and available drug therapies.
melanoma; mutation; molecular pathway; MelanomaDB; gene set analysis; BaseSpace
Despite their distinct biology, granulosa cell tumours (GCTs) are treated the same as other ovarian tumours. Intriguingly, a recurring somatic mutation in the transcription factor Forkhead Box L2 (FOXL2) 402C>G has been found in nearly all GCTs examined. This investigation aims to identify the pathogenicity of mutant FOXL2 by studying its altered transcriptional targets.
The expression of mutant FOXL2 was reduced in the GCT cell line KGN, and wildtype and mutant FOXL2 were overexpressed in the GCT cell line COV434. Total RNA was hybridised to Affymetrix U133 Plus 2 microarrays. Comparisons were made between the transcriptomes of control cells and cells altered by FOXL2 knockdown and overexpression, to detect potential transcriptional targets of mutant FOXL2.
The overexpression of wildtype and mutant FOXL2 in COV434, and the silencing of mutant FOXL2 expression in KGN, have shown that mutant FOXL2 is able to differentially regulate the expression of many genes, including two well-known FOXL2 targets, StAR and CYP19A. We have shown that many of the genes regulated by mutant FOXL2 are clustered into functional annotations of cell death, proliferation, and tumourigenesis. Furthermore, TGF-β signalling was found to be enriched when using the gene annotation tools GATHER and GeneSetDB. This enrichment was still significant after performing a robust permutation analysis.
Given that many of the transcriptional targets of mutant FOXL2 are known TGF-β signalling genes, we suggest that deregulation of this key antiproliferative pathway is one way mutant FOXL2 contributes to the pathogenesis of adult-type GCTs. We believe this pathway should be a target for future therapeutic interventions, if outcomes for women with GCTs are to improve.
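A permutation analysis of gene-set enrichment, as mentioned above for TGF-β signalling, can be sketched as follows. The abstract does not describe the permutation scheme that was used, so this assumes one common variant: random gene lists of the same size as the regulated list are drawn from a background, and the empirical p-value is the fraction of draws whose overlap with the pathway is at least as large as the observed overlap. The function name, the gene identifiers and the set sizes are placeholders.

```python
import random


def permutation_pvalue(regulated, pathway, background, n_perm=1000, seed=0):
    """Empirical p-value for the overlap between a gene list and a pathway."""
    rng = random.Random(seed)
    pathway = set(pathway)
    regulated = set(regulated)
    observed = len(pathway & regulated)
    pool = sorted(background)
    at_least_as_large = 0
    for _ in range(n_perm):
        sample = rng.sample(pool, len(regulated))
        if len(pathway.intersection(sample)) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Placeholder identifiers, not real gene symbols.
background = [f"g{i}" for i in range(200)]
pathway = [f"g{i}" for i in range(20)]
regulated = [f"g{i}" for i in range(10)] + [f"g{i}" for i in range(100, 105)]

observed, p_value = permutation_pvalue(regulated, pathway, background)
print(observed, p_value)
```

The smallest p-value this scheme can report is 1 / (n_perm + 1), so the number of permutations sets the resolution of the test.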
The multi-subunit protein complex, cohesin, is responsible for sister chromatid cohesion during cell division. The interaction of cohesin with DNA is controlled by a number of additional regulatory proteins. Mutations in cohesin, or its regulators, cause a spectrum of human developmental syndromes known as the “cohesinopathies.” Cohesinopathy disorders include Cornelia de Lange Syndrome and Roberts Syndrome. The discovery of novel roles for chromatid cohesion proteins in regulating gene expression led to the idea that cohesinopathies are caused by dysregulation of multiple genes downstream of mutations in cohesion proteins. Consistent with this idea, Drosophila, mouse, and zebrafish cohesinopathy models all show altered expression of developmental genes. However, there appears to be incomplete overlap among dysregulated genes downstream of mutations in different components of the cohesion apparatus. This is surprising because mutations in all cohesion proteins would be predicted to affect cohesin’s roles in cell division and gene expression in similar ways. Here we review the differences and similarities between genetic pathways downstream of components of the cohesion apparatus, and discuss how such differences might arise, and contribute to the spectrum of cohesinopathy disorders. We propose that mutations in different elements of the cohesion apparatus have distinct developmental outcomes that can be explained by sometimes subtly different molecular effects.
cohesin; gene expression regulation; animal models; CdLS; RBS
Contact between sister chromatids from S phase to anaphase depends on cohesin, a large multi-subunit protein complex. Mutations in sister chromatid cohesion proteins underlie the human developmental condition, Cornelia de Lange Syndrome. Roles for cohesin in regulating gene expression, sometimes in combination with CCCTC-binding factor (CTCF), have emerged. We analyzed zebrafish embryos null for cohesin subunit rad21 using microarrays to determine global effects of cohesin on gene expression during embryogenesis. This identified Rad21-associated gene networks that included myca (zebrafish c-myc), p53 and mdm2. In zebrafish, cohesin binds to the transcription start sites of p53 and mdm2, and depletion of either Rad21 or CTCF increased their transcription. In contrast, myca expression was strongly downregulated upon loss of Rad21 while depletion of CTCF had little effect. Depletion of Rad21 or the cohesin-loading factor Nipped-B in Drosophila cells also reduced expression of myc and Myc target genes. Cohesin bound the transcription start site plus an upstream predicted CTCF binding site at zebrafish myca. Binding and positive regulation of the c-Myc gene by cohesin is conserved through evolution, indicating this regulation is likely to be direct. The exact mechanism of regulation is unknown, but local changes in histone modification associated with transcription repression at the myca gene were observed in rad21 mutants.
Cohesin; Zebrafish; Cornelia de Lange Syndrome; Myc
The human developmental diseases Cornelia de Lange Syndrome (CdLS) and Roberts Syndrome (RBS) are both caused by mutations in proteins responsible for sister chromatid cohesion. Cohesion is mediated by a multi-subunit complex called cohesin, which is loaded onto chromosomes by NIPBL. Once on chromosomes, cohesin binding is stabilized in S phase upon acetylation by ESCO2. CdLS is caused by heterozygous mutations in NIPBL or cohesin subunits SMC1A and SMC3, and RBS is caused by homozygous mutations in ESCO2. The genetic causes of both CdLS and RBS reside within the chromosome cohesion apparatus, and therefore they are collectively known as “cohesinopathies”. However, the two syndromes have distinct phenotypes, with differences not explained by their shared ontology. In this study, we have used the zebrafish model to distinguish between developmental pathways downstream of cohesin itself, or its acetylase ESCO2. Esco2-depleted zebrafish embryos exhibit features that resemble RBS, including mitotic defects, craniofacial abnormalities and limb truncations. A microarray analysis of Esco2-depleted embryos revealed that different subsets of genes are regulated downstream of Esco2 when compared with cohesin subunit Rad21. Genes downstream of Rad21 showed significant enrichment for transcriptional regulators, while Esco2-regulated genes were more likely to be involved in the cell cycle or apoptosis. RNA in situ hybridization showed that runx1, which is spatiotemporally regulated by cohesin, is expressed normally in Esco2-depleted embryos. Furthermore, myca, which is downregulated in rad21 mutants, is upregulated in Esco2-depleted embryos. High levels of cell death contributed to the morphology of Esco2-depleted embryos without affecting specific developmental pathways. We propose that cell proliferation defects and apoptosis could be the primary cause of the features of RBS.
Our results show that mutations in different elements of the cohesion apparatus have distinct developmental outcomes, and provide insight into why CdLS and RBS are distinct diseases.
Identifying the functional importance of the millions of single nucleotide polymorphisms (SNPs) in the human genome is a difficult challenge. Therefore, a reverse strategy, which identifies functionally important SNPs by virtue of the bimodal abundance of SNP-related mRNAs across the human population, will be useful. Those mRNA transcripts that are expressed at two distinct abundances in proportion to SNP allele frequency may warrant further study. Matrix metalloproteinase 1 (MMP1) is important in both normal development and in numerous pathologies. Although much research has been conducted to investigate the expression of MMP1 in many different cell types and conditions, the regulation of its expression is still not fully understood.
In this study, we used a novel but straightforward method based on agglomerative hierarchical clustering to identify bimodally expressed transcripts in human umbilical vein endothelial cell (HUVEC) microarray data from 15 individuals. We found that MMP1 mRNA abundance was bimodally distributed in untreated HUVECs and showed a bimodal response to inflammatory mediator treatment. RT-PCR and MMP1 activity assays confirmed the bimodal regulation, and DNA sequencing of 69 individuals identified an MMP1 gene promoter polymorphism that segregated precisely with the MMP1 bimodal expression. Chromatin immunoprecipitation (ChIP) experiments indicated that the transcription factors (TFs) ETS1, ETS2 and GATA3 bind to the MMP1 promoter in the region of this polymorphism and may contribute to the bimodal expression.
We describe a simple method to identify putative bimodally expressed RNAs from transcriptome data that is effective yet easy for non-statisticians to understand and use. This method identified bimodal endothelial cell expression of MMP1, which appears to be biologically significant with implications for inflammatory disease.
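A clustering-based screen for bimodal transcripts of the kind described above might look like the following sketch. The abstract states only that the method is based on agglomerative hierarchical clustering, so the details here are assumptions: average linkage, stopping at two clusters, and a hypothetical score defined as the gap between the two clusters divided by the larger within-cluster range. The expression values are made up for illustration.

```python
def average_linkage(a, b):
    """Mean absolute difference between all pairs drawn from two clusters."""
    return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))


def two_cluster_split(values):
    """Agglomerative clustering of one transcript's values, stopped at two clusters."""
    clusters = [[v] for v in values]
    while len(clusters) > 2:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs,
                   key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    # Return the lower-abundance cluster first.
    return sorted(clusters, key=lambda c: sum(c) / len(c))


def bimodality_score(values, min_size=2):
    """Gap between the two clusters relative to the larger within-cluster range."""
    low, high = two_cluster_split(values)
    if min(len(low), len(high)) < min_size:
        return 0.0  # a lone outlier is not treated as a second mode
    gap = min(high) - max(low)
    spread = max(max(low) - min(low), max(high) - min(high))
    return gap / max(spread, 1e-12)


# Made-up log-abundance values across ten individuals.
bimodal = [2.1, 2.3, 2.0, 2.2, 7.9, 8.1, 8.0, 2.4, 7.8, 8.2]
unimodal = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5]

print(bimodality_score(bimodal), bimodality_score(unimodal))
```

Transcripts could then be ranked by this score, with the highest-scoring ones inspected by eye; any cut-off would need to be chosen for the dataset at hand.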
Identifying transcription factor (TF) binding sites (TFBSs) is an important step towards understanding transcriptional regulation. A common approach is to use gaplessly aligned, experimentally supported TFBSs for a particular TF, and algorithmically search for more occurrences of the same TFBSs. The largest publicly available databases of TF binding specificities contain models which are represented as position weight matrices (PWM). There are other methods using more sophisticated representations, but these have more limited databases, or aren't publicly available. Therefore, this paper focuses on methods that search using one PWM per TF. An algorithm, MATCH™, for identifying TFBSs corresponding to a particular PWM is available, but is not based on a rigorous statistical model of TF binding, making it difficult to interpret or adjust the parameters and output of the algorithm. Furthermore, there is no public description of the algorithm sufficient to exactly reproduce it. Another algorithm, MAST, computes a p-value for the presence of a TFBS using true probabilities of finding each base at each offset from that position. We developed a statistical model, BaSeTraM, for the binding of TFs to TFBSs, taking into account random variation in the base present at each position within a TFBS. Treating the counts in the matrices and the sequences of sites as random variables, we combine this TFBS composition model with a background model to obtain a Bayesian classifier. We implemented our classifier in a package (SBaSeTraM). We tested SBaSeTraM against a MATCH™ implementation by searching all probes used in an experimental Saccharomyces cerevisiae TF binding dataset, and comparing our predictions to the data. We found no statistically significant differences in sensitivity between the algorithms (at fixed selectivity), indicating that SBaSeTraM's performance is at least comparable to the leading currently available algorithm.
Our software is freely available at: http://wiki.github.com/A1kmm/sbasetram/building-the-tools.
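The general idea of combining a site composition model with a background model in a Bayesian classifier can be sketched as follows. This is a simplified illustration and not the BaSeTraM model or the SBaSeTraM code: it assumes independent positions, a symmetric pseudocount to reflect uncertainty in the count matrix, a uniform background, and a fixed prior probability that a window is a site. The function names, the prior, the threshold and the toy count matrix are all hypothetical.

```python
import math

BASES = "ACGT"


def site_probabilities(counts, pseudocount=1.0):
    """Per-position base probabilities from a count matrix, smoothed by a pseudocount."""
    probs = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        probs.append({b: (column[b] + pseudocount) / total for b in BASES})
    return probs


def posterior_site(window, probs, background, prior=0.01):
    """Posterior probability that a window is a binding site rather than background."""
    log_site = sum(math.log(p[base]) for p, base in zip(probs, window))
    log_background = sum(math.log(background[base]) for base in window)
    log_odds = log_site - log_background + math.log(prior / (1.0 - prior))
    return 1.0 / (1.0 + math.exp(-log_odds))


def scan(sequence, counts, background, prior=0.01, threshold=0.5):
    """Return (start, posterior) for every window classified as a site."""
    probs = site_probabilities(counts)
    width = len(probs)
    hits = []
    for start in range(len(sequence) - width + 1):
        post = posterior_site(sequence[start:start + width], probs, background, prior)
        if post >= threshold:
            hits.append((start, post))
    return hits


# Toy count matrix built from 20 imaginary sites with consensus TGACTC.
consensus = "TGACTC"
counts = [{b: (17 if b == c else 1) for b in BASES} for c in consensus]
background = {b: 0.25 for b in BASES}

hits = scan("ACGTTGACTCAGGA", counts, background)
print(hits)
```

Because the output is a posterior probability, the prior and the threshold have a direct interpretation, which is the kind of adjustability the abstract contrasts with score cut-offs that lack a statistical model.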
We are investigating the molecular basis of melanoma by defining genomic characteristics that correlate with tumour phenotype in a novel panel of metastatic melanoma cell lines. The aim of this study is to identify new prognostic markers and therapeutic targets that might aid clinical cancer diagnosis and management.
Global transcript profiling identified a signature featuring decreased expression of developmental and lineage specification genes including MITF, EDNRB, DCT, and TYR, and increased expression of genes involved in interaction with the extracellular environment, such as PLAUR, VCAN, and HIF1a. Migration assays showed that the gene signature correlated with the invasive potential of the cell lines, and external validation by using publicly available data indicated that tumours with the invasive gene signature were less melanocytic and may be more aggressive. The invasion signature could be detected in both primary and metastatic tumours suggesting that gene expression conferring increased invasive potential in melanoma may occur independently of tumour stage.
Our data support the hypothesis that differential developmental gene expression may drive invasive potential in metastatic melanoma, and that melanoma heterogeneity may be explained by the differing capacity of melanoma cells to both withstand decreased expression of lineage specification genes and to respond to the tumour microenvironment. The invasion signature may provide new possibilities for predicting which primary tumours are more likely to metastasize, and which metastatic tumours might show a more aggressive clinical course.
Zinc finger nucleases (ZFN) are powerful tools for editing genes in cells. Here we use ZFNs to interrogate the biological function of ADPGK, which encodes an ADP-dependent glucokinase (ADPGK), in human tumour cell lines. The hypothesis we tested is that ADPGK utilises ADP to phosphorylate glucose under conditions where ATP becomes limiting, such as hypoxia. We characterised two ZFN knockout clones in each of two lines (H460 and HCT116). All four clones had frameshift mutations in all alleles at the target site in exon 1 of ADPGK, and were ADPGK-null by immunoblotting. ADPGK knockout had little or no effect on cell proliferation, but compromised the ability of H460 cells to survive siRNA silencing of hexokinase-2 under oxic conditions, with clonogenic survival falling from 21±3% for the parental line to 6.4±0.8% (p = 0.002) and 4.3±0.8% (p = 0.001) for the two knockouts. A similar increased sensitivity to clonogenic cell killing was observed under anoxia. No such changes were found when ADPGK was knocked out in HCT116 cells, for which the parental line was less sensitive than H460 to anoxia and to hexokinase-2 silencing. While knockout of ADPGK in HCT116 cells caused few changes in global gene expression, knockout of ADPGK in H460 cells caused notable up-regulation of mRNAs encoding cell adhesion proteins. Surprisingly, we could discern no consistent effect on glycolysis as measured by glucose consumption or lactate formation under anoxia, or extracellular acidification rate (Seahorse XF analyser) under oxic conditions in a variety of media. However, oxygen consumption rates were generally lower in the ADPGK knockouts, in some cases markedly so. Collectively, the results demonstrate that ADPGK can contribute to tumour cell survival under conditions of high glycolytic dependence, but the phenotype resulting from knockout of ADPGK is cell line dependent and appears to be unrelated to priming of glycolysis in these lines.
Our understanding of the molecular pathways that underlie melanoma remains incomplete. Although several published microarray studies of clinical melanomas have provided valuable information, we found only limited concordance between these studies. Therefore, we took an in vitro functional genomics approach to understand melanoma molecular pathways.
Affymetrix microarray data were generated from A375 melanoma cells treated in vitro with siRNAs against 45 transcription factors and signaling molecules. Analysis of these data using unsupervised hierarchical clustering and Bayesian gene networks identified proliferation-associated RNA clusters, which were co-ordinately expressed across the A375 cells and also across melanomas from patients. The abundance in metastatic melanomas of these cellular proliferation clusters and their putative upstream regulators was significantly associated with patient prognosis. An 8-gene classifier derived from gene network hub genes correctly classified the prognosis of 23/26 metastatic melanoma patients in a cross-validation study. Unlike the RNA clusters associated with cellular proliferation described above, co-ordinately expressed RNA clusters associated with immune response were clearly identified across melanoma tumours from patients but not across the siRNA-treated A375 cells, in which immune responses are not active. Three uncharacterised genes, which the gene networks predicted to be upstream of apoptosis- or cellular proliferation-associated RNAs, were found to significantly alter apoptosis and cell number when over-expressed in vitro.
This analysis identified co-expression of RNAs that encode functionally-related proteins, in particular, proliferation-associated RNA clusters that are linked to melanoma patient prognosis. Our analysis suggests that A375 cells in vitro may be valid models in which to study the gene expression modules that underlie some melanoma biological processes (e.g., proliferation) but not others (e.g., immune response). The gene expression modules identified here, and the RNAs predicted by Bayesian network inference to be upstream of these modules, are potential prognostic biomarkers and drug targets.
Gene regulatory networks inferred from RNA abundance data have generated significant interest, but despite this, gene network approaches are used infrequently and often require input from bioinformaticians. We have assembled a suite of tools for analysing regulatory networks, and we illustrate their use with microarray datasets generated in human endothelial cells. We infer a range of regulatory networks, and based on this analysis discuss the strengths and limitations of network inference from RNA abundance data. We welcome contact from researchers interested in using our inference and visualization tools to answer biological questions.