1.  CTCF binding site sequence differences are associated with unique regulatory and functional trends during embryonic stem cell differentiation 
Nucleic Acids Research  2013;42(2):774-789.
CTCF (CCCTC-binding factor) is a highly conserved multifunctional DNA-binding protein with thousands of binding sites genome-wide. Our previous work suggested that differences in CTCF’s binding site sequence may affect the regulation of CTCF recruitment and its function. To investigate this possibility, we characterized changes in genome-wide CTCF binding and gene expression during differentiation of mouse embryonic stem cells. After separating CTCF sites into three classes (LowOc, MedOc and HighOc) based on similarity to the consensus motif, we found that developmentally regulated CTCF binding occurs preferentially at LowOc sites, which have lower similarity to the consensus. By measuring the affinity of CTCF for selected sites, we show that sites lost during differentiation are enriched in motifs associated with weaker CTCF binding in vitro. Specifically, enrichment for T at the 18th position of the CTCF binding site is associated with regulated binding in the LowOc class and can predictably reduce CTCF affinity for binding sites. Finally, by comparing changes in CTCF binding with changes in gene expression during differentiation, we show that LowOc and HighOc sites are associated with distinct regulatory functions. Our results suggest that the regulatory control of CTCF is dependent in part on specific motifs within its binding site.
doi:10.1093/nar/gkt910
PMCID: PMC3902912  PMID: 24121688
2.  NPEBseq: nonparametric empirical bayesian-based procedure for differential expression analysis of RNA-seq data 
BMC Bioinformatics  2013;14:262.
Background
RNA-seq, a massive parallel-sequencing-based transcriptome profiling method, provides digital data in the form of aligned sequence read counts. The comparative analyses of the data require appropriate statistical methods to estimate the differential expression of transcript variants across different cell/tissue types and disease conditions.
Results
We developed a novel nonparametric empirical Bayesian-based approach (NPEBseq) to model the RNA-seq data. The prior distribution of the Bayesian model is empirically estimated from the data without any parametric assumption, and hence the method is “nonparametric” in nature. Based on this model, we proposed a method for detecting differentially expressed genes across different conditions. We also extended this method to detect differential usage of exons from RNA-seq data. The evaluation of NPEBseq on both simulated and publicly available RNA-seq datasets and comparison with three popular methods showed improved results for experiments with or without biological replicates.
Conclusions
NPEBseq can successfully detect differential expression between different conditions not only at gene level but also at exon level from RNA-seq datasets. In addition, NPEBseq performs significantly better than current methods and can be applied to genome-wide RNA-seq datasets. Sample datasets and R package are available at http://bioinformatics.wistar.upenn.edu/NPEBseq.
doi:10.1186/1471-2105-14-262
PMCID: PMC3765716  PMID: 23981227
3.  Global Identification of EVI1 Target Genes in Acute Myeloid Leukemia 
PLoS ONE  2013;8(6):e67134.
The ecotropic virus integration site 1 (EVI1) transcription factor is associated with human myeloid malignancy of poor prognosis and is overexpressed in 8–10% of adult AML and strikingly up to 27% of pediatric MLL-rearranged leukemias. For the first time, we report comprehensive genomewide EVI1 binding and whole transcriptome gene deregulation in leukemic cells using a combination of ChIP-Seq and RNA-Seq expression profiling. We found disruption of terminal myeloid differentiation and cell cycle regulation to be prominent in EVI-induced leukemogenesis. Specifically, we identified EVI1 directly binds to and downregulates the master myeloid differentiation gene Cebpe and several of its downstream gene targets critical for terminal myeloid differentiation. We also found EVI1 binds to and downregulates Serpinb2 as well as numerous genes involved in the Jak-Stat signaling pathway. Finally, we identified decreased expression of several ATP-dependent P2X purinoreceptors genes involved in apoptosis mechanisms. These findings provide a foundation for future study of potential therapeutic gene targets for EVI1-induced leukemia.
doi:10.1371/journal.pone.0067134
PMCID: PMC3694976  PMID: 23826213
4.  Association of a MicroRNA/TP53 Feedback Circuitry With Pathogenesis and Outcome of B-Cell Chronic Lymphocytic Leukemia 
Context
Chromosomal abnormalities (namely 13q, 17p, and 11q deletions) have prognostic implications and are recurrent in chronic lymphocytic leukemia (CLL), suggesting that they are involved in a common pathogenetic pathway; however, the molecular mechanism through which chromosomal abnormalities affect the pathogenesis and outcome of CLL is unknown.
Objective
To determine whether the microRNA miR-15a/miR-16-1 cluster (located at 13q), tumor protein p53 (TP53, located at 17p), and miR-34b/miR-34c cluster (located at 11q) are linked in a molecular pathway that explains the pathogenetic and prognostic implications (indolent vs aggressive form) of recurrent 13q, 17p, and 11q deletions in CLL.
Design, Setting, and Patients
CLL Research Consortium institutions provided blood samples from untreated patients (n=206) diagnosed with B-cell CLL between January 2000 and April 2008. All samples were evaluated for the occurrence of cytogenetic abnormalities as well as the expression levels of the miR-15a/miR-16-1 cluster, miR-34b/miR-34c cluster, TP53, and zeta-chain (TCR)–associated protein kinase 70kDa (ZAP70), a surrogate prognostic marker of CLL. The functional relationship between these genes was studied using in vitro gain- and loss-of-function experiments in cell lines and primary samples and was validated in a separate cohort of primary CLL samples.
Main Outcome Measures
Cytogenetic abnormalities; expression levels of the miR-15a/miR-16-1 cluster, miR-34 family, TP53 gene, downstream effectors cyclin-dependent kinase inhibitor 1A (p21, Cip1) (CDKN1A) and B-cell CLL/lymphoma 2 binding component 3 (BBC3), and ZAP70 gene; genetic interactions detected by chromatin immunoprecipitation.
Results
In CLLs with 13q deletions, the miR-15a/miR-16-1 cluster directly targeted TP53 (mean luciferase activity for miR-15a vs scrambled control, 0.68 relative light units (RLU) [95% confidence interval {CI}, 0.63–0.73]; P=.02; mean for miR-16 vs scrambled control, 0.62 RLU [95% CI, 0.59–0.65]; P=.02) and its downstream effectors. In leukemic cell lines and primary CLL cells, TP53 stimulated the transcription of miR-15/miR-16-1 as well as miR-34b/miR-34c clusters, and the miR-34b/miR-34c cluster directly targeted the ZAP70 kinase (mean luciferase activity for miR-34a vs scrambled control, 0.33 RLU [95% CI, 0.30–0.36]; P=.02; mean for miR-34b vs scrambled control, 0.31 RLU [95% CI, 0.30–0.32]; P=.01; and mean for miR-34c vs scrambled control, 0.35 RLU [95% CI, 0.33–0.37]; P=.02).
Conclusions
A microRNA/TP53 feedback circuitry is associated with CLL pathogenesis and outcome. This mechanism provides a novel pathogenetic model for the association of 13q deletions with the indolent form of CLL that involves microRNAs, TP53, and ZAP70.
doi:10.1001/jama.2010.1919
PMCID: PMC3690301  PMID: 21205967
5.  Isoform level expression profiles provide better cancer signatures than gene level expression profiles 
Genome Medicine  2013;5(4):33.
Background
The majority of mammalian genes generate multiple transcript variants and protein isoforms through alternative transcription and/or alternative splicing, and the dynamic changes at the transcript/isoform level between non-oncogenic and cancer cells remain largely unexplored. We hypothesized that isoform level expression profiles would be better than gene level expression profiles at discriminating between non-oncogenic and cancer cells.
Methods
We analyzed 160 Affymetrix exon-array datasets, comprising cell lines of non-oncogenic or oncogenic tissue origins. We obtained the transcript-level and gene level expression estimates, and used unsupervised and supervised clustering algorithms to study the profile similarity between the samples at both gene and isoform levels.
Results
Hierarchical clustering, based on isoform level expressions, effectively grouped the non-oncogenic and oncogenic cell lines with a virtually perfect homogeneity-grouping rate (97.5%), regardless of the tissue origin of the cell lines. However, this rate was much lower, 75% at best, when based on gene level expressions. Statistical analyses of the difference between cancer and non-oncogenic samples identified numerous genes with differentially expressed isoforms, which otherwise were not significant at the gene level. We also found that canonical pathways of protein ubiquitination, purine metabolism, and breast-cancer regulation by stathmin1 were significantly enriched among genes that show differential expression at isoform level but not at gene level.
Conclusions
In summary, cancer cell lines, regardless of their tissue of origin, can be effectively discriminated from non-cancer cell lines at isoform level, but not at gene level. This study suggests the existence of an isoform signature, rather than a gene signature, which could be used to distinguish cancer cells from normal cells.
doi:10.1186/gm437
PMCID: PMC3706752  PMID: 23594586
6.  Identification of Host-Chromosome Binding Sites and Candidate Gene Targets for Kaposi's Sarcoma-Associated Herpesvirus LANA 
Journal of Virology  2012;86(10):5752-5762.
LANA is essential for tethering the Kaposi's sarcoma-associated herpesvirus (KSHV) genome to metaphase chromosomes and for modulating host-cell gene expression, but the binding sites in the host-chromosome remain unknown. Here, we use LANA-specific chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq) to identify LANA binding sites in the viral and host-cell genomes of a latently infected pleural effusion lymphoma cell line BCBL1. LANA bound with high occupancy to the KSHV genome terminal repeats (TR) and to a few minor binding sites in the KSHV genome, including the LANA promoter region. We identified 256 putative LANA binding site peaks with P < 0.01 and overlap in two independent ChIP-Seq experiments. We validated several of the high-occupancy binding sites by conventional ChIP assays and quantitative PCR. Candidate cellular LANA binding motifs were identified and assayed for binding to purified recombinant LANA protein in vitro but bound with low affinity compared to the viral TR binding site. More than half of the LANA binding sites (170/256) could be mapped to within 2.5 kb of a cellular gene transcript. Pathways and Gene Ontology (GO) analysis revealed that LANA binds to genes within the p53 and tumor necrosis factor (TNF) regulatory network. Further analysis revealed partial overlap of LANA and STAT1 binding sites in several gamma interferon (IFN-γ)-regulated genes. We show that ectopic expression of LANA can downmodulate IFN-γ-mediated activation of a subset of genes, including the TAP1 peptide transporter and proteasome subunit beta type 9 (PSMB9), both of which are required for class I antigen presentation. Our data provide a potential mechanism through which LANA may regulate several host cell pathways by direct binding to gene regulatory elements.
doi:10.1128/JVI.07216-11
PMCID: PMC3347294  PMID: 22419807
7.  Sequencing and analysis of a South Asian-Indian personal genome 
BMC Genomics  2012;13:440.
Background
With over 1.3 billion people, India is estimated to contain three times more genetic diversity than does Europe. Next-generation sequencing technologies have facilitated the understanding of diversity by enabling whole genome sequencing at greater speed and lower cost. While genomes from people of European and Asian descent have been sequenced, only recently has a single male genome from the Indian subcontinent been published at sufficient depth and coverage. In this study we have sequenced and analyzed the genome of a South Asian Indian female (SAIF) from the Indian state of Kerala.
Results
We identified over 3.4 million SNPs in this genome including over 89,873 private variations. Comparison of the SAIF genome with several published personal genomes revealed that this individual shared ~50% of the SNPs with each of these genomes. Analysis of the SAIF mitochondrial genome showed that it was closely related to the U1 haplogroup which has been previously observed in Kerala. We assessed the SAIF genome for SNPs with health and disease consequences and found that the individual was at a higher risk for multiple sclerosis and a few other diseases. In analyzing SNPs that modulate drug response, we found a variation that predicts a favorable response to metformin, a drug used to treat diabetes. SNPs predictive of adverse reaction to warfarin indicated that the SAIF individual is not at risk for bleeding if treated with typical doses of warfarin. In addition, we report the presence of several additional SNPs of medical relevance.
Conclusions
This is the first study to report the complete whole genome sequence of a female from the state of Kerala in India. The availability of this complete genome and variants will further aid studies aimed at understanding genetic diversity, identifying clinically relevant changes and assessing disease burden in the Indian population.
doi:10.1186/1471-2164-13-440
PMCID: PMC3534380  PMID: 22938532
Indian genome; Personal genomics; Whole genome sequencing
8.  Tree-Based Position Weight Matrix Approach to Model Transcription Factor Binding Site Profiles 
PLoS ONE  2011;6(9):e24210.
Most of the position weight matrix (PWM) based bioinformatics methods developed to predict transcription factor binding sites (TFBS) assume each nucleotide in the sequence motif contributes independently to the interaction between protein and DNA sequence, usually producing high false positive predictions. The increasing availability of TF enrichment profiles from recent ChIP-Seq methodology facilitates the investigation of dependent structure and accurate prediction of TFBSs. We develop a novel Tree-based PWM (TPWM) approach to accurately model the interaction between TF and its binding site. The whole tree-structured PWM could be considered as a mixture of different conditional-PWMs. We propose a discriminative approach, called TPD (TPWM based Discriminative Approach), to construct the TPWM from the ChIP-Seq data with a pre-existing PWM. To achieve the maximum discriminative power between the positive and negative datasets, the cutoff value is determined based on the Matthew Correlation Coefficient (MCC). The resulting TPWMs are evaluated with respect to accuracy on extensive synthetic datasets. We then apply our TPWM discriminative approach on several real ChIP-Seq datasets to refine the current TFBS models stored in the TRANSFAC database. Experiments on both the simulated and real ChIP-Seq data show that the proposed method starting from existing PWM has consistently better performance than existing tools in detecting the TFBSs. The improved accuracy is the result of modelling the complete dependent structure of the motifs and better prediction of true positive rate. The findings could lead to better understanding of the mechanisms of TF-DNA interactions.
doi:10.1371/journal.pone.0024210
PMCID: PMC3166302  PMID: 21912677
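The abstract above builds on two standard ingredients: position weight matrix (PWM) scoring, and choosing a score cutoff by the Matthews Correlation Coefficient (MCC). The following is a minimal Python sketch of those two ingredients only. It uses a plain PWM that treats positions as independent, not the tree-based TPWM the authors propose, and every function name, the pseudocount, and the uniform background frequency are illustrative assumptions rather than details taken from the paper.

```python
import math

BASES = "ACGT"


def build_log_odds_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from aligned, equal-length sites.

    The pseudocount and the uniform background are assumptions for this sketch.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score_site(pwm, site):
    """Sum per-position log-odds; this is the position-independence assumption."""
    return sum(column[base] for column, base in zip(pwm, site))


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_mcc_cutoff(pos_scores, neg_scores):
    """Try every observed score as a cutoff (call a site bound if score >= cutoff)
    and return the (cutoff, mcc) pair with the highest MCC."""
    best_cutoff, best_value = None, float("-inf")
    for cutoff in sorted(set(pos_scores) | set(neg_scores)):
        tp = sum(s >= cutoff for s in pos_scores)
        fn = len(pos_scores) - tp
        fp = sum(s >= cutoff for s in neg_scores)
        tn = len(neg_scores) - fp
        value = mcc(tp, fp, tn, fn)
        if value > best_value:
            best_cutoff, best_value = cutoff, value
    return best_cutoff, best_value


if __name__ == "__main__":
    # Toy aligned sites, invented for illustration only.
    pwm = build_log_odds_pwm(["ACGT", "ACGT", "ACGA", "TCGT"])
    positives = [score_site(pwm, s) for s in ["ACGT", "ACGA"]]
    negatives = [score_site(pwm, s) for s in ["TTTT", "GGGG"]]
    print(best_mcc_cutoff(positives, negatives))
```

A tree-structured model of the kind described in the abstract would replace `score_site` with a scorer that conditions some positions on others; the cutoff selection step could stay the same.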
9.  IsoformEx: isoform level gene expression estimation using weighted non-negative least squares from mRNA-Seq data 
BMC Bioinformatics  2011;12:305.
Background
mRNA-Seq technology has revolutionized the field of transcriptomics for identification and quantification of gene transcripts not only at gene level but also at isoform level. Estimating the expression levels of transcript isoforms from mRNA-Seq data is a challenging problem due to the presence of constitutive exons.
Results
We propose a novel algorithm (IsoformEx) that employs weighted non-negative least squares estimation method to estimate the expression levels of transcript isoforms. Validations based on in silico simulation of mRNA-Seq and qRT-PCR experiments with real mRNA-Seq data showed that IsoformEx could accurately estimate transcript expression levels. In comparisons with published methods, the transcript expression levels estimated by IsoformEx showed higher correlation with known transcript expression levels from simulated mRNA-Seq data, and higher agreement with qRT-PCR measurements of specific transcripts for real mRNA-Seq data.
Conclusions
IsoformEx is a fast and accurate algorithm to estimate transcript expression levels and gene expression levels, which takes into account short exons and alternative exons with a weighting scheme. The software is available at http://bioinformatics.wistar.upenn.edu/isoformex.
doi:10.1186/1471-2105-12-305
PMCID: PMC3180389  PMID: 21794104
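To make the phrase "weighted non-negative least squares" in the abstract above concrete, here is a small Python sketch of the generic optimization problem: find non-negative isoform abundances x that minimize the weighted squared error between observed per-exon signal y and the signal predicted by an exon-by-isoform indicator matrix A. The coordinate-descent solver, the toy gene structure, and the uniform weights are assumptions chosen for illustration; they are not the IsoformEx implementation or its weighting scheme.

```python
def weighted_nnls(A, y, weights, sweeps=500):
    """Minimize sum_i weights[i] * (y[i] - sum_j A[i][j] * x[j])**2 subject to x >= 0.

    Uses cyclic coordinate descent: each coordinate is set to its exact
    one-dimensional minimizer and then clipped at zero.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    residual = [float(v) for v in y]  # residual = y - A x, with x starting at 0
    for _ in range(sweeps):
        for j in range(n_cols):
            col_norm = sum(weights[i] * A[i][j] ** 2 for i in range(n_rows))
            if col_norm == 0.0:
                continue  # isoform j covers no exon in this toy matrix
            grad = sum(weights[i] * A[i][j] * residual[i] for i in range(n_rows))
            new_xj = max(0.0, x[j] + grad / col_norm)
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(n_rows):
                    residual[i] -= A[i][j] * delta
                x[j] = new_xj
    return x


if __name__ == "__main__":
    # Hypothetical gene with three exons and two isoforms:
    # isoform 0 contains exons 1, 2, 3; isoform 1 skips exon 2.
    A = [
        [1, 1],  # exon 1 (constitutive)
        [1, 0],  # exon 2 (alternative)
        [1, 1],  # exon 3 (constitutive)
    ]
    y = [5.0, 2.0, 5.0]          # invented per-exon coverage
    weights = [1.0, 1.0, 1.0]    # uniform weights; a real scheme would differ
    print(weighted_nnls(A, y, weights))
```

The toy matrix shows why constitutive exons make the problem hard: exons 1 and 3 carry signal from both isoforms, so only the alternative exon separates them. Raising the weight on informative rows is one way a weighting scheme could sharpen the estimate.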
10.  SINGLE NUCLEOTIDE POLYMORPHISMS INSIDE microRNA TARGET SITES INFLUENCE TUMOR SUSCEPTIBILITY 
Cancer research  2010;70(7):2789-2798.
Single nucleotide polymorphisms (SNPs) associated with polygenetic disorders, such as breast cancer (BC), can create, destroy or modify microRNA (miRNA) binding sites; however, the extent to which SNPs interfere with miRNA gene regulation and affect cancer susceptibility remains largely unknown. We hypothesize that disruption of miRNA target binding by SNPs is a widespread mechanism relevant to cancer susceptibility. In order to test this, we analyzed SNPs known to be associated with BC risk, in silico and in vitro, for their ability to modify miRNA binding sites and miRNA gene regulation and referred to these as target SNPs. We identified rs1982073-TGFB1 and rs1799782-XRCC1 as target SNPs, whose alleles could modulate gene expression by differential interaction with miR-187 and miR-138, respectively. Genome-wide bioinformatics analysis predicted approximately 64% of transcribed SNPs as target SNPs that can modify (increase/decrease) the binding energy of putative miRNA::mRNA duplexes by over 90%. To assess whether target SNPs are implicated in BC susceptibility, we conducted a case-control population study and observed that germline occurrence of rs799917-BRCA1 and rs334348-TGFR1 significantly varies among populations with different risks of developing BC. Luciferase activity of target SNPs allelic variants and protein levels in cancer cell lines with different genotypes showed differential regulation of target genes following over-expression of the two interacting miRNAs (miR-638 and miR-628-5p). Therefore, we propose that transcribed target SNPs alter miRNA gene regulation and consequently protein expression, contributing to the likelihood of cancer susceptibility, by a novel mechanism of subtle gene regulation.
doi:10.1158/0008-5472.CAN-09-3541
PMCID: PMC2853025  PMID: 20332227
Single nucleotide polymorphism; microRNAs; mRNA; BC; tumor susceptibility
11.  MPromDb update 2010: an integrated resource for annotation and visualization of mammalian gene promoters and ChIP-seq experimental data 
Nucleic Acids Research  2010;39(Database issue):D92-D97.
MPromDb (Mammalian Promoter Database) is a curated database that strives to annotate gene promoters identified from ChIP-seq results with the goal of providing an integrated resource for mammalian transcriptional regulation and epigenetics. We analyzed 507 million uniquely aligned RNAP-II ChIP-seq reads from 26 different data sets that include six human cell-types and 10 distinct mouse cell/tissues. The updated MPromDb version consists of computationally predicted (novel) and known active RNAP-II promoters (42 893 human and 48 366 mouse promoters) from various data sets freely available at NCBI GEO database. We found that 36% and 40% of protein-coding genes have alternative promoters in human and mouse genomes and ∼40% of promoters are tissue/cell specific. The identified RNAP-II promoters were annotated using various known and novel gene models. Additionally, for novel promoters we looked into other evidences—GenBank mRNAs, spliced ESTs, CAGE promoter tags and mRNA-seq reads. Users can search the database based on gene id/symbol, or by specific tissue/cell type and filter results based on any combination of tissue/cell specificity, Known/Novel, CpG/NonCpG, and protein-coding/non-coding gene promoters. We have also integrated GBrowse genome browser with MPromDb for visualization of ChIP-seq profiles and to display the annotations. The current release of MPromDb can be accessed at http://bioinformatics.wistar.upenn.edu/MPromDb/.
doi:10.1093/nar/gkq1171
PMCID: PMC3013732  PMID: 21097880
12.  The H3K27me3 Demethylase dUTX Is a Suppressor of Notch- and Rb-Dependent Tumors in Drosophila
Molecular and Cellular Biology  2010;30(10):2485-2497.
Trimethylated lysine 27 of histone H3 (H3K27me3) is an epigenetic mark for gene silencing and can be demethylated by the JmjC domain of UTX. Excessive H3K27me3 levels can cause tumorigenesis, but little is known about the mechanisms leading to those cancers. Mutants of the Drosophila H3K27me3 demethylase dUTX display some characteristics of Trithorax group mutants and have increased H3K27me3 levels in vivo. Surprisingly, dUTX mutations also affect H3K4me1 levels in a JmjC-independent manner. We show that a disruption of the JmjC domain of dUTX results in a growth advantage for mutant cells over adjacent wild-type tissue due to increased proliferation. The growth advantage of dUTX mutant tissue is caused, at least in part, by increased Notch activity, demonstrating that dUTX is a Notch antagonist. Furthermore, the inactivation of Retinoblastoma (Rbf in Drosophila) contributes to the growth advantage of dUTX mutant tissue. The excessive activation of Notch in dUTX mutant cells leads to tumor-like growth in an Rbf-dependent manner. In summary, these data suggest that dUTX is a suppressor of Notch- and Rbf-dependent tumors in Drosophila melanogaster and may provide a model for UTX-dependent tumorigenesis in humans.
doi:10.1128/MCB.01633-09
PMCID: PMC2863695  PMID: 20212086
13.  Genome-wide analysis of host-chromosome binding sites for Epstein-Barr Virus Nuclear Antigen 1 (EBNA1) 
Virology Journal  2010;7:262.
The Epstein-Barr Virus (EBV) Nuclear Antigen 1 (EBNA1) protein is required for the establishment of EBV latent infection in proliferating B-lymphocytes. EBNA1 is a multifunctional DNA-binding protein that stimulates DNA replication at the viral origin of plasmid replication (OriP), regulates transcription of viral and cellular genes, and tethers the viral episome to the cellular chromosome. EBNA1 also provides a survival function to B-lymphocytes, potentially through its ability to alter cellular gene expression. To better understand these various functions of EBNA1, we performed a genome-wide analysis of the viral and cellular DNA sites associated with EBNA1 protein in a latently infected Burkitt lymphoma B-cell line. Chromatin-immunoprecipitation (ChIP) combined with massively parallel deep-sequencing (ChIP-Seq) was used to identify cellular sites bound by EBNA1. Sites identified by ChIP-Seq were validated by conventional real-time PCR, and ChIP-Seq provided quantitative, high-resolution detection of the known EBNA1 binding sites on the EBV genome at OriP and Qp. We identified at least one cluster of unusually high-affinity EBNA1 binding sites on chromosome 11, between the divergent FAM55D and FAM55B genes. A consensus for all cellular EBNA1 binding sites is distinct from those derived from the known viral binding sites, suggesting that some of these sites are indirectly bound by EBNA1. EBNA1 also bound close to the transcriptional start sites of a large number of cellular genes, including HDAC3, CDC7, and MAP3K1, which we show are positively regulated by EBNA1. EBNA1 binding sites were enriched in some repetitive elements, especially LINE 1 retrotransposons, and had weak correlations with histone modifications and ORC binding. We conclude that EBNA1 can interact with a large number of cellular genes and chromosomal loci in latently infected cells, but that these sites are likely to represent a complex ensemble of direct and indirect EBNA1 binding sites.
doi:10.1186/1743-422X-7-262
PMCID: PMC2964674  PMID: 20929547
14.  Genome-wide mapping of RNA Pol-II promoter usage in mouse tissues by ChIP-seq 
Nucleic Acids Research  2010;39(1):190-201.
Alternative promoters that are differentially used in various cellular contexts and tissue types add to the transcriptional complexity in mammalian genome. Identification of alternative promoters and the annotation of their activity in different tissues is one of the major challenges in understanding the transcriptional regulation of the mammalian genes and their isoforms. To determine the use of alternative promoters in different tissues, we performed ChIP-seq experiments using antibody against RNA Pol-II, in five adult mouse tissues (brain, liver, lung, spleen and kidney). Our analysis identified 38 639 Pol-II promoters, including 12 270 novel promoters, for both protein coding and non-coding mouse genes. Of these, 6384 promoters are tissue specific which are CpG poor and we find that only 34% of the novel promoters are located in CpG-rich regions, suggesting that novel promoters are mostly tissue specific. By identifying the Pol-II bound promoter(s) of each annotated gene in a given tissue, we found that 37% of the protein coding genes use alternative promoters in the five mouse tissues. The promoter annotations and ChIP-seq data presented here will aid ongoing efforts of characterizing gene regulatory regions in mammalian genomes.
doi:10.1093/nar/gkq775
PMCID: PMC3017616  PMID: 20843783
15.  Promoter hypermethylation of FBXO32, a novel TGF-β/SMAD4 target gene and tumor suppressor, is associated with poor prognosis in human ovarian cancer 
Resistance to TGF-β is frequently observed in ovarian cancer, and disrupted TGF-β/SMAD4 signaling results in aberrant expression of downstream target genes in the disease. Our previous study showed that ADAM19, a SMAD4 target gene, is down-regulated through epigenetic mechanisms in ovarian cancer with aberrant TGF-β/SMAD4 signaling. In this study, we investigated the mechanism of down-regulation of FBXO32, another SMAD4 target gene, and the clinical significance of loss of FBXO32 expression in ovarian cancer. Expression of FBXO32 was observed in normal ovarian surface epithelium but not in ovarian cancer cell lines. FBXO32 methylation was seen in ovarian cancer cell lines displaying constitutive TGF-β/SMAD4 signaling, and epigenetic drug treatment restored FBXO32 expression in ovarian cancer cell lines regardless of FBXO32 methylation status, suggesting that epigenetic regulation of this gene in ovarian cancer may be a common event. In advanced stage ovarian tumors, significant (29.3%; P<0.05) methylation frequency of FBXO32 was observed and the association between FBXO32 methylation and shorter progression-free survival was significant, as determined by both Kaplan-Meier analysis (P<0.05) and multivariate Cox regression analysis (hazard ratio 1.003, P<0.05). Re-expression of FBXO32 markedly reduced proliferation of a platinum-resistant ovarian cancer line both in vitro and in vivo, due to increased apoptosis of the cells, and resensitized ovarian cancer cells to cisplatin. In conclusion, the novel tumor suppressor FBXO32 is epigenetically silenced in ovarian cancer cell lines with disrupted TGF-β/SMAD4 signaling and FBXO32 methylation status predicts survival in patients with ovarian cancer.
doi:10.1038/labinvest.2009.138
PMCID: PMC2829100  PMID: 20065949
Ovarian cancer; epigenetics; TGF-β; FBXO32
16.  Annotation of gene promoters by integrative data-mining of ChIP-seq Pol-II enrichment data 
BMC Bioinformatics  2010;11(Suppl 1):S65.
Background
Use of alternative gene promoters that drive widespread cell-type, tissue-type or developmental gene regulation in mammalian genomes is a common phenomenon. Chromatin immunoprecipitation methods coupled with DNA microarray (ChIP-chip) or massive parallel sequencing (ChIP-seq) are enabling genome-wide identification of active promoters in different cellular conditions using antibodies against Pol-II. However, these methods produce enrichment not only near the gene promoters but also inside the genes and other genomic regions due to the non-specificity of the antibodies used in ChIP. Further, the use of these methods is limited by their high cost and strong dependence on cellular type and context.
Methods
We trained and tested different state-of-the-art ensemble and meta classification methods for identification of Pol-II enriched promoter and Pol-II enriched non-promoter sequences, each of length 500 bp. The classification models were trained and tested on a benchmark dataset, using a set of 39 different feature variables that are based on chromatin modification signatures and various DNA sequence features. The best performing model was applied to seven published ChIP-seq Pol-II datasets to provide genome-wide annotation of mouse gene promoters.
Results
We present a novel algorithm based on supervised learning methods to discriminate promoter associated Pol-II enrichment from enrichment elsewhere in the genome in ChIP-chip/seq profiles. We accumulated a dataset of 11,773 promoter and 46,167 non-promoter sequences, each of length 500 bp, generated from RNA Pol-II ChIP-seq data of five tissues (Brain, Kidney, Liver, Lung and Spleen). We evaluated the classification models in building the best predictor and found that Bagging and Random Forest based approaches give the best accuracy. We implemented the algorithm on seven different published ChIP-seq datasets to provide a comprehensive set of promoter annotations for both protein-coding and non-coding genes in the mouse genome. The resulting annotations contain 13,413 (4,747) protein-coding (non-coding) genes with single promoters and 9,929 (1,858) protein-coding (non-coding) genes with two or more alternative promoters, and a significant number of unassigned novel promoters.
Conclusion
Our new algorithm can successfully predict the promoters from the genome wide profile of Pol-II bound regions. In addition, our algorithm performs significantly better than existing promoter prediction methods and can be applied for genome-wide predictions of Pol-II promoters.
doi:10.1186/1471-2105-11-S1-S65
PMCID: PMC3009539  PMID: 20122241
17.  An integrative ChIP-chip and gene expression profiling to model SMAD regulatory modules 
BMC Systems Biology  2009;3:73.
Background
The TGF-β/SMAD pathway is part of a broader signaling network in which crosstalk between pathways occurs. While the molecular mechanisms of TGF-β/SMAD signaling pathway have been studied in detail, the global networks downstream of SMAD remain largely unknown. The regulatory effect of SMAD complex likely depends on transcriptional modules, in which the SMAD binding elements and partner transcription factor binding sites (SMAD modules) are present in specific context.
Results
To address this question and develop a computational model for SMAD modules, we simultaneously performed chromatin immunoprecipitation followed by microarray analysis (ChIP-chip) and mRNA expression profiling to identify TGF-β/SMAD regulated and synchronously coexpressed gene sets in ovarian surface epithelium. Intersecting the ChIP-chip and gene expression data yielded 150 direct targets, of which 141 were grouped into 3 co-expressed gene sets (sustained up-regulated, transient up-regulated and down-regulated), based on their temporal changes in expression after TGF-β activation. We developed a data-mining method driven by the Random Forest algorithm to model SMAD transcriptional modules in the target sequences. The predicted SMAD modules contain SMAD binding element and up to 2 of 7 other transcription factor binding sites (E2F, P53, LEF1, ELK1, COUPTF, PAX4 and DR1).
Conclusion
Together, the computational results further the understanding of the interactions between SMAD and other transcription factors at specific target promoters, and provide the basis for more targeted experimental verification of the co-regulatory modules.
doi:10.1186/1752-0509-3-73
PMCID: PMC2724489  PMID: 19615063
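The intersection step described in the Results above can be sketched as follows. This is a minimal illustration only: the gene identifiers, expression values, and the fold-change rule used to assign temporal patterns are all hypothetical stand-ins, not the co-expression analysis the authors performed.

```python
# Hypothetical inputs for illustration; none of these values come from the paper.
chip_bound = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}  # genes with a ChIP-chip SMAD peak
expression = {                                         # expression at baseline, early, late
    "GENE_A": [1.0, 2.1, 2.3],
    "GENE_B": [1.0, 2.0, 1.1],
    "GENE_C": [1.0, 0.4, 0.3],
    "GENE_E": [1.0, 2.5, 2.4],
}

def classify(profile, fold=1.5):
    """Assign a temporal pattern from fold changes relative to baseline.
    The 1.5-fold cutoff is an assumed threshold, not the authors' criterion."""
    early = profile[1] / profile[0]
    late = profile[-1] / profile[0]
    if early >= fold and late >= fold:
        return "sustained_up"
    if early >= fold:
        return "transient_up"
    if late <= 1 / fold:
        return "down"
    return None

# Direct targets: genes that are both bound and have an expression profile.
direct_targets = sorted(chip_bound & expression.keys())

# Group the direct targets by temporal pattern.
groups = {}
for gene in direct_targets:
    pattern = classify(expression[gene])
    if pattern is not None:
        groups.setdefault(pattern, []).append(gene)
```

Here GENE_D is bound but has no expression profile, and GENE_E changes expression without a binding event, so neither counts as a direct target in this sketch.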
18.  Aberrant Transforming Growth Factor β1 Signaling and SMAD4 Nuclear Translocation Confer Epigenetic Repression of ADAM19 in Ovarian Cancer 
Neoplasia (New York, N.Y.)  2008;10(9):908-919.
Transforming growth factor-beta (TGF-β)/SMAD signaling is a key growth regulatory pathway often dysregulated in ovarian cancer and other malignancies. Although loss of TGF-β-mediated growth inhibition has been shown to contribute to aberrant cell behavior, the epigenetic consequence(s) of impaired TGF-β/SMAD signaling on target genes is not well established. In this study, we show that TGF-β1 causes growth inhibition of normal ovarian surface epithelial cells, induction of SMAD4 nuclear translocation, and up-regulation of ADAM19 (a disintegrin and metalloprotease domain 19), a newly identified TGF-β1 target gene. Conversely, induction and nuclear translocation of SMAD4 were negligible in ovarian cancer cells refractory to TGF-β1 stimulation, and ADAM19 expression was greatly reduced. Furthermore, in the TGF-β1 refractory cells, an inactive chromatin environment, marked by repressive histone modifications (trimethyl-H3K27 and dimethyl-H3K9) and histone deacetylase, was associated with the ADAM19 promoter region. However, the CpG island found within the promoter and first exon of ADAM19 remained generally unmethylated. Although disrupted growth factor signaling has been linked to epigenetic gene silencing in cancer, this is the first evidence demonstrating that impaired TGF-β1 signaling can result in the formation of a repressive chromatin state and epigenetic suppression of ADAM19. Given the emerging role of ADAMs family proteins in growth factor regulation in normal cells, we suggest that epigenetic dysregulation of ADAM19 may contribute to the neoplastic process in ovarian cancer.
PMCID: PMC2517635  PMID: 18714391
19.  Genome-wide analysis of alternative promoters of human genes using a custom promoter tiling array 
BMC Genomics  2008;9:349.
Background
Independent lines of evidence suggested that a large fraction of human genes possess multiple promoters driving gene expression from distinct transcription start sites. Understanding which promoter is employed in which cellular context is required to unravel gene regulatory networks within the cell.
Results
We have developed a custom microarray platform that tiles roughly 35,000 alternative putative promoters from nearly 7,000 genes in the human genome. To demonstrate the utility of this array platform, we have analyzed the patterns of promoter usage in 17β-estradiol (E2)-treated and untreated MCF7 cells and show widespread usage of alternative promoters. Most intriguingly, we show that the downstream promoter in E2-sensitive multiple promoter genes tends to be very close to the 3'-terminus of the gene, suggesting exotic mechanisms of expression regulation in these genes.
Conclusion
The usage of alternative promoters greatly multiplies the transcriptional complexity available within the human genome. The fact that many of these promoters are incapable of driving the synthesis of a meaningful protein-encoding transcript further complicates the story.
doi:10.1186/1471-2164-9-349
PMCID: PMC2527337  PMID: 18655706
20.  An Application of a Service-oriented System to Support ArrayAnnotation in Custom Chip Design for Epigenomic Analysis 
Cancer Informatics  2008;6:111-125.
We present the implementation of an application using caGrid, which is the service-oriented Grid software infrastructure of the NCI cancer Biomedical Informatics Grid (caBIG™), to support design and analysis of custom microarray experiments in the study of epigenetic alterations in cancer. The design and execution of these experiments requires synthesis of information from multiple data types and datasets. In our implementation, each data source is implemented as a caGrid Data Service, and analytical resources are wrapped as caGrid Analytical Services. This service-based implementation has several advantages. A backend resource can be modified or upgraded, without needing to change other components in the application. A remote resource can be added easily, since resources are not required to be collected in a centralized infrastructure.
PMCID: PMC2623307  PMID: 19259406
service oriented architectures; grid computing; caBIG and caGrid; data integration; epigenetic studies
21.  NF-κB Regulation of YY1 Inhibits Skeletal Myogenesis through Transcriptional Silencing of Myofibrillar Genes 
Molecular and Cellular Biology  2007;27(12):4374-4387.
NF-κB signaling is implicated as an important regulator of skeletal muscle homeostasis, but the mechanisms by which this transcription factor contributes to muscle maturation and turnover remain unclear. To gain insight into these mechanisms, gene expression profiling was examined in C2C12 myoblasts devoid of NF-κB activity. Interestingly, even in proliferating myoblasts, the absence of NF-κB caused the pronounced induction of several myofibrillar genes, suggesting that NF-κB functions as a negative regulator of late-stage muscle differentiation. Although several myofibrillar promoters contain predicted NF-κB binding sites, functional analysis using the troponin-I2 gene as a model revealed that NF-κB-mediated repression does not occur through direct DNA binding. In the search for an indirect mediator, the transcriptional repressor YinYang1 (YY1) was identified. While inducers of NF-κB stimulated YY1 expression in multiple cell types, genetic ablation of the RelA/p65 subunit of NF-κB in both cultured cells and adult skeletal muscle correlated with reduced YY1 transcripts and protein. NF-κB regulation of YY1 occurred at the transcriptional level, mediated by direct binding of the p50/p65 heterodimer complex to the YY1 promoter. Furthermore, YY1 was found associated with multiple myofibrillar promoters in C2C12 myoblasts containing NF-κB activity. Based on these results, we propose that NF-κB regulation of YY1 and transcriptional silencing of myofibrillar genes represent a new mechanism by which NF-κB functions in myoblasts to modulate skeletal muscle differentiation.
doi:10.1128/MCB.02020-06
PMCID: PMC1900043  PMID: 17438126
22.  CpG Island Methylation in a Mouse Model of Lymphoma Is Driven by the Genetic Configuration of Tumor Cells 
PLoS Genetics  2007;3(9):e167.
Hypermethylation of CpG islands is a common epigenetic alteration associated with cancer. Global patterns of hypermethylation are tumor-type specific and nonrandom. The biological significance and the underlying mechanisms of tumor-specific aberrant promoter methylation remain unclear, but some evidence suggests that this specificity involves differential sequence susceptibilities, the targeting of DNA methylation activity to specific promoter sequences, or the selection of rare DNA methylation events during disease progression. Using restriction landmark genomic scanning on samples derived from tissue culture and in vivo models of T cell lymphomas, we found that MYC overexpression gave rise to a specific signature of CpG island hypermethylation. This signature reflected gene transcription profiles and was detected only in advanced stages of disease. The further inactivation of the Pten, p53, and E2f2 tumor suppressors in MYC-induced lymphomas resulted in distinct and diagnostic CpG island methylation signatures. Our data suggest that tumor-specific DNA methylation in lymphomas arises as a result of the selection of rare DNA methylation events during the course of tumor development. This selection appears to be driven by the genetic configuration of tumor cells, providing experimental evidence for a causal role of DNA hypermethylation in tumor progression and an explanation for the tremendous epigenetic heterogeneity observed in the evolution of human cancers. The ability to predict genome-wide epigenetic silencing based on relatively few genetic alterations will allow for a more complete classification of tumors and understanding of tumor cell biology.
Author Summary
Genetic and epigenetic alterations of the genome are common features of cancers. The relationship between these two types of alterations, however, remains unclear. One type of epigenetic modification—DNA methylation in promoter sequences of genes—is of particular interest, since tumor cells have different patterns of promoter methylation than normal cells. Previous studies on human tumor samples have suggested a link between genetic alterations and the induction of aberrant DNA methylation; however, this link has been difficult to rigorously assess because of the incredible genetic heterogeneity found in human cancer. In this study, a mouse model of T cell lymphoma was used to explore the relationship between genetic and epigenetic modifications experienced by tumor cells. By introducing defined genetic changes into preneoplastic T cells of mice, such as the overexpression of the MYC oncogene and the ablation of tumor suppressor genes, we could carefully evaluate how these genetic changes impacted promoter methylation profiles during development of lymphomas in vivo. We found that the introduction of different genetic insults resulted in unique and diagnostic profiles of promoter methylation. Understanding how these methylation signatures contribute to tumor progression could eventually have diagnostic, prognostic, and therapeutic value for human cancers.
doi:10.1371/journal.pgen.0030167
PMCID: PMC1994712  PMID: 17907813
23.  A MicroRNA Signature of Hypoxia 
Molecular and Cellular Biology  2007;27(5):1859-1867.
Recent research has identified critical roles for microRNAs in a large number of cellular processes, including tumorigenic transformation. While significant progress has been made towards understanding the mechanisms of gene regulation by microRNAs, much less is known about factors affecting the expression of these noncoding transcripts. Here, we demonstrate for the first time a functional link between hypoxia, a well-documented tumor microenvironment factor, and microRNA expression. Microarray-based expression profiles revealed that a specific spectrum of microRNAs (including miR-23, -24, -26, -27, -103, -107, -181, -210, and -213) is induced in response to low oxygen, at least some via a hypoxia-inducible-factor-dependent mechanism. Select members of this group (miR-26, -107, and -210) decrease proapoptotic signaling in a hypoxic environment, suggesting an impact of these transcripts on tumor formation. Interestingly, the vast majority of hypoxia-induced microRNAs are also overexpressed in a variety of human tumors.
doi:10.1128/MCB.01395-06
PMCID: PMC1820461  PMID: 17194750
24.  Aberrant DNA Methylation of OLIG1, a Novel Prognostic Factor in Non-Small Cell Lung Cancer 
PLoS Medicine  2007;4(3):e108.
Background
Lung cancer is the leading cause of cancer-related death worldwide. Currently, tumor, node, metastasis (TNM) staging provides the most accurate prognostic parameter for patients with non-small cell lung cancer (NSCLC). However, the overall survival of patients with resectable tumors varies significantly, indicating the need for additional prognostic factors to better predict the outcome of the disease, particularly within a given TNM subset.
Methods and Findings
In this study, we investigated whether adenocarcinomas and squamous cell carcinomas could be differentiated based on their global aberrant DNA methylation patterns. We performed restriction landmark genomic scanning on 40 patient samples and identified 47 DNA methylation targets that together could distinguish the two lung cancer subgroups. The protein expression of one of those targets, oligodendrocyte transcription factor 1 (OLIG1), significantly correlated with survival in NSCLC patients, as shown by univariate and multivariate analyses. Furthermore, the hazard ratio for patients negative for OLIG1 protein was significantly higher than the one for those patients expressing the protein, even at low levels.
Conclusions
Multivariate analyses of our data confirmed that OLIG1 protein expression significantly correlates with overall survival in NSCLC patients, with a relative risk of 0.84 (95% confidence interval 0.77–0.91, p < 0.001) along with T and N stages, as indicated by a Cox proportional hazard model. Taken together, our results suggest that OLIG1 protein expression could be utilized as a novel prognostic factor, which could aid in deciding which NSCLC patients might benefit from more aggressive therapy. This is potentially of great significance, as the addition of postoperative adjuvant chemotherapy in T2N0 NSCLC patients is still controversial.
Christopher Plass and colleagues find that OLIG1 expression correlates with survival in lung cancer patients and suggest that it could be used in deciding which patients are likely to benefit from more aggressive therapy.
Editors' Summary
Background.
Lung cancer is the commonest cause of cancer-related death worldwide. Most cases are of a type called non-small cell lung cancer (NSCLC). Like other cancers, treatment of NSCLC depends on the “TNM stage” at which the cancer is detected. Staging takes into account the size and local spread of the tumor (its T classification), whether nearby lymph nodes contain tumor cells (its N classification), and whether tumor cells have spread (metastasized) throughout the body (its M classification). Stage I tumors are confined to the lung and are removed surgically. Stage II tumors have spread to nearby lymph nodes and are treated with a combination of surgery and chemotherapy. Stage III tumors have spread throughout the chest, and stage IV tumors have metastasized around the body; patients with both of these stages are treated with chemotherapy alone. About 70% of patients with stage I or II lung cancer, but only 2% of patients with stage IV lung cancer, survive for five years after diagnosis.
Why Was This Study Done?
TNM staging is the best way to predict the likely outcome (prognosis) for patients with NSCLC, but survival times for patients with stage I and II tumors vary widely. Another prognostic marker—maybe a “molecular signature”—that could distinguish patients who are likely to respond to treatment from those whose cancer will inevitably progress would be very useful. Unlike normal cells, cancer cells divide uncontrollably and can move around the body. These behavioral changes are caused by alterations in the pattern of proteins expressed by the cells. But what causes these alterations? The answer in some cases is “epigenetic changes” or chemical modifications of genes. In cancer cells, methyl groups are aberrantly added to GC-rich gene regions. These so-called “CpG islands” lie near gene promoters (sequences that control the transcription of DNA into mRNA, the template for protein production), and their methylation stops the promoters working and silences the gene. In this study, the researchers have investigated whether aberrant methylation patterns vary between NSCLC subtypes and whether specific aberrant methylations are associated with survival and can, therefore, be used prognostically.
What Did the Researchers Do and Find?
The researchers used “restriction landmark genomic scanning” (RLGS) to catalog global aberrant DNA methylation patterns in human lung tumor samples. In RLGS, DNA is cut into fragments with a restriction enzyme (a protein that cuts at specific DNA sequences), end-labeled, and separated using two-dimensional gel electrophoresis to give a pattern of spots. Because methylation stops some restriction enzymes cutting their target sequence, normal lung tissue and lung tumor samples yield different patterns of spots. The researchers used these patterns to identify 47 DNA methylation targets (many in CpG islands) that together distinguished between adenocarcinomas and squamous cell carcinomas, two major types of NSCLCs. Next, they measured mRNA production from the genes with the greatest difference in methylation between adenocarcinomas and squamous cell carcinomas. OLIG1 (the gene that encodes a protein involved in nerve cell development) had one of the highest differences in mRNA production between these tumor types. Furthermore, three-quarters of NSCLCs had reduced or no expression of OLIG1 protein and, when the researchers analyzed the association between OLIG1 protein expression and overall survival in patients with NSCLC, reduced OLIG1 protein expression was associated with reduced survival.
What Do These Findings Mean?
These findings indicate that different types of NSCLC can be distinguished by examining their aberrant methylation patterns. This suggests that the establishment of different DNA methylation patterns might be related to the cell type from which the tumors developed. Alternatively, the different aberrant methylation patterns might reflect the different routes that these cells take to becoming tumor cells. This research identifies a potential new prognostic marker for NSCLC by showing that OLIG1 protein expression correlates with overall survival in patients with NSCLC. This correlation needs to be tested in a clinical setting to see if adding OLIG1 expression to the current prognostic parameters can lead to better treatment choices for early-stage lung cancer patients and ultimately improve these patients' overall survival.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0040108.
Patient and professional information on lung cancer, including staging (in English and Spanish), is available from the US National Cancer Institute
The MedlinePlus encyclopedia has pages on non-small cell lung cancer (in English and Spanish)
Cancerbackup provides patient information on lung cancer
CancerQuest, provided by Emory University, has information about how cancer develops (in English, Spanish, Chinese and Russian)
Wikipedia pages on epigenetics (note that Wikipedia is a free online encyclopedia that anyone can edit)
The Epigenome Network of Excellence gives background information and the latest news about epigenetics (in several European languages)
doi:10.1371/journal.pmed.0040108
PMCID: PMC1831740  PMID: 17388669
25.  Genome-wide analysis of core promoter elements from conserved human and mouse orthologous pairs 
BMC Bioinformatics  2006;7:114.
Background
The canonical core promoter elements consist of the TATA box, initiator (Inr), downstream core promoter element (DPE), TFIIB recognition element (BRE) and the newly-discovered motif 10 element (MTE). The motifs for these core promoter elements are highly degenerate, which tends to lead to a high false discovery rate when attempting to detect them in promoter sequences.
Results
In this study, we have performed the first analysis of these core promoter elements in orthologous mouse and human promoters with experimentally-supported transcription start sites. We have identified these various elements using a combination of positional weight matrices (PWMs) and the degree of conservation of orthologous mouse and human sequences – a procedure that significantly reduces the false positive rate of motif discovery. Our analysis of 9,010 orthologous mouse-human promoter pairs revealed two combinations of three-way synergistic effects, TATA-Inr-MTE and BRE-Inr-MTE. The former has previously been putatively identified in human, but the latter represents a novel synergistic relationship.
Conclusion
Our results demonstrate that DNA sequence conservation can greatly improve the identification of functional core promoter elements in the human genome. The data also underscore the importance of synergistic occurrence of two or more core promoter elements. Furthermore, the sequence data and results presented here can help build better computational models for predicting the transcription start sites in the promoter regions, which remains one of the most challenging problems.
doi:10.1186/1471-2105-7-114
PMCID: PMC1475891  PMID: 16522199
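The idea in entry 25 of combining positional weight matrix (PWM) matches with human-mouse conservation can be sketched roughly as below. This is a simplified illustration under stated assumptions: the count matrix is a toy TATA-like motif, the background is assumed uniform, the two sequences are assumed to be pre-aligned without gaps, and the function names and thresholds are hypothetical rather than the authors' procedure.

```python
import math

# Toy count matrix (hypothetical, not from the paper): one dict per motif position.
COUNTS = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 17, "C": 1, "G": 1, "T": 1},
]
BACKGROUND = 0.25  # assumed uniform base frequency

def log_odds(counts):
    """Convert per-position base counts to log2 odds against the background."""
    pwm = []
    for col in counts:
        total = sum(col.values())
        pwm.append({b: math.log2((n / total) / BACKGROUND) for b, n in col.items()})
    return pwm

def scan(seq, pwm, threshold):
    """Return start offsets of windows whose summed log-odds score meets the threshold."""
    w = len(pwm)
    return [
        i for i in range(len(seq) - w + 1)
        if sum(pwm[j][seq[i + j]] for j in range(w)) >= threshold
    ]

def conserved_hits(human, mouse, pwm, threshold, min_identity):
    """Keep human hits where the aligned mouse window also matches the PWM
    and the two windows share at least min_identity of their bases."""
    w = len(pwm)
    mouse_hits = set(scan(mouse, pwm, threshold))
    kept = []
    for i in scan(human, pwm, threshold):
        same = sum(h == m for h, m in zip(human[i:i + w], mouse[i:i + w]))
        if i in mouse_hits and same / w >= min_identity:
            kept.append(i)
    return kept

pwm = log_odds(COUNTS)
human = "GGTATACCTATAGG"  # made-up aligned sequences
mouse = "GGTATACCTGTAGG"
pwm_only = scan(human, pwm, 5.0)
with_conservation = conserved_hits(human, mouse, pwm, 5.0, 1.0)
```

In this toy case the PWM alone reports two candidate sites in the human sequence, while the conservation filter keeps only the one that is also present in the mouse sequence, which is the sense in which conservation lowers the false positive rate.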
