1.  ADAR1 forms a complex with Dicer to promote microRNA processing and RNA-induced gene silencing 
Cell  2013;153(3):575-589.
Adenosine deaminases acting on RNA (ADARs) are involved in RNA editing that converts adenosine residues to inosine specifically in double-stranded RNAs. In this study, we investigated the interaction of the RNA editing mechanism with the RNA interference (RNAi) machinery and found that ADAR1 forms a complex with Dicer through direct protein-protein interaction. Most importantly, ADAR1 increases the maximum rate (Vmax) of pre-microRNA (miRNA) cleavage by Dicer and facilitates loading of miRNA onto RNA-induced silencing complexes, identifying a new role of ADAR1 in miRNA processing and RNAi mechanisms. ADAR1 differentiates its functions in RNA editing and RNAi by formation of either ADAR1/ADAR1 homodimer or Dicer/ADAR1 heterodimer complexes, respectively. As expected, expression of miRNAs is globally inhibited in ADAR1−/− mouse embryos, which in turn alters expression of their target genes and might contribute to their embryonic lethal phenotype.
PMCID: PMC3651894  PMID: 23622242
2.  Isoform-level gene signature improves prognostic stratification and accurately classifies glioblastoma subtypes 
Nucleic Acids Research  2014;42(8):e64.
Molecular stratification of tumors is essential for developing personalized therapies. Although patient stratification strategies have been successful, computational methods to accurately translate the gene-signature from a high-throughput platform to a clinically adaptable low-dimensional platform are currently lacking. Here, we describe PIGExClass (platform-independent isoform-level gene-expression based classification-system), a novel computational approach to derive and then transfer gene-signatures from one analytical platform to another. We applied PIGExClass to design a reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) based molecular-subtyping assay for glioblastoma multiforme (GBM), the most aggressive primary brain tumor. Unsupervised clustering of TCGA (The Cancer Genome Atlas Consortium) GBM samples, based on isoform-level gene-expression profiles, recaptured the four known molecular subgroups but switched the subtype for 19% of the samples, resulting in significant (P = 0.0103) survival differences among the refined subgroups. The PIGExClass-derived four-class classifier, which requires only 121 transcript-variants, assigns GBM patients’ molecular subtype with 92% accuracy. This classifier was translated to an RT-qPCR assay and validated in an independent cohort of 206 GBM samples. Our results demonstrate the efficacy of PIGExClass in the design of a clinically adaptable molecular subtyping assay and have implications for developing robust diagnostic assays for cancer patient stratification.
PMCID: PMC4005667  PMID: 24503249
3.  NPEBseq: nonparametric empirical bayesian-based procedure for differential expression analysis of RNA-seq data 
BMC Bioinformatics  2013;14:262.
RNA-seq, a massive parallel-sequencing-based transcriptome profiling method, provides digital data in the form of aligned sequence read counts. The comparative analyses of the data require appropriate statistical methods to estimate the differential expression of transcript variants across different cell/tissue types and disease conditions.
We developed a novel nonparametric empirical Bayesian-based approach (NPEBseq) to model the RNA-seq data. The prior distribution of the Bayesian model is empirically estimated from the data without any parametric assumption, and hence the method is “nonparametric” in nature. Based on this model, we proposed a method for detecting differentially expressed genes across different conditions. We also extended this method to detect differential usage of exons from RNA-seq data. The evaluation of NPEBseq on both simulated and publicly available RNA-seq datasets and comparison with three popular methods showed improved results for experiments with or without biological replicates.
NPEBseq can successfully detect differential expression between different conditions not only at gene level but also at exon level from RNA-seq datasets. In addition, NPEBseq performs significantly better than current methods and can be applied to genome-wide RNA-seq datasets. Sample datasets and R package are available at
PMCID: PMC3765716  PMID: 23981227
4.  Isoform level expression profiles provide better cancer signatures than gene level expression profiles 
Genome Medicine  2013;5(4):33.
The majority of mammalian genes generate multiple transcript variants and protein isoforms through alternative transcription and/or alternative splicing, and the dynamic changes at the transcript/isoform level between non-oncogenic and cancer cells remain largely unexplored. We hypothesized that isoform level expression profiles would be better than gene level expression profiles at discriminating between non-oncogenic and cancer cells.
We analyzed 160 Affymetrix exon-array datasets, comprising cell lines of non-oncogenic or oncogenic tissue origins. We obtained the transcript-level and gene level expression estimates, and used unsupervised and supervised clustering algorithms to study the profile similarity between the samples at both gene and isoform levels.
Hierarchical clustering, based on isoform level expressions, effectively grouped the non-oncogenic and oncogenic cell lines with a virtually perfect homogeneity-grouping rate (97.5%), regardless of the tissue origin of the cell lines. However, this rate was much lower, being 75% at best, based on the gene level expressions. Statistical analyses of the difference between cancer and non-oncogenic samples identified the existence of numerous genes with differentially expressed isoforms, which otherwise were not significant at the gene level. We also found that canonical pathways of protein ubiquitination, purine metabolism, and breast-cancer regulation by stathmin1 were significantly enriched among genes that show differential expression at isoform level but not at gene level.
In summary, cancer cell lines, regardless of their tissue of origin, can be effectively discriminated from non-cancer cell lines at isoform level, but not at gene level. This study suggests the existence of an isoform signature, rather than a gene signature, which could be used to distinguish cancer cells from normal cells.
PMCID: PMC3706752  PMID: 23594586
5.  Identification of Host-Chromosome Binding Sites and Candidate Gene Targets for Kaposi's Sarcoma-Associated Herpesvirus LANA 
Journal of Virology  2012;86(10):5752-5762.
LANA is essential for tethering the Kaposi's sarcoma-associated herpesvirus (KSHV) genome to metaphase chromosomes and for modulating host-cell gene expression, but the binding sites in the host-chromosome remain unknown. Here, we use LANA-specific chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq) to identify LANA binding sites in the viral and host-cell genomes of a latently infected pleural effusion lymphoma cell line BCBL1. LANA bound with high occupancy to the KSHV genome terminal repeats (TR) and to a few minor binding sites in the KSHV genome, including the LANA promoter region. We identified 256 putative LANA binding site peaks with P < 0.01 and overlap in two independent ChIP-Seq experiments. We validated several of the high-occupancy binding sites by conventional ChIP assays and quantitative PCR. Candidate cellular LANA binding motifs were identified and assayed for binding to purified recombinant LANA protein in vitro but bound with low affinity compared to the viral TR binding site. More than half of the LANA binding sites (170/256) could be mapped to within 2.5 kb of a cellular gene transcript. Pathways and Gene Ontology (GO) analysis revealed that LANA binds to genes within the p53 and tumor necrosis factor (TNF) regulatory network. Further analysis revealed partial overlap of LANA and STAT1 binding sites in several gamma interferon (IFN-γ)-regulated genes. We show that ectopic expression of LANA can downmodulate IFN-γ-mediated activation of a subset of genes, including the TAP1 peptide transporter and proteasome subunit beta type 9 (PSMB9), both of which are required for class I antigen presentation. Our data provide a potential mechanism through which LANA may regulate several host cell pathways by direct binding to gene regulatory elements.
PMCID: PMC3347294  PMID: 22419807
6.  Sequencing and analysis of a South Asian-Indian personal genome 
BMC Genomics  2012;13:440.
With over 1.3 billion people, India is estimated to contain three times more genetic diversity than does Europe. Next-generation sequencing technologies have facilitated the understanding of diversity by enabling whole genome sequencing at greater speed and lower cost. While genomes from people of European and Asian descent have been sequenced, only recently has a single male genome from the Indian subcontinent been published at sufficient depth and coverage. In this study we have sequenced and analyzed the genome of a South Asian Indian female (SAIF) from the Indian state of Kerala.
We identified over 3.4 million SNPs in this genome including over 89,873 private variations. Comparison of the SAIF genome with several published personal genomes revealed that this individual shared ~50% of the SNPs with each of these genomes. Analysis of the SAIF mitochondrial genome showed that it was closely related to the U1 haplogroup which has been previously observed in Kerala. We assessed the SAIF genome for SNPs with health and disease consequences and found that the individual was at a higher risk for multiple sclerosis and a few other diseases. In analyzing SNPs that modulate drug response, we found a variation that predicts a favorable response to metformin, a drug used to treat diabetes. SNPs predictive of adverse reaction to warfarin indicated that the SAIF individual is not at risk for bleeding if treated with typical doses of warfarin. In addition, we report the presence of several additional SNPs of medical relevance.
This is the first study to report the complete whole genome sequence of a female from the state of Kerala in India. The availability of this complete genome and variants will further aid studies aimed at understanding genetic diversity, identifying clinically relevant changes and assessing disease burden in the Indian population.
PMCID: PMC3534380  PMID: 22938532
Indian genome; Personal genomics; Whole genome sequencing
7.  Tree-Based Position Weight Matrix Approach to Model Transcription Factor Binding Site Profiles 
PLoS ONE  2011;6(9):e24210.
Most of the position weight matrix (PWM) based bioinformatics methods developed to predict transcription factor binding sites (TFBS) assume each nucleotide in the sequence motif contributes independently to the interaction between protein and DNA sequence, usually producing high false positive predictions. The increasing availability of TF enrichment profiles from recent ChIP-Seq methodology facilitates the investigation of dependent structure and accurate prediction of TFBSs. We develop a novel Tree-based PWM (TPWM) approach to accurately model the interaction between TF and its binding site. The whole tree-structured PWM could be considered as a mixture of different conditional-PWMs. We propose a discriminative approach, called TPD (TPWM based Discriminative Approach), to construct the TPWM from the ChIP-Seq data with a pre-existing PWM. To achieve the maximum discriminative power between the positive and negative datasets, the cutoff value is determined based on the Matthews correlation coefficient (MCC). The resulting TPWMs are evaluated with respect to accuracy on extensive synthetic datasets. We then apply our TPWM discriminative approach on several real ChIP-Seq datasets to refine the current TFBS models stored in the TRANSFAC database. Experiments on both the simulated and real ChIP-Seq data show that the proposed method starting from existing PWM has consistently better performance than existing tools in detecting the TFBSs. The improved accuracy is the result of modelling the complete dependent structure of the motifs and better prediction of true positive rate. The findings could lead to better understanding of the mechanisms of TF-DNA interactions.
PMCID: PMC3166302  PMID: 21912677
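The abstract above mentions two ideas that are easy to illustrate: a conventional PWM scores each motif position independently, and a score cutoff can be chosen to maximise the Matthews correlation coefficient between positive and negative sequences. The following is a minimal sketch of those two steps only, not the TPWM or TPD method itself. The toy PWM, the sequences, the uniform 0.25 background and all function names are assumptions made for illustration.

```python
import math

def pwm_score(pwm, seq):
    """Log-odds score of seq under a PWM, summing positions independently.
    A uniform background of 0.25 per base is assumed."""
    return sum(math.log2(pwm[i][base] / 0.25) for i, base in enumerate(seq))

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; taken as 0 when any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def best_cutoff(pos_scores, neg_scores):
    """Return (cutoff, mcc) maximising MCC, calling a site when score >= cutoff."""
    best = (None, -1.0)
    for c in sorted(set(pos_scores) | set(neg_scores)):
        tp = sum(s >= c for s in pos_scores)
        fn = len(pos_scores) - tp
        fp = sum(s >= c for s in neg_scores)
        tn = len(neg_scores) - fp
        m = mcc(tp, fp, tn, fn)
        if m > best[1]:
            best = (c, m)
    return best

# Hypothetical 3-position PWM favouring the motif "ACG".
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
positives = ["ACG", "ACG", "ACT", "TCG"]   # made-up bound sites
negatives = ["TTT", "GGA", "CAT", "AAA"]   # made-up background sequences

pos_scores = [pwm_score(toy_pwm, s) for s in positives]
neg_scores = [pwm_score(toy_pwm, s) for s in negatives]
cutoff, best_mcc = best_cutoff(pos_scores, neg_scores)
print(cutoff, best_mcc)
```

Because every position is scored on its own, this baseline cannot capture dependencies between nucleotides, which is the limitation the tree-based model in the abstract is meant to address.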
8.  IsoformEx: isoform level gene expression estimation using weighted non-negative least squares from mRNA-Seq data 
BMC Bioinformatics  2011;12:305.
mRNA-Seq technology has revolutionized the field of transcriptomics for identification and quantification of gene transcripts not only at gene level but also at isoform level. Estimating the expression levels of transcript isoforms from mRNA-Seq data is a challenging problem due to the presence of constitutive exons.
We propose a novel algorithm (IsoformEx) that employs weighted non-negative least squares estimation method to estimate the expression levels of transcript isoforms. Validations based on in silico simulation of mRNA-Seq and qRT-PCR experiments with real mRNA-Seq data showed that IsoformEx could accurately estimate transcript expression levels. In comparisons with published methods, the transcript expression levels estimated by IsoformEx showed higher correlation with known transcript expression levels from simulated mRNA-Seq data, and higher agreement with qRT-PCR measurements of specific transcripts for real mRNA-Seq data.
IsoformEx is a fast and accurate algorithm to estimate transcript expression levels and gene expression levels, which takes into account short exons and alternative exons with a weighting scheme. The software is available at
PMCID: PMC3180389  PMID: 21794104
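To make the idea of weighted non-negative least squares in the abstract above concrete, here is a minimal sketch under stated assumptions. It is not the IsoformEx implementation: the exon-by-isoform design matrix, the read densities, the weights and the coordinate-descent solver are all hypothetical choices for illustration, and the abstract does not specify the actual weighting scheme.

```python
def weighted_nnls(A, b, w, iters=200):
    """Minimise sum_i w[i] * (b[i] - sum_j A[i][j] * x[j])**2 subject to x >= 0,
    using cyclic coordinate descent with clipping at zero."""
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    for _ in range(iters):
        for j in range(n_cols):
            denom = sum(w[i] * A[i][j] ** 2 for i in range(n_rows))
            if denom == 0.0:
                continue  # isoform j covers no exon; leave it at zero
            # Weighted correlation of column j with the current residual.
            num = sum(
                w[i] * A[i][j] * (b[i] - sum(A[i][k] * x[k] for k in range(n_cols)))
                for i in range(n_rows)
            )
            x[j] = max(0.0, x[j] + num / denom)
    return x

# Hypothetical gene: three exons (rows), two isoforms (columns).
# Isoform 0 uses all three exons; isoform 1 skips the middle exon.
A = [[1, 1],
     [1, 0],
     [1, 1]]
b = [7.0, 2.0, 7.0]   # observed per-exon read density (made-up numbers)
w = [1.0, 2.0, 1.0]   # made-up weights, up-weighting the alternative exon
x = weighted_nnls(A, b, w)
print([round(v, 3) for v in x])
```

In this toy case the middle exon is present only in isoform 0, so its density pins that isoform's abundance near 2, and the shared exons then assign the remaining density of about 5 to isoform 1. The non-negativity constraint matters when noisy exon counts would otherwise push an isoform's estimate below zero.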
Cancer research  2010;70(7):2789-2798.
Single nucleotide polymorphisms (SNPs) associated with polygenetic disorders, such as breast cancer (BC), can create, destroy or modify microRNA (miRNA) binding sites; however, the extent to which SNPs interfere with miRNA gene regulation and affect cancer susceptibility remains largely unknown. We hypothesize that disruption of miRNA target binding by SNPs is a widespread mechanism relevant to cancer susceptibility. In order to test this, we analyzed SNPs known to be associated with BC risk, in silico and in vitro, for their ability to modify miRNA binding sites and miRNA gene regulation and referred to these as target SNPs. We identified rs1982073-TGFB1 and rs1799782-XRCC1 as target SNPs, whose alleles could modulate gene expression by differential interaction with miR-187 and miR-138, respectively. Genome-wide bioinformatics analysis predicted approximately 64% of transcribed SNPs as target SNPs that can modify (increase/decrease) the binding energy of putative miRNA::mRNA duplexes by over 90%. To assess whether target SNPs are implicated in BC susceptibility, we conducted a case-control population study and observed that germline occurrence of rs799917-BRCA1 and rs334348-TGFR1 significantly varies among populations with different risks of developing BC. Luciferase activity of target SNP allelic variants and protein levels in cancer cell lines with different genotypes showed differential regulation of target genes following over-expression of the two interacting miRNAs (miR-638 and miR-628-5p). Therefore, we propose that transcribed target SNPs alter miRNA gene regulation and consequently protein expression, contributing to the likelihood of cancer susceptibility, by a novel mechanism of subtle gene regulation.
PMCID: PMC2853025  PMID: 20332227
Single nucleotide polymorphism; microRNAs; mRNA; BC; tumor susceptibility
10.  MPromDb update 2010: an integrated resource for annotation and visualization of mammalian gene promoters and ChIP-seq experimental data 
Nucleic Acids Research  2010;39(Database issue):D92-D97.
MPromDb (Mammalian Promoter Database) is a curated database that strives to annotate gene promoters identified from ChIP-seq results with the goal of providing an integrated resource for mammalian transcriptional regulation and epigenetics. We analyzed 507 million uniquely aligned RNAP-II ChIP-seq reads from 26 different data sets that include six human cell-types and 10 distinct mouse cell/tissues. The updated MPromDb version consists of computationally predicted (novel) and known active RNAP-II promoters (42 893 human and 48 366 mouse promoters) from various data sets freely available at NCBI GEO database. We found that 36% and 40% of protein-coding genes have alternative promoters in human and mouse genomes, respectively, and ∼40% of promoters are tissue/cell specific. The identified RNAP-II promoters were annotated using various known and novel gene models. Additionally, for novel promoters we looked into other evidence: GenBank mRNAs, spliced ESTs, CAGE promoter tags and mRNA-seq reads. Users can search the database based on gene id/symbol, or by specific tissue/cell type and filter results based on any combination of tissue/cell specificity, Known/Novel, CpG/NonCpG, and protein-coding/non-coding gene promoters. We have also integrated GBrowse genome browser with MPromDb for visualization of ChIP-seq profiles and to display the annotations. The current release of MPromDb can be accessed at
PMCID: PMC3013732  PMID: 21097880
11.  Genome-wide analysis of host-chromosome binding sites for Epstein-Barr Virus Nuclear Antigen 1 (EBNA1) 
Virology Journal  2010;7:262.
The Epstein-Barr Virus (EBV) Nuclear Antigen 1 (EBNA1) protein is required for the establishment of EBV latent infection in proliferating B-lymphocytes. EBNA1 is a multifunctional DNA-binding protein that stimulates DNA replication at the viral origin of plasmid replication (OriP), regulates transcription of viral and cellular genes, and tethers the viral episome to the cellular chromosome. EBNA1 also provides a survival function to B-lymphocytes, potentially through its ability to alter cellular gene expression. To better understand these various functions of EBNA1, we performed a genome-wide analysis of the viral and cellular DNA sites associated with EBNA1 protein in a latently infected Burkitt lymphoma B-cell line. Chromatin-immunoprecipitation (ChIP) combined with massively parallel deep-sequencing (ChIP-Seq) was used to identify cellular sites bound by EBNA1. Sites identified by ChIP-Seq were validated by conventional real-time PCR, and ChIP-Seq provided quantitative, high-resolution detection of the known EBNA1 binding sites on the EBV genome at OriP and Qp. We identified at least one cluster of unusually high-affinity EBNA1 binding sites on chromosome 11, between the divergent FAM55D and FAM55B genes. A consensus for all cellular EBNA1 binding sites is distinct from those derived from the known viral binding sites, suggesting that some of these sites are indirectly bound by EBNA1. EBNA1 also bound close to the transcriptional start sites of a large number of cellular genes, including HDAC3, CDC7, and MAP3K1, which we show are positively regulated by EBNA1. EBNA1 binding sites were enriched in some repetitive elements, especially LINE 1 retrotransposons, and had weak correlations with histone modifications and ORC binding. We conclude that EBNA1 can interact with a large number of cellular genes and chromosomal loci in latently infected cells, but that these sites are likely to represent a complex ensemble of direct and indirect EBNA1 binding sites.
PMCID: PMC2964674  PMID: 20929547
12.  Genome-wide mapping of RNA Pol-II promoter usage in mouse tissues by ChIP-seq 
Nucleic Acids Research  2010;39(1):190-201.
Alternative promoters that are differentially used in various cellular contexts and tissue types add to the transcriptional complexity of mammalian genomes. Identification of alternative promoters and the annotation of their activity in different tissues is one of the major challenges in understanding the transcriptional regulation of mammalian genes and their isoforms. To determine the use of alternative promoters in different tissues, we performed ChIP-seq experiments using an antibody against RNA Pol-II in five adult mouse tissues (brain, liver, lung, spleen and kidney). Our analysis identified 38 639 Pol-II promoters, including 12 270 novel promoters, for both protein coding and non-coding mouse genes. Of these, 6384 promoters are tissue specific and CpG poor, and we find that only 34% of the novel promoters are located in CpG-rich regions, suggesting that novel promoters are mostly tissue specific. By identifying the Pol-II bound promoter(s) of each annotated gene in a given tissue, we found that 37% of the protein coding genes use alternative promoters in the five mouse tissues. The promoter annotations and ChIP-seq data presented here will aid ongoing efforts of characterizing gene regulatory regions in mammalian genomes.
PMCID: PMC3017616  PMID: 20843783
13.  Annotation of gene promoters by integrative data-mining of ChIP-seq Pol-II enrichment data 
BMC Bioinformatics  2010;11(Suppl 1):S65.
Use of alternative gene promoters that drive widespread cell-type, tissue-type or developmental gene regulation in mammalian genomes is a common phenomenon. Chromatin immunoprecipitation methods coupled with DNA microarray (ChIP-chip) or massive parallel sequencing (ChIP-seq) are enabling genome-wide identification of active promoters in different cellular conditions using antibodies against Pol-II. However, these methods produce enrichment not only near the gene promoters but also inside the genes and other genomic regions due to the non-specificity of the antibodies used in ChIP. Further, the use of these methods is limited by their high cost and strong dependence on cellular type and context.
We trained and tested different state-of-the-art ensemble and meta classification methods for identification of Pol-II enriched promoter and Pol-II enriched non-promoter sequences, each of length 500 bp. The classification models were trained and tested on a benchmark dataset, using a set of 39 different feature variables that are based on chromatin modification signatures and various DNA sequence features. The best performing model was applied to seven published ChIP-seq Pol-II datasets to provide genome-wide annotation of mouse gene promoters.
We present a novel algorithm based on supervised learning methods to discriminate promoter associated Pol-II enrichment from enrichment elsewhere in the genome in ChIP-chip/seq profiles. We accumulated a dataset of 11,773 promoter and 46,167 non-promoter sequences, each of length 500 bp, generated from RNA Pol-II ChIP-seq data of five tissues (Brain, Kidney, Liver, Lung and Spleen). We evaluated the classification models in building the best predictor and found that Bagging and Random Forest based approaches give the best accuracy. We implemented the algorithm on seven different published ChIP-seq datasets to provide a comprehensive set of promoter annotations for both protein-coding and non-coding genes in the mouse genome. The resulting annotations contain 13,413 (4,747) protein-coding (non-coding) genes with single promoters and 9,929 (1,858) protein-coding (non-coding) genes with two or more alternative promoters, and a significant number of unassigned novel promoters.
Our new algorithm can successfully predict the promoters from the genome wide profile of Pol-II bound regions. In addition, our algorithm performs significantly better than existing promoter prediction methods and can be applied for genome-wide predictions of Pol-II promoters.
PMCID: PMC3009539  PMID: 20122241
14.  Aberrant Transforming Growth Factor β1 Signaling and SMAD4 Nuclear Translocation Confer Epigenetic Repression of ADAM19 in Ovarian Cancer
Neoplasia (New York, N.Y.)  2008;10(9):908-919.
Transforming growth factor-beta (TGF-β)/SMAD signaling is a key growth regulatory pathway often dysregulated in ovarian cancer and other malignancies. Although loss of TGF-β-mediated growth inhibition has been shown to contribute to aberrant cell behavior, the epigenetic consequence(s) of impaired TGF-β/SMAD signaling on target genes is not well established. In this study, we show that TGF-β1 causes growth inhibition of normal ovarian surface epithelial cells, induction and nuclear translocation of SMAD4, and up-regulation of ADAM19 (a disintegrin and metalloprotease domain 19), a newly identified TGF-β1 target gene. Conversely, induction and nuclear translocation of SMAD4 were negligible in ovarian cancer cells refractory to TGF-β1 stimulation, and ADAM19 expression was greatly reduced. Furthermore, in the TGF-β1 refractory cells, an inactive chromatin environment, marked by repressive histone modifications (trimethyl-H3K27 and dimethyl-H3K9) and histone deacetylase, was associated with the ADAM19 promoter region. However, the CpG island found within the promoter and first exon of ADAM19 remained generally unmethylated. Although disrupted growth factor signaling has been linked to epigenetic gene silencing in cancer, this is the first evidence demonstrating that impaired TGF-β1 signaling can result in the formation of a repressive chromatin state and epigenetic suppression of ADAM19. Given the emerging role of ADAMs family proteins in growth factor regulation in normal cells, we suggest that epigenetic dysregulation of ADAM19 may contribute to the neoplastic process in ovarian cancer.
PMCID: PMC2517635  PMID: 18714391
15.  Genome-wide analysis of alternative promoters of human genes using a custom promoter tiling array 
BMC Genomics  2008;9:349.
Independent lines of evidence suggested that a large fraction of human genes possess multiple promoters driving gene expression from distinct transcription start sites. Understanding which promoter is employed in which cellular context is required to unravel gene regulatory networks within the cell.
We have developed a custom microarray platform that tiles roughly 35,000 alternative putative promoters from nearly 7,000 genes in the human genome. To demonstrate the utility of this array platform, we have analyzed the patterns of promoter usage in 17β-estradiol (E2)-treated and untreated MCF7 cells and show widespread usage of alternative promoters. Most intriguingly, we show that the downstream promoter in E2-sensitive multiple promoter genes tends to be very close to the 3'-terminus of the gene, suggesting exotic mechanisms of expression regulation in these genes.
The usage of alternative promoters greatly multiplies the transcriptional complexity available within the human genome. The fact that many of these promoters are incapable of driving the synthesis of a meaningful protein-encoding transcript further complicates the story.
PMCID: PMC2527337  PMID: 18655706
16.  An Application of a Service-oriented System to Support Array Annotation in Custom Chip Design for Epigenomic Analysis
Cancer Informatics  2008;6:111-125.
We present the implementation of an application using caGrid, which is the service-oriented Grid software infrastructure of the NCI cancer Biomedical Informatics Grid (caBIG™), to support design and analysis of custom microarray experiments in the study of epigenetic alterations in cancer. The design and execution of these experiments requires synthesis of information from multiple data types and datasets. In our implementation, each data source is implemented as a caGrid Data Service, and analytical resources are wrapped as caGrid Analytical Services. This service-based implementation has several advantages. A backend resource can be modified or upgraded, without needing to change other components in the application. A remote resource can be added easily, since resources are not required to be collected in a centralized infrastructure.
PMCID: PMC2623307  PMID: 19259406
service oriented architectures; grid computing; caBIG and caGrid; data integration; epigenetic studies
17.  Aberrant DNA Methylation of OLIG1, a Novel Prognostic Factor in Non-Small Cell Lung Cancer 
PLoS Medicine  2007;4(3):e108.
Lung cancer is the leading cause of cancer-related death worldwide. Currently, tumor, node, metastasis (TNM) staging provides the most accurate prognostic parameter for patients with non-small cell lung cancer (NSCLC). However, the overall survival of patients with resectable tumors varies significantly, indicating the need for additional prognostic factors to better predict the outcome of the disease, particularly within a given TNM subset.
Methods and Findings
In this study, we investigated whether adenocarcinomas and squamous cell carcinomas could be differentiated based on their global aberrant DNA methylation patterns. We performed restriction landmark genomic scanning on 40 patient samples and identified 47 DNA methylation targets that together could distinguish the two lung cancer subgroups. The protein expression of one of those targets, oligodendrocyte transcription factor 1 (OLIG1), significantly correlated with survival in NSCLC patients, as shown by univariate and multivariate analyses. Furthermore, the hazard ratio for patients negative for OLIG1 protein was significantly higher than the one for those patients expressing the protein, even at low levels.
Multivariate analyses of our data confirmed that OLIG1 protein expression significantly correlates with overall survival in NSCLC patients, with a relative risk of 0.84 (95% confidence interval 0.77–0.91, p < 0.001) along with T and N stages, as indicated by a Cox proportional hazard model. Taken together, our results suggest that OLIG1 protein expression could be utilized as a novel prognostic factor, which could aid in deciding which NSCLC patients might benefit from more aggressive therapy. This is potentially of great significance, as the addition of postoperative adjuvant chemotherapy in T2N0 NSCLC patients is still controversial.
Christopher Plass and colleagues find that OLIG1 expression correlates with survival in lung cancer patients and suggest that it could be used in deciding which patients are likely to benefit from more aggressive therapy.
Editors' Summary
Lung cancer is the commonest cause of cancer-related death worldwide. Most cases are of a type called non-small cell lung cancer (NSCLC). Like other cancers, treatment of NSCLC depends on the “TNM stage” at which the cancer is detected. Staging takes into account the size and local spread of the tumor (its T classification), whether nearby lymph nodes contain tumor cells (its N classification), and whether tumor cells have spread (metastasized) throughout the body (its M classification). Stage I tumors are confined to the lung and are removed surgically. Stage II tumors have spread to nearby lymph nodes and are treated with a combination of surgery and chemotherapy. Stage III tumors have spread throughout the chest, and stage IV tumors have metastasized around the body; patients with both of these stages are treated with chemotherapy alone. About 70% of patients with stage I or II lung cancer, but only 2% of patients with stage IV lung cancer, survive for five years after diagnosis.
Why Was This Study Done?
TNM staging is the best way to predict the likely outcome (prognosis) for patients with NSCLC, but survival times for patients with stage I and II tumors vary widely. Another prognostic marker—maybe a “molecular signature”—that could distinguish patients who are likely to respond to treatment from those whose cancer will inevitably progress would be very useful. Unlike normal cells, cancer cells divide uncontrollably and can move around the body. These behavioral changes are caused by alterations in the pattern of proteins expressed by the cells. But what causes these alterations? The answer in some cases is “epigenetic changes” or chemical modifications of genes. In cancer cells, methyl groups are aberrantly added to GC-rich gene regions. These so-called “CpG islands” lie near gene promoters (sequences that control the transcription of DNA into mRNA, the template for protein production), and their methylation stops the promoters working and silences the gene. In this study, the researchers have investigated whether aberrant methylation patterns vary between NSCLC subtypes and whether specific aberrant methylations are associated with survival and can, therefore, be used prognostically.
What Did the Researchers Do and Find?
The researchers used “restriction landmark genomic scanning” (RLGS) to catalog global aberrant DNA methylation patterns in human lung tumor samples. In RLGS, DNA is cut into fragments with a restriction enzyme (a protein that cuts at specific DNA sequences), end-labeled, and separated using two-dimensional gel electrophoresis to give a pattern of spots. Because methylation stops some restriction enzymes cutting their target sequence, normal lung tissue and lung tumor samples yield different patterns of spots. The researchers used these patterns to identify 47 DNA methylation targets (many in CpG islands) that together distinguished between adenocarcinomas and squamous cell carcinomas, two major types of NSCLCs. Next, they measured mRNA production from the genes with the greatest difference in methylation between adenocarcinomas and squamous cell carcinomas. OLIG1 (the gene that encodes a protein involved in nerve cell development) had one of the highest differences in mRNA production between these tumor types. Furthermore, three-quarters of NSCLCs had reduced or no expression of OLIG1 protein and, when the researchers analyzed the association between OLIG1 protein expression and overall survival in patients with NSCLC, reduced OLIG1 protein expression was associated with reduced survival.
What Do These Findings Mean?
These findings indicate that different types of NSCLC can be distinguished by examining their aberrant methylation patterns. This suggests that the establishment of different DNA methylation patterns might be related to the cell type from which the tumors developed. Alternatively, the different aberrant methylation patterns might reflect the different routes that these cells take to becoming tumor cells. This research identifies a potential new prognostic marker for NSCLC by showing that OLIG1 protein expression correlates with overall survival in patients with NSCLC. This correlation needs to be tested in a clinical setting to see if adding OLIG1 expression to the current prognostic parameters can lead to better treatment choices for early-stage lung cancer patients and ultimately improve these patients' overall survival.
Additional Information.
Please access these Web sites via the online version of this summary at
Patient and professional information on lung cancer, including staging (in English and Spanish), is available from the US National Cancer Institute
The MedlinePlus encyclopedia has pages on non-small cell lung cancer (in English and Spanish)
Cancerbackup provides patient information on lung cancer
CancerQuest, provided by Emory University, has information about how cancer develops (in English, Spanish, Chinese and Russian)
Wikipedia pages on epigenetics (note that Wikipedia is a free online encyclopedia that anyone can edit)
The Epigenome Network of Excellence gives background information and the latest news about epigenetics (in several European languages)
PMCID: PMC1831740  PMID: 17388669
18.  Genome-wide analysis of core promoter elements from conserved human and mouse orthologous pairs 
BMC Bioinformatics  2006;7:114.
The canonical core promoter elements consist of the TATA box, initiator (Inr), downstream core promoter element (DPE), TFIIB recognition element (BRE) and the newly-discovered motif 10 element (MTE). The motifs for these core promoter elements are highly degenerate, which tends to lead to a high false discovery rate when attempting to detect them in promoter sequences.
In this study, we have performed the first analysis of these core promoter elements in orthologous mouse and human promoters with experimentally-supported transcription start sites. We have identified these various elements using a combination of positional weight matrices (PWMs) and the degree of conservation of orthologous mouse and human sequences – a procedure that significantly reduces the false positive rate of motif discovery. Our analysis of 9,010 orthologous mouse-human promoter pairs revealed two combinations of three-way synergistic effects, TATA-Inr-MTE and BRE-Inr-MTE. The former has previously been putatively identified in human, but the latter represents a novel synergistic relationship.
Our results demonstrate that DNA sequence conservation can greatly improve the identification of functional core promoter elements in the human genome. The data also underscore the importance of synergistic occurrence of two or more core promoter elements. Furthermore, the sequence data and results presented here can help build better computational models for predicting the transcription start sites in the promoter regions, which remains one of the most challenging problems.
PMCID: PMC1475891  PMID: 16522199
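The motif-detection idea in the abstract above (score candidate sites with a positional weight matrix, then keep only those also found in the orthologous sequence) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the count matrix, the uniform background frequency, the score threshold, and the assumption that the human and mouse sequences are already aligned without gaps are all hypothetical choices made for the example.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# The counts are invented for illustration and are not taken from the study.
COUNTS = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 17, "C": 1, "G": 1, "T": 1},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def log_odds_matrix(counts):
    """Convert per-position base counts into log2 odds against the background."""
    matrix = []
    for column in counts:
        total = sum(column.values())
        matrix.append(
            {base: math.log2((n / total) / BACKGROUND) for base, n in column.items()}
        )
    return matrix


def pwm_hits(seq, pwm, threshold):
    """Return start offsets whose window scores at or above the threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(width))
        if score >= threshold:
            hits.append(i)
    return hits


def conserved_hits(human, mouse, pwm, threshold):
    """Keep hits found at the same offset in both pre-aligned, ungapped sequences."""
    shared = set(pwm_hits(human, pwm, threshold)) & set(pwm_hits(mouse, pwm, threshold))
    return sorted(shared)
```

Requiring a hit in both species is what discards matches that arise by chance in only one genome; a real analysis would work from a gapped alignment and empirically derived matrices and thresholds.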
19.  MPromDb: an integrated resource for annotation and visualization of mammalian gene promoters and ChIP-chip experimental data 
Nucleic Acids Research  2005;34(Database issue):D98-D103.
We have developed Mammalian Promoter Database (MPromDb), a novel database that integrates gene promoters with experimentally supported annotation of transcription start sites, cis-regulatory elements, CpG islands and chromatin immunoprecipitation microarray (ChIP-chip) experimental results with intuitively designed presentation. Release 1.0 of MPromDb currently contains 36 407 promoters and first exons (19 170 from human, 15 953 from mouse and 1284 from rat), 3739 transcription factor (TF)-binding sites (2027 from human, 1181 mouse and 531 rat) and 224 TFs with links to PubMed and GenBank references. Target promoters of TFs that have been identified by ChIP-chip assay are integrated into the database. MPromDb serves as a portal for genome-wide promoter analysis of data generated by ChIP-chip experimental studies. MPromDb can be accessed from .
PMCID: PMC1347458  PMID: 16381984
20.  Identifying estrogen receptor α target genes using integrated computational genomics and chromatin immunoprecipitation microarray 
Nucleic Acids Research  2004;32(22):6627-6635.
The estrogen receptor α (ERα) regulates gene expression by either direct binding to estrogen response elements or indirect tethering to other transcription factors on promoter targets. To identify these promoter sequences, we conducted a genome-wide screening with a novel microarray technique called ChIP-on-chip. A set of 70 candidate ERα loci were identified and the corresponding promoter sequences were analyzed by statistical pattern recognition and comparative genomics approaches. We found mouse counterparts for 63 of these loci and classified 42 (67%) as direct ERα targets using a classification and regression tree (CART) statistical model, which involves position weight matrix and human-mouse sequence similarity scores as model parameters. The remaining genes were considered to be indirect targets. To validate this computational prediction, we conducted an additional ChIP-on-chip assay that identified acetylated chromatin components in active ERα promoters. Of the 27 loci upregulated in an ERα-positive breast cancer cell line, 20 having mouse counterparts were correctly predicted by CART. This integrated approach, therefore, sets a paradigm in which the iterative process of model refinement and experimental verification will continue until an accurate prediction of promoter target sequences is derived.
PMCID: PMC545447  PMID: 15608294
21.  Genomic structure and alternative splicing of murine R2B receptor protein tyrosine phosphatases (PTPκ, μ, ρ and PCP-2) 
BMC Genomics  2004;5:14.
Four genes designated as PTPRK (PTPκ), PTPRL/U (PCP-2), PTPRM (PTPμ) and PTPRT (PTPρ) code for a subfamily (type R2B) of receptor protein tyrosine phosphatases (RPTPs) uniquely characterized by the presence of an N-terminal MAM domain. These transmembrane molecules have been implicated in homophilic cell adhesion. In the human, the PTPRK gene is located on chromosome 6, PTPRL/U on 1, PTPRM on 18 and PTPRT on 20. In the mouse, the four genes ptprk, ptprl, ptprm and ptprt are located in syntenic regions of chromosomes 10, 4, 17 and 2, respectively.
The genomic organization of murine R2B RPTP genes is described. The four genes varied greatly in size ranging from ~64 kb to ~1 Mb, primarily due to proportional differences in intron lengths. Although there were also minor variations in exon length, the number of exons and the phases of exon/intron junctions were highly conserved. In situ hybridization with digoxigenin-labeled cRNA probes was used to localize each of the four R2B transcripts to specific cell types within the murine central nervous system. Phylogenetic analysis of complete sequences indicated that PTPρ and PTPμ were most closely related, followed by PTPκ. The most distant family member was PCP-2. Alignment of RPTP polypeptide sequences predicted putative alternatively spliced exons. PCR experiments revealed that five of these exons were alternatively spliced, and that each of the four phosphatases incorporated them differently. The greatest variability in genomic organization and the majority of alternatively spliced exons were observed in the juxtamembrane domain, a region critical for the regulation of signal transduction.
Comparison of the four R2B RPTP genes revealed virtually identical principles of genomic organization, despite great disparities in gene size due to variations in intron length. Although subtle differences in exon length were also observed, it is likely that functional differences among these genes arise from the specific combinations of exons generated by alternative splicing.
PMCID: PMC373446  PMID: 15040814
central nervous system; dephosphorylation; alternative splicing; adhesion molecules
22.  HemoPDB: Hematopoiesis Promoter Database, an information resource of transcriptional regulation in blood cell development 
Nucleic Acids Research  2004;32(Database issue):D86-D90.
Hematopoiesis describes the process of the normal formation and development of blood cells, involving both proliferation and differentiation from stem cells. Abnormalities in this developmental program yield blood cell diseases, such as leukemia. Although, in recent years, extensive molecular research in normal hematopoietic development has characterized transcription factors and their binding sites in the target gene promoters, the information generated is highly fragmented. In order to integrate this important regulatory information with the corresponding genomic sequences, we have developed a new database called Hematopoiesis Promoter Database (HemoPDB). HemoPDB is a comprehensive resource focused on transcriptional regulation during hematopoietic development and associated aberrances that result in malignancy. HemoPDB (version 1.0) contains 246 promoter sequences and 604 experimentally known cis-regulatory elements of 187 different transcription factors, with links to published references. Orthologous promoters from different species are linked with each other and displayed in the same database record, accompanied by a visual image of the promoters and corresponding annotations of cis-regulatory elements. HemoPDB may be searched for the promoter of a specific gene, transcription factors and target genes, and genes that are expressed in a certain cell type or lineage, through a user-friendly web interface at Links to the documentation and other technical details are provided on this website.
PMCID: PMC308790  PMID: 14681365
23.  AGRIS: Arabidopsis Gene Regulatory Information Server, an information resource of Arabidopsis cis-regulatory elements and transcription factors 
BMC Bioinformatics  2003;4:25.
The gene regulatory information is hardwired in the promoter regions formed by cis-regulatory elements that bind specific transcription factors (TFs). Hence, establishing the architecture of plant promoters is fundamental to understanding gene expression. The determination of the regulatory circuits controlled by each TF and the identification of the cis-regulatory sequences for all genes have been identified as two of the goals of the Multinational Coordinated Arabidopsis thaliana Functional Genomics Project by the Multinational Arabidopsis Steering Committee (June 2002).
AGRIS is an information resource of Arabidopsis promoter sequences, transcription factors and their target genes. AGRIS currently contains two databases, AtTFDB (Arabidopsis thaliana transcription factor database) and AtcisDB (Arabidopsis thaliana cis-regulatory database). AtTFDB contains information on approximately 1,400 transcription factors identified through motif searches and grouped into 34 families. AtTFDB links the sequence of the transcription factors with available mutants and, when known, with the possible genes they may regulate. AtcisDB consists of the 5' regulatory sequences of all 29,388 annotated genes with a description of the corresponding cis-regulatory elements. Users can search the databases for (i) promoter sequences, (ii) a transcription factor, (iii) direct target genes for a specific transcription factor, or (iv) a regulatory network that consists of transcription factors and their target genes.
AGRIS provides the necessary software tools on Arabidopsis transcription factors and their putative binding sites on all genes to initiate the identification of transcriptional regulatory networks in the model dicotyledonous plant Arabidopsis thaliana. AGRIS can be accessed from .
PMCID: PMC166152  PMID: 12820902
24.  Association of a MicroRNA/TP53 Feedback Circuitry With Pathogenesis and Outcome of B-Cell Chronic Lymphocytic Leukemia 
Chromosomal abnormalities (namely 13q, 17p, and 11q deletions) have prognostic implications and are recurrent in chronic lymphocytic leukemia (CLL), suggesting that they are involved in a common pathogenetic pathway; however, the molecular mechanism through which chromosomal abnormalities affect the pathogenesis and outcome of CLL is unknown.
To determine whether the microRNA miR-15a/miR-16-1 cluster (located at 13q), tumor protein p53 (TP53, located at 17p), and miR-34b/miR-34c cluster (located at 11q) are linked in a molecular pathway that explains the pathogenetic and prognostic implications (indolent vs aggressive form) of recurrent 13q, 17p, and 11q deletions in CLL.
Design, Setting, and Patients
CLL Research Consortium institutions provided blood samples from untreated patients (n=206) diagnosed with B-cell CLL between January 2000 and April 2008. All samples were evaluated for the occurrence of cytogenetic abnormalities as well as the expression levels of the miR-15a/miR-16-1 cluster, miR-34b/miR-34c cluster, TP53, and zeta-chain (TCR)–associated protein kinase 70kDa (ZAP70), a surrogate prognostic marker of CLL. The functional relationship between these genes was studied using in vitro gain- and loss-of-function experiments in cell lines and primary samples and was validated in a separate cohort of primary CLL samples.
Main Outcome Measures
Cytogenetic abnormalities; expression levels of the miR-15a/miR-16-1 cluster, miR-34 family, TP53 gene, downstream effectors cyclin-dependent kinase inhibitor 1A (p21, Cip1) (CDKN1A) and B-cell CLL/lymphoma 2 binding component 3 (BBC3), and ZAP70 gene; genetic interactions detected by chromatin immunoprecipitation.
In CLLs with 13q deletions, the miR-15a/miR-16-1 cluster directly targeted TP53 (mean luciferase activity for miR-15a vs scrambled control, 0.68 relative light units (RLU) [95% confidence interval {CI}, 0.63–0.73]; P=.02; mean for miR-16 vs scrambled control, 0.62 RLU [95% CI, 0.59–0.65]; P=.02) and its downstream effectors. In leukemic cell lines and primary CLL cells, TP53 stimulated the transcription of miR-15/miR-16-1 as well as miR-34b/miR-34c clusters, and the miR-34b/miR-34c cluster directly targeted the ZAP70 kinase (mean luciferase activity for miR-34a vs scrambled control, 0.33 RLU [95% CI, 0.30–0.36]; P=.02; mean for miR-34b vs scrambled control, 0.31 RLU [95% CI, 0.30–0.32]; P=.01; and mean for miR-34c vs scrambled control, 0.35 RLU [95% CI, 0.33–0.37]; P=.02).
A microRNA/TP53 feedback circuitry is associated with CLL pathogenesis and outcome. This mechanism provides a novel pathogenetic model for the association of 13q deletions with the indolent form of CLL that involves microRNAs, TP53, and ZAP70.
PMCID: PMC3690301  PMID: 21205967
25.  Promoter hypermethylation of FBXO32, a novel TGF-β/SMAD4 target gene and tumor suppressor, is associated with poor prognosis in human ovarian cancer 
Resistance to TGF-β is frequently observed in ovarian cancer, and disrupted TGF-β/SMAD4 signaling results in aberrant expression of downstream target genes in the disease. Our previous study showed that ADAM19, a SMAD4 target gene, is down-regulated through epigenetic mechanisms in ovarian cancer with aberrant TGF-β/SMAD4 signaling. In this study, we investigated the mechanism of down-regulation of FBXO32, another SMAD4 target gene, and the clinical significance of loss of FBXO32 expression in ovarian cancer. Expression of FBXO32 was observed in normal ovarian surface epithelium but not in ovarian cancer cell lines. FBXO32 methylation was seen in ovarian cancer cell lines displaying constitutive TGF-β/SMAD4 signaling, and epigenetic drug treatment restored FBXO32 expression in ovarian cancer cell lines regardless of FBXO32 methylation status, suggesting that epigenetic regulation of this gene in ovarian cancer may be a common event. In advanced stage ovarian tumors, significant (29.3%; P<0.05) methylation frequency of FBXO32 was observed and the association between FBXO32 methylation and shorter progression-free survival was significant, as determined by both Kaplan-Meier analysis (P<0.05) and multivariate Cox regression analysis (hazard ratio 1.003, P<0.05). Re-expression of FBXO32 markedly reduced proliferation of a platinum-resistant ovarian cancer line both in vitro and in vivo, due to increased apoptosis of the cells, and resensitized ovarian cancer cells to cisplatin. In conclusion, the novel tumor suppressor FBXO32 is epigenetically silenced in ovarian cancer cell lines with disrupted TGF-β/SMAD4 signaling and FBXO32 methylation status predicts survival in patients with ovarian cancer.
PMCID: PMC2829100  PMID: 20065949
Ovarian cancer; epigenetics; TGF-β; FBXO32
