Molecular risk stratification of acute myeloid leukemia (AML) is largely based on genetic markers. However, epigenetic changes, including DNA methylation, deregulate gene expression and may also have prognostic impact. We evaluated the clinical relevance of integrating DNA methylation and genetic information in AML.
Next-generation sequencing analysis of methylated DNA identified differentially methylated regions (DMRs) associated with prognostic mutations in older (≥ 60 years) cytogenetically normal (CN) patients with AML (n = 134). Genes with promoter DMRs and expression levels significantly associated with outcome were used to compute a prognostic gene expression weighted summary score that was tested and validated in four independent patient sets (n = 355).
In the training set, we identified seven genes (CD34, RHOC, SCRN1, F2RL1, FAM92A1, MIR155HG, and VWA8) with promoter DMRs and expression associated with overall survival (OS; P ≤ .001). Each gene had high DMR methylation and lower expression, which were associated with better outcome. A weighted summary expression score of the seven gene expression levels was computed. A low score was associated with a higher complete remission (CR) rate and longer disease-free survival and OS (P < .001 for all end points). This was validated in multivariable models and in two younger (< 60 years) and two older independent sets of patients with CN-AML. Considering the seven genes individually, the fewer the genes with high expression, the better the outcome. Younger and older patients with no genes or one gene with high expression had the best outcomes (CR rate, 94% and 87%, respectively; 3-year OS, 80% and 42%, respectively).
A seven-gene score encompassing epigenetic and genetic prognostic information identifies novel AML subsets that are meaningful for treatment guidance.
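The weighted summary expression score described above can be illustrated with a minimal sketch. The weights, the median cutoff, and the per-gene cutoffs below are hypothetical placeholders chosen for illustration; the abstract reports none of these values, and the function names are not taken from the study.

```python
from statistics import median

# Hypothetical weights: the abstract does not report the actual coefficients.
# Positive values reflect that higher expression went with worse outcome.
WEIGHTS = {
    "CD34": 0.5, "RHOC": 0.25, "SCRN1": 0.25, "F2RL1": 0.5,
    "FAM92A1": 0.25, "MIR155HG": 0.75, "VWA8": 0.5,
}

def weighted_score(expression, weights=WEIGHTS):
    """Return sum(weight * expression) over the signature genes."""
    missing = sorted(set(weights) - set(expression))
    if missing:
        raise ValueError(f"missing expression values for: {missing}")
    return sum(w * expression[gene] for gene, w in weights.items())

def split_by_median(scores):
    """Label each patient 'low' or 'high' relative to the cohort median (assumed cutoff)."""
    cutoff = median(scores.values())
    return {pid: ("low" if s <= cutoff else "high") for pid, s in scores.items()}

def count_high_genes(expression, cutoffs):
    """Count signature genes whose expression exceeds a per-gene cutoff."""
    return sum(1 for gene, cut in cutoffs.items() if expression[gene] > cut)
```

The gene count mirrors the observation that the fewer genes with high expression, the better the outcome. How "high" is defined per gene is an assumption here.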
DNA-based methods for human identification principally rely upon genotyping of short tandem repeat (STR) loci. Electrophoretic-based techniques for variable-length classification of STRs are universally utilized, but are limited in that they have relatively low throughput and do not yield nucleotide sequence information. High-throughput sequencing technology may provide a more powerful instrument for human identification, but is not currently validated for forensic casework. Here, we present a systematic method to perform high-throughput genotyping analysis of the Combined DNA Index System (CODIS) STR loci using short-read (150 bp) massively parallel sequencing technology. Open source reference alignment tools were optimized to evaluate PCR-amplified STR loci using a custom designed STR genome reference. Evaluation of this approach demonstrated that the 13 CODIS STR loci and amelogenin (AMEL) locus could be accurately called from individual and mixture samples. Sensitivity analysis showed that as few as 18,500 reads, aligned to an in silico referenced genome, were required to genotype an individual (>99% confidence) for the CODIS loci. The power of this technology was further demonstrated by identification of variant alleles containing single nucleotide polymorphisms (SNPs) and the development of quantitative measurements (reads) for resolving mixed samples.
STR; forensic; next-generation sequencing; high-throughput sequencing; Illumina; Bridge PCR; SNP; genotyping
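The idea of using read counts as a quantitative measurement for genotyping can be sketched as follows. This is a simplified illustration, not the pipeline from the study: it assumes reads have already been aligned to a reference with one entry per allele, and the `min_reads` and `het_ratio` thresholds are hypothetical values.

```python
from collections import Counter

def call_genotype(allele_reads, min_reads=20, het_ratio=0.5):
    """Call a single-source genotype at one STR locus from per-allele read counts.

    allele_reads maps an allele label (e.g. repeat number) to its aligned read count.
    Returns a sorted two-allele tuple, or None when coverage is below min_reads.
    Both thresholds are illustrative assumptions.
    """
    counts = Counter(allele_reads)
    if sum(counts.values()) < min_reads:
        return None
    ranked = counts.most_common(2)
    top_allele, top_n = ranked[0]
    # Call a heterozygote only if the second allele has enough support
    # relative to the first; weaker signals are treated as noise or stutter.
    if len(ranked) == 2 and ranked[1][1] >= het_ratio * top_n:
        return tuple(sorted((top_allele, ranked[1][0])))
    return (top_allele, top_allele)
```

For example, `call_genotype({12: 480, 14: 450, 13: 30})` treats alleles 12 and 14 as a heterozygous pair and the low-count allele 13 as noise. Mixture samples would need per-contributor proportions rather than a single ratio threshold.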
The dysregulation of transforming growth factor-β (TGF-β) signaling plays a crucial role in ovarian carcinogenesis and in maintaining cancer stem cell properties. Classified as a member of the ATP-binding cassette (ABC) family, ABCA1 was previously identified by methylated DNA immunoprecipitation microarray (mDIP-Chip) to be methylated in ovarian cancer cell lines, A2780 and CP70. By microarray, it was also found to be upregulated in immortalized ovarian surface epithelial (IOSE) cells following TGF-β treatment. Thus, we hypothesized that ABCA1 may be involved in ovarian cancer and its initiation.
We first compared the expression level of ABCA1 in IOSE cells and a panel of ovarian cancer cell lines and found that ABCA1 was expressed in HeyC2, SKOV3, MCP3, and MCP2 ovarian cancer cell lines but downregulated in A2780 and CP70 ovarian cancer cell lines. The reduced expression of ABCA1 in A2780 and CP70 cells was associated with promoter hypermethylation, as demonstrated by bisulfite pyrosequencing. We also found that knockdown of ABCA1 increased the cholesterol level and promoted cell growth in vitro and in vivo. Further analysis of ABCA1 methylation in 76 ovarian cancer patient samples demonstrated that higher ABCA1 methylation was associated with high stage (P = 0.0131) and grade (P = 0.0137). Kaplan-Meier analysis also found that patients with higher levels of ABCA1 methylation had shorter overall survival (P = 0.019). Furthermore, tissue microarray analysis of 55 ovarian cancer patient samples revealed that a lower level of ABCA1 expression was associated with shorter progression-free survival (P = 0.038).
ABCA1 may be a tumor suppressor and is hypermethylated in a subset of ovarian cancer patients. Hypermethylation of ABCA1 is associated with poor prognosis in these patients.
Electronic supplementary material
The online version of this article (doi:10.1186/s13148-014-0036-2) contains supplementary material, which is available to authorized users.
Ovarian cancer; Epigenetics; ABCA1
QuaCRS (Quality Control for RNA-Seq) is an integrated, simplified quality control (QC) system for RNA-seq data that allows easy execution of several open-source QC tools, aggregation of their output, and the ability to quickly identify quality issues by performing meta-analyses on QC metrics across large numbers of samples in different studies. It comprises two main sections. First is the QC Pack wrapper, which executes three QC tools: FastQC, RNA-SeQC, and selected functions from RSeQC. Combining these three tools into one wrapper provides increased ease of use and provides a much more complete view of sample data quality than any individual tool. Second is the QC database, which displays the resulting metrics in a user-friendly web interface. It was designed to allow users with less computational experience to easily generate and view QC information for their data, to investigate individual samples and aggregate reports of sample groups, and to sort and search samples based on quality. The structure of the QuaCRS database is designed to enable expansion with additional tools and metrics in the future. The source code for not-for-profit use and a fully functional sample user interface with mock data are available at http://bioserv.mps.ohio-state.edu/QuaCRS/.
RNA-seq; quality control; database; FastQC; RNA-SeQC; RSeQC
Estrogen imprinting is used to describe a phenomenon in which early developmental exposure to endocrine disruptors increases breast cancer risk later in adult life. We propose that long-lived, self-regenerating stem and progenitor cells are more susceptible to the exposure injury than terminally differentiated epithelial cells in the breast duct. Mammospheres, containing enriched breast progenitors, were used as an exposure system to simulate this imprinting phenomenon in vitro. Using MeDIP-chip, a methylation microarray screening method, we found that 0.5% (120 loci) of human CpG islands were hypermethylated in epithelial cells derived from estrogen-exposed progenitors compared with the non–estrogen-exposed control cells. This epigenetic event may lead to progressive silencing of tumor suppressor genes, including RUNX3, in these epithelial cells, which also occurred in primary breast tumors. Furthermore, normal tissue in close proximity to the tumor site also displayed RUNX3 hypermethylation, suggesting that this aberrant event occurs in early breast carcinogenesis. The high prevalence of estrogen-induced epigenetic changes in primary tumors and the surrounding histologically normal tissues provides the first empirical link between estrogen injury of breast stem/progenitor cells and carcinogenesis. This finding also offers a mechanistic explanation as to why a tumor suppressor gene, such as RUNX3, can be heritably silenced by epigenetic mechanisms in breast cancer.
This report describes an improved protocol to generate stranded, barcoded RNA-seq libraries to capture the whole transcriptome. By optimizing the use of duplex-specific nuclease (DSN) to remove ribosomal RNA reads from stranded barcoded libraries, we demonstrate improved efficiency of multiplexed next-generation sequencing (NGS). This approach detects expression profiles of all RNA types, including miRNA (microRNA), piRNA (Piwi-interacting RNA), snoRNA (small nucleolar RNA), lincRNA (long intergenic non-coding RNA), mtRNA (mitochondrial RNA) and mRNA (messenger RNA) without the use of gel electrophoresis. The improved protocol generates high-quality data that can be used to identify differential expression in known and novel coding and non-coding transcripts, splice variants, mitochondrial genes and SNPs (single nucleotide polymorphisms).
RNA-seq; transcriptome; duplex-specific nuclease; gene expression
Klebsiella pneumoniae is a bacterial pathogen of worldwide importance and a significant contributor to multiple disease presentations associated with both nosocomial and community-acquired disease. ATCC 43816 is a well-studied K. pneumoniae strain which is capable of causing an acute respiratory disease in surrogate animal models. In this study, we performed sequencing of the ATCC 43816 genome to support future efforts characterizing genetic elements required for disease. Furthermore, we performed comparative genetic analyses to the previously sequenced genomes from NTUH-K2044 and MGH 78578 to gain an understanding of the conservation of known virulence determinants amongst the three strains. We found that ATCC 43816 and NTUH-K2044 both possess the known virulence determinant for yersiniabactin, as well as a Type 4 secretion system (T4SS), CRISPR system, and an acetoin catabolism locus, all absent from MGH 78578. While both NTUH-K2044 and MGH 78578 are clinical isolates, little is known about the disease potential of these strains in cell culture and animal models. Thus, we also performed functional analyses in the murine macrophage cell lines RAW264.7 and J774A.1 and found that MGH 78578 (K52 serotype) was internalized at higher levels than ATCC 43816 (K2) and NTUH-K2044 (K1), consistent with previous characterization of the antiphagocytic properties of K1 and K2 serotype capsules. We also examined the three K. pneumoniae strains in a novel BALB/c respiratory disease model and found that ATCC 43816 and NTUH-K2044 are highly virulent (LD50 < 100 CFU) while MGH 78578 is relatively avirulent.
A causal role of gene amplification in tumorigenesis is well-known, while amplification of DNA regulatory elements as an oncogenic driver remains unclear. In this study, we integrated next-generation sequencing approaches to map distant estrogen response elements (DEREs) that remotely control transcription of target genes through chromatin proximity. Two densely mapped DERE regions located on chromosomes 17q23 and 20q13 were frequently amplified in ERα-positive luminal breast cancer. These aberrantly amplified DEREs deregulated target gene expression potentially linked to cancer development and tamoxifen resistance. Progressive accumulation of DERE copies was observed in normal breast progenitor cells chronically exposed to estrogenic chemicals. These findings may extend to other DNA regulatory elements, the amplification of which can profoundly alter target transcriptome during tumorigenesis.
Rationale: Idiopathic pulmonary fibrosis (IPF) is a disease of progressive lung fibrosis with a high mortality rate. In organ repair and remodeling, epigenetic events are important. MicroRNAs (miRNAs) regulate gene expression post-transcriptionally and can target epigenetic molecules important in DNA methylation. The miR-17∼92 miRNA cluster is critical for lung development and lung epithelial cell homeostasis and is predicted to target fibrotic genes and DNA methyltransferase (DNMT)-1 expression.
Objectives: We investigated the miR-17∼92 cluster expression and its role in regulating DNA methylation events in IPF lung tissue.
Methods: Expression and DNA methylation patterns of miR-17∼92 were determined in human IPF lung tissue and fibroblasts and fibrotic mouse lung tissue. The relationship between the miR-17∼92 cluster and DNMT-1 expression was examined in vitro. Using a murine model of pulmonary fibrosis, we examined the therapeutic potential of the demethylating agent, 5′-aza-2′-deoxycytidine.
Measurements and Main Results: Compared with control samples, miR-17∼92 expression was reduced in lung biopsies and lung fibroblasts from patients with IPF, whereas DNMT-1 expression and methylation of the miR-17∼92 promoter were increased. Several miRNAs from the miR-17∼92 cluster targeted DNMT-1 expression, resulting in a negative feedback loop. Similarly, miR-17∼92 expression was reduced in the lungs of bleomycin-treated mice. Treatment with 5′-aza-2′-deoxycytidine in a murine bleomycin-induced pulmonary fibrosis model reduced fibrotic gene and DNMT-1 expression, enhanced miR-17∼92 cluster expression, and attenuated pulmonary fibrosis.
Conclusions: This study provides insight into the pathobiology of IPF and identifies a novel epigenetic feedback loop between miR-17∼92 and DNMT-1 in lung fibrosis.
microRNA; miR-17∼92; pulmonary fibrosis; DNA methylation; DNMT-1
Insects are the most important epidemiological factors for plant virus disease spread, with >75% of viruses being dependent on insects for transmission to new hosts. The black-faced leafhopper (Graminella nigrifrons Forbes) transmits two viruses that use different strategies for transmission: Maize chlorotic dwarf virus (MCDV) which is semi-persistently transmitted and Maize fine streak virus (MFSV) which is persistently and propagatively transmitted. To date, little is known regarding the molecular and cellular mechanisms in insects that regulate the process and efficiency of transmission, or how these mechanisms differ based on virus transmission strategy.
RNA-Seq was used to examine transcript changes in leafhoppers after feeding on MCDV-infected, MFSV-infected and healthy maize for 4 h and 7 d. After sequencing cDNA libraries constructed from whole individuals using Illumina next-generation sequencing, the Rnnotator pipeline in Galaxy was used to reassemble the G. nigrifrons transcriptome. Using differential expression analyses, we identified significant changes in transcript abundance in G. nigrifrons. In particular, transcripts implicated in the innate immune response and energy production were more highly expressed in insects fed on virus-infected maize. Leafhoppers fed on MFSV-infected maize also showed an induction of transcripts involved in hemocoel and cell-membrane linked immune responses within four hours of feeding. Patterns of transcript expression were validated for a subset of transcripts by quantitative real-time reverse transcription polymerase chain reaction using RNA samples collected from insects fed on healthy or virus-infected maize for periods ranging from 4 h to seven weeks.
We expected, and found, changes in transcript expression in G. nigrifrons feeding on maize infected with a virus (MFSV) that also infects the leafhopper, including induction of immune responses in the hemocoel and at the cell membrane. The significant induction of the innate immune system in G. nigrifrons fed on a foregut-borne virus (MCDV) that does not infect leafhoppers was less expected. The changes in transcript accumulation that occur independent of the mode of pathogen transmission could be key for identifying insect factors that disrupt vector-mediated plant virus transmission.
Gene expression; Leafhopper; Nucleorhabdovirus; Waikavirus; Viral transmission pathogen response; Innate immunity
Aberrant DNA methylation of CpG islands, CpG island shores and first exons is known to play a key role in the altered gene expression patterns in all human cancers. To date, a systematic study on the effect of DNA methylation on gene expression using high resolution data has not been reported. In this study, we conducted an integrated analysis of MethylCap-sequencing data and Affymetrix gene expression microarray data for 30 breast cancer cell lines representing different breast tumor phenotypes. As well-developed methods for the integrated analysis do not currently exist, we created a series of four different analysis methods. On the computational side, our goal is to develop methylome data analysis protocols for the integrated analysis of DNA methylation and gene expression data on the genome scale. On the cancer biology side, we present comprehensive genome-wide methylome analysis results for differentially methylated regions and their potential effect on gene expression in 30 breast cancer cell lines representing three molecular phenotypes, luminal, basal A and basal B. Our integrated analysis demonstrates that methylation status of different genomic regions may play a key role in establishing transcriptional patterns in molecular subtypes of human breast cancer.
DNA methylation is an important epigenetic mark and dysregulation of DNA methylation is associated with many diseases including cancer. Advances in next-generation sequencing now allow unbiased methylome profiling of entire patient cohorts, greatly facilitating biomarker discovery and presenting new opportunities to understand the biological mechanisms by which changes in methylation contribute to disease. Enrichment-based sequencing assays such as MethylCap-seq are a cost effective solution for genome-wide determination of methylation status, but the technical reliability of methylation reconstruction from raw sequencing data has not been well characterized.
We analyze three MethylCap-seq data sets and perform two different analyses to assess data quality. First, we investigate how data quality is affected by excluding samples that do not meet quality control cutoff requirements. Second, we consider the effect of additional reads on enrichment score, saturation, and coverage. Lastly, we verify a method for the determination of the global amount of methylation from MethylCap-seq data by comparing to a spiked-in control DNA of known methylation status.
We show that rejection of samples based on our quality control parameters leads to a significant improvement of methylation calling. Additional reads beyond ~13 million unique aligned reads improved coverage, modestly improved saturation, and did not impact enrichment score. Lastly, we find that a global methylation indicator calculated from MethylCap-seq data correlates well with the global methylation level of a sample as obtained from a spike-in DNA of known methylation level.
We show that with appropriate quality control MethylCap-seq is a reliable tool, suitable for cohorts of hundreds of patients, that provides reproducible methylation information on a feature by feature basis as well as information about the global level of methylation.
Advances in whole genome profiling have revolutionized the cancer research field, but at the same time have raised new bioinformatics challenges. For next generation sequencing (NGS), these include data storage, computational costs, sequence processing and alignment, delineating appropriate statistical measures, and data visualization. Currently there is a lack of workflows for efficient analysis of large MethylCap-seq datasets containing multiple sample groups.
The NGS application MethylCap-seq involves the in vitro capture of methylated DNA and subsequent analysis of enriched fragments by massively parallel sequencing. The workflow we describe performs MethylCap-seq experimental Quality Control (QC), sequence file processing and alignment, differential methylation analysis of multiple biological groups, hierarchical clustering, assessment of genome-wide methylation patterns, and preparation of files for data visualization.
Here, we present a scalable, flexible workflow for MethylCap-seq QC, secondary data analysis, tertiary analysis of multiple experimental groups, and data visualization. We demonstrate the experimental QC procedure with results from a large ovarian cancer study dataset and propose parameters which can identify problematic experiments. Promoter methylation profiling and hierarchical clustering analyses are demonstrated for four groups of acute myeloid leukemia (AML) patients. We propose a Global Methylation Indicator (GMI) function to assess genome-wide changes in methylation patterns between experimental groups. We also show how the workflow facilitates data visualization in a web browser with the application Anno-J.
This workflow and its suite of features will assist biologists in conducting methylation profiling projects and facilitate meaningful biological interpretation.
The transcriptional response driven by Hypoxia-inducible factor (HIF) is central to the adaptation to oxygen restriction. Despite recent characterization of genome-wide HIF DNA binding locations and hypoxia-regulated transcripts in different cell types, the molecular bases of HIF target selection remain unresolved. Herein, we combined multi-level experimental data and computational predictions to identify sequence motifs that may contribute to HIF target selectivity. We obtained a core set of bona fide HIF binding regions by integrating multiple HIF1 DNA binding and hypoxia expression profiling datasets. This core set exhibits evolutionarily conserved binding regions and is enriched in functional responses to hypoxia. Computational prediction of enriched transcription factor binding sites identified sequence motifs corresponding to several stress-responsive transcription factors, such as activator protein 1 (AP1), cAMP response element-binding (CREB), or CCAAT-enhancer binding protein (CEBP). Experimental validations on HIF-regulated promoters suggest a functional role of the identified motifs in modulating HIF-mediated transcription. Accordingly, transcriptional targets of these factors are over-represented in a sorted list of hypoxia-regulated genes. Altogether, our results implicate cooperativity among stress-responsive transcription factors in fine-tuning the HIF transcriptional response.
Small nuclear RNAs (snRNAs) are essential factors in mRNA splicing. By homozygosity mapping and deep sequencing, we show that a gene encoding U4atac snRNA, a component of the minor U12-dependent spliceosome, is mutated in individuals with microcephalic osteodysplastic primordial dwarfism type I (MOPD I), a severe developmental disorder characterized by extreme intrauterine growth retardation and multiple organ abnormalities. Functional assays show that mutations (30G>A, 51G>A, 55G>A, and 111G>A) associated with MOPD I cause defective U12-dependent splicing. Endogenous U12-dependent but not U2-dependent introns are poorly spliced in MOPD I patient fibroblast cells while introduction of wild type U4atac snRNA into MOPD I cells enhances U12-dependent splicing. These results illustrate the critical role of minor intron splicing in human development.
microcephalic osteodysplastic primordial dwarfism type I; RNU4ATAC; mutation; splicing; snRNA; minor spliceosome
While tumor suppressor genes frequently undergo epigenetic silencing in cancer, how the instructions directing this transcriptional repression are transmitted in cancer cells remains largely unclear. Expression of cyclin-dependent kinase inhibitor 1C (CDKN1C), an imprinted gene on chromosomal band 11p15.5, is reduced or lost in the majority of breast cancers. Here, we report that CDKN1C is suppressed by estrogen through epigenetic mechanisms involving the chromatin-interacting noncoding RNA KCNQ1OT1 and CCCTC-binding factor (CTCF). Activation of estrogen signaling reduced CDKN1C expression 3-fold (P < 0.001) and established repressive histone modifications at the 5′ regulatory region of the locus. These events were concomitant with induction of KCNQ1OT1 expression as well as increased recruitment of CTCF to both the distal KCNQ1OT1 promoter-associated imprinting control region (ICR) and the CDKN1C locus. Transient depletion of CTCF by small interfering RNA increased CDKN1C expression and significantly reduced the estrogen-mediated repression of CDKN1C. Further studies in breast cancer cell lines indicated that the epigenetic silencing of CDKN1C occurs in part as the result of genetic loss of the inactive methylated 11p15.5 ICR allele (R² = 0.612, P < 0.001). We also found a novel cis-encoded antisense transcript, CDKN1C-AS, which is induced by estrogen signaling following pharmacologic inhibition of DNA methyltransferase and histone deacetylase activity. Forced expression of CDKN1C-AS was capable of repressing endogenous CDKN1C in vivo. Our findings suggest that in addition to promoter hypermethylation, epigenetic repression of tumor suppressor genes by CTCF and noncoding RNA transcripts could be more common and important than previously understood.
An aberrant TGFβ signaling pathway may alter the expression of downstream targets and promote ovarian carcinogenesis. However, the mechanism of this impairment is not fully understood. Our previous study identified RunX1T1 as a putative SMAD4 target in an immortalized ovarian surface epithelial cell line, IOSE. In this study, we report that transcription of RunX1T1 was confirmed to be positively regulated by SMAD4 in IOSE cells and epigenetically silenced in a panel of ovarian cancer cell lines by promoter hypermethylation and histone methylation at H3 lysine 9. SMAD4 depletion increased repressive histone modifications of the RunX1T1 promoter without affecting promoter methylation in IOSE cells. Epigenetic treatment can restore RunX1T1 expression by reversing its epigenetic status in MCP 3 ovarian cancer cells. When transiently treated with a demethylating agent, the expression of RunX1T1 was partially restored in MCP 3 cells, but gradual re-silencing through promoter re-methylation was observed after the treatment. Interestingly, SMAD4 knockdown accelerated this re-silencing process, suggesting that normal TGFβ signaling is essential for the maintenance of RunX1T1 expression. In vivo analysis confirmed that hypermethylation of RunX1T1 was detected in 35.7% (34/95) of ovarian tumors with high clinical stages (p = 0.035) and in 83% (5/6) of primary ovarian cancer-initiating cells. Additionally, concurrent methylation of RunX1T1 and another SMAD4 target, FBXO32, which was previously found to be hypermethylated in ovarian cancer, was observed in this same sample cohort (p < 0.05). Restoration of RunX1T1 inhibited cancer cell growth. Taken together, dysregulated TGFβ/SMAD4 signaling may lead to epigenetic silencing of a putative tumor suppressor, RunX1T1, during ovarian carcinogenesis.
ovarian cancer; epigenetics; TGFβ; RunX1T1
Alterations of DNA methylation play an important role in gliomas. In a genome-wide screen, we identified a CpG-rich fragment within the 5′ region of the tumor necrosis factor receptor superfamily, member 11A gene (TNFRSF11A) that showed de novo methylation in gliomas. TNFRSF11A, also known as receptor activator of NF-κB (RANK), activates several signaling pathways, such as NF-κB, JNK, ERK, p38α, and Akt/PKB. Using pyrosequencing, we detected RANK/TNFRSF11A promoter methylation in 8 (57.1%) of 14 diffuse astrocytomas, 17 (77.3%) of 22 anaplastic astrocytomas, 101 (84.2%) of 120 glioblastomas, 6 (100%) of 6 glioma cell lines, and 7 (100%) of 7 glioma stem cell-enriched glioblastoma primary cultures but not in four normal white matter tissue samples. Treatment of glioma cell lines with the demethylating agent 5-aza-2′-deoxycytidine significantly reduced the methylation level and resulted in increased RANK/TNFRSF11A mRNA expression. Overexpression of RANK/TNFRSF11A in glioblastoma cell lines led to a significant reduction in focus formation and to elevated apoptotic activity as measured by flow cytometric analysis. Reporter assay studies of transfected glioma cells supported these results by showing the activation of signaling pathways associated with regulation of apoptosis. We conclude that RANK/TNFRSF11A is a novel and frequent target for de novo methylation in gliomas, which affects apoptotic activity and focus formation, thereby contributing to the molecular pathogenesis of gliomas.
Advances in whole genome profiling have revolutionized the cancer research field, but at the same time have raised new bioinformatics challenges. For next generation sequencing (NGS), these include data storage, computational costs, sequence processing and alignment, delineating appropriate statistical measures, and data visualization. The NGS application MethylCap-seq involves the in vitro capture of methylated DNA and subsequent analysis of enriched fragments by massively parallel sequencing. Here, we present a scalable, flexible workflow for MethylCap-seq Quality Control, secondary data analysis, tertiary analysis of multiple experimental groups, and data visualization. This workflow and its suite of features will assist biologists in conducting methylation profiling projects and facilitate meaningful biological interpretation.
next generation sequencing; DNA methylation; epigenetics; cancer; data analysis; data visualization
Next Generation Sequencing is highly resource intensive. NGS tasks related to data processing, management and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource intensive nature of NGS secondary analysis built around Illumina Genome Analyzer II sequencers and Illumina’s Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network attached storage device expandable up to 40TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.
DNA methylation is a hallmark in a subset of right-sided colorectal cancers. Methylation-based screening may improve prevention and survival rate for this type of cancer, which is often clinically asymptomatic in the early stages. We aimed to discover prognostic or diagnostic biomarkers for colon cancer by comparing DNA methylation profiles of right-sided colon tumours and paired normal colon mucosa using an 8.5 k CpG island microarray. We identified a diagnostic CpG-rich region, located in the first intron of the protein-tyrosine phosphatase gamma gene (PTPRG), with altered methylation already in the adenoma stage, that is, before the carcinoma transition. Validation of this region in an additional cohort of 103 sporadic colorectal tumours and 58 paired normal mucosa tissue samples showed 94% sensitivity and 96% specificity. Interestingly, comparable results were obtained when screening a cohort of Lynch syndrome-associated cancers. Functional studies showed that PTPRG intron 1 methylation did not directly affect PTPRG expression; however, the methylated region overlapped with a binding site of the insulator protein CTCF. Chromatin immunoprecipitation (ChIP) showed that methylation of the locus was associated with absence of CTCF binding. Methylation-associated changes in CTCF binding to PTPRG intron 1 could have implications for tumour gene expression by enhancer blocking, chromosome loop formation or abrogation of its insulator function. The high sensitivity and specificity for the PTPRG intron 1 methylation in both sporadic and hereditary colon cancers support biomarker potential for early detection of colon cancer.
PTPRG; colorectal cancer; CTCF; DNA methylation; Lynch syndrome
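As a reminder of how sensitivity and specificity figures such as those reported above are derived, the following minimal sketch computes both from paired lists of marker calls and true sample labels. The function name `sensitivity_specificity` and the toy sample lists are hypothetical and are not taken from the study; they only illustrate the arithmetic.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    marker_positive: Sequence[bool], is_tumour: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a binary marker.

    sensitivity = TP / (TP + FN): fraction of tumours called methylated.
    specificity = TN / (TN + FP): fraction of normals called unmethylated.
    """
    if len(marker_positive) != len(is_tumour):
        raise ValueError("inputs must have the same length")
    tp = sum(m and t for m, t in zip(marker_positive, is_tumour))
    fn = sum((not m) and t for m, t in zip(marker_positive, is_tumour))
    tn = sum((not m) and (not t) for m, t in zip(marker_positive, is_tumour))
    fp = sum(m and (not t) for m, t in zip(marker_positive, is_tumour))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one tumour and one normal sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy cohort (NOT the study data): 10 tumours, 5 normals.
toy_is_tumour = [True] * 10 + [False] * 5
toy_marker = [True] * 9 + [False] + [False] * 4 + [True]
toy_sens, toy_spec = sensitivity_specificity(toy_marker, toy_is_tumour)
print(f"sensitivity={toy_sens:.2f}, specificity={toy_spec:.2f}")
```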
DNA methylation plays a very important role in the silencing of tumor suppressor genes in various tumor types. In order to gain a genome-wide understanding of how changes in methylation affect tumor growth, the differential methylation hybridization (DMH) protocol has been developed and large amounts of DMH microarray data have been generated. However, it is still unclear how to preprocess this type of microarray data and how the background correction and normalization methods used for two-color gene expression arrays perform on methylation microarray data. In this paper, we report a set of internal control probes whose log ratios (M) are theoretically equal to zero under this DMH protocol. With the aid of this set of control probes, we propose two LOESS (or LOWESS, locally weighted scatter-plot smoothing) normalization methods that are novel and specific to DMH microarray data. Together with two alternatives (global LOESS and no normalization), this gives four normalization methods to compare. In addition, we compare five different background correction methods.
We study 20 different preprocessing methods, which are the combinations of five background correction methods and four normalization methods. In order to compare these 20 methods, we evaluate their performance in identifying known methylated and unmethylated housekeeping genes based on two statistics. Comparison details are illustrated using breast cancer cell line and ovarian cancer patient methylation microarray data. Our comparison results show that the different background correction methods perform similarly; however, the four normalization methods perform very differently. In particular, all three LOESS normalization methods perform better than no normalization.
It is necessary to do within-array normalization, and the two LOESS normalization methods based on specific DMH internal control probes produce more stable and relatively better results than the global LOESS normalization method.
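The idea of normalizing against internal control probes can be sketched as follows: fit a LOESS curve of M against average intensity A using only the control probes (whose M should be zero), then subtract the fitted curve from every probe. This is only one plausible reading of a control-probe-based LOESS and is not claimed to be either of the two methods proposed in the paper. The function names, the `span` value, the choice of A as the covariate, and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence


def tricube(u: float) -> float:
    """Tricube kernel weight for a scaled distance u >= 0."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_predict(
    x_fit: Sequence[float],
    y_fit: Sequence[float],
    x_new: Sequence[float],
    span: float = 0.5,
) -> List[float]:
    """Local linear LOESS fitted on (x_fit, y_fit), evaluated at x_new."""
    n = len(x_fit)
    if n < 2:
        raise ValueError("need at least two points to fit")
    k = min(n, max(2, round(span * n)))  # neighbours used in each local fit
    predictions = []
    for x0 in x_new:
        dists = sorted(abs(x - x0) for x in x_fit)
        # Bandwidth slightly beyond the k-th neighbour so it keeps weight > 0.
        h = dists[k - 1] * 1.001 + 1e-12
        w = [tricube(abs(x - x0) / h) for x in x_fit]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, x_fit)) / sw
        my = sum(wi * y for wi, y in zip(w, y_fit)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, x_fit))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, x_fit, y_fit))
        slope = sxy / sxx if sxx > 1e-12 else 0.0  # fall back to weighted mean
        predictions.append(my + slope * (x0 - mx))
    return predictions


def normalize_with_controls(
    a: Sequence[float],
    m: Sequence[float],
    control_idx: Sequence[int],
    span: float = 0.5,
) -> List[float]:
    """Subtract from every probe's M the bias curve estimated on control probes."""
    a_ctrl = [a[i] for i in control_idx]
    m_ctrl = [m[i] for i in control_idx]
    bias = loess_predict(a_ctrl, m_ctrl, a, span)
    return [mi - bi for mi, bi in zip(m, bias)]


# Toy array (hypothetical): intensity-dependent bias 0.5*A - 4, plus a true
# signal of 2.0 on probes 3 and 11; even-numbered probes act as controls.
toy_a = [6 + 0.5 * i for i in range(16)]
toy_signal = [2.0 if i in (3, 11) else 0.0 for i in range(16)]
toy_m = [0.5 * ai - 4.0 + s for ai, s in zip(toy_a, toy_signal)]
toy_norm = normalize_with_controls(toy_a, toy_m, list(range(0, 16, 2)), span=0.75)
print([round(v, 6) for v in toy_norm])
```

Because the curve is estimated only from probes expected to have M = 0, genuinely methylated probes do not pull the fit toward themselves, which is the usual motivation for control-based over global LOESS.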
Estrogens regulate diverse physiological processes in various tissues through genomic and non-genomic mechanisms that result in activation or repression of gene expression. Transcriptional regulation upon estrogen stimulation is a critical biological process underlying the onset and progression of the majority of breast cancers. Dynamic gene expression changes have been shown to characterize the breast cancer cell response to estrogens, but the underlying molecular mechanisms are still not well understood.
We developed a modulated empirical Bayes model and constructed a novel topological and temporal transcription factor (TF) regulatory network in the MCF7 breast cancer cell line upon stimulation by 17β-estradiol. In the network, significant TF genomic hubs were identified, including ER-alpha and AP-1; significant non-genomic hubs include ZFP161, TFDP1, NRF1, TFAP2A, EGR1, E2F1, and PITX2. Although the early and late networks were distinct (<5% overlap of ERα target genes between the 4 and 24 h time points), all nine hubs were significantly represented in both networks. In MCF7 cells with acquired resistance to tamoxifen, the ERα regulatory network was unresponsive to 17β-estradiol stimulation. The significant loss of hormone responsiveness was associated with marked epigenomic changes, including hyper- or hypo-methylation of promoter CpG islands and repressive histone methylations.
We identified a number of estrogen-regulated target genes and established an estrogen-regulated network that distinguishes the genomic and non-genomic actions of the estrogen receptor. Many gene targets of this network were no longer active in anti-estrogen-resistant cell lines, possibly because their DNA methylation and histone acetylation patterns have changed.
DNA methylation has been shown to play an important role in the silencing of tumor suppressor genes in various tumor types. In order to have a system-wide understanding of the methylation changes that occur in tumors, we have developed a differential methylation hybridization (DMH) protocol that can simultaneously assay the methylation status of all known CpG islands (CGIs) using microarray technologies. A large percentage of signals obtained from microarrays can be attributed to various measurable and unmeasurable confounding factors unrelated to the biological question at hand. In order to correct the bias due to noise, we first implemented a quantile regression model, with a quantile level equal to 75%, to identify hypermethylated CGIs in an earlier work. As a proof of concept, we applied this model to methylation microarray data generated from breast cancer cell lines. However, we were unsure whether 75% was the best quantile level for identifying hypermethylated CGIs. In this paper, we attempt to determine which quantile level should be used to identify hypermethylated CGIs and their associated genes.
We introduce three statistical measurements to compare the performance of the proposed quantile regression model at different quantile levels (95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%), using known methylated genes and unmethylated housekeeping genes reported in breast cancer cell lines and ovarian cancer patients. Our results show that the quantile levels ranging from 80% to 90% are better at identifying known methylated and unmethylated genes.
In this paper, we propose to use a quantile regression model to identify hypermethylated CGIs by incorporating probe effects to account for noise due to unmeasurable factors. Our model can efficiently identify hypermethylated CGIs in both breast and ovarian cancer data.
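To make the role of the quantile level concrete, the sketch below fits a straight-line quantile regression at each of the eight levels listed above and flags the points lying above the fitted line. It is a simplified stand-in, not the authors' model: it has a single covariate and no probe effects, it uses exhaustive search over lines through pairs of points (valid because a one-covariate quantile regression always has an optimal line interpolating two data points), and the function names and toy data are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def check_loss(residual: float, tau: float) -> float:
    """Quantile ("pinball") loss: tau*r if r >= 0, else (tau - 1)*r."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(
    xs: Sequence[float], ys: Sequence[float],
    intercept: float, slope: float, tau: float,
) -> float:
    """Sum of check losses of the line over all points."""
    return sum(check_loss(y - (intercept + slope * x), tau) for x, y in zip(xs, ys))


def fit_quantile_line(
    xs: Sequence[float], ys: Sequence[float], tau: float
) -> Tuple[float, float]:
    """Exact tau-quantile line by brute force over lines through two points.

    O(n^3) overall, so only suitable for small illustrative data sets.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line, not a valid regression line
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = total_loss(xs, ys, intercept, slope, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2]


def flag_above(xs: Sequence[float], ys: Sequence[float], tau: float) -> List[int]:
    """Indices of points strictly above the fitted tau-quantile line."""
    intercept, slope = fit_quantile_line(xs, ys, tau)
    return [
        i for i, (x, y) in enumerate(zip(xs, ys))
        if y - (intercept + slope * x) > 1e-9
    ]


# Hypothetical toy data: linear trend plus a repeating deterministic offset.
toy_x = [0.5 * i for i in range(20)]
toy_y = [0.3 * x + ((i * 7) % 5 - 2) * 0.4 for i, x in enumerate(toy_x)]

# Quantile levels compared in the text.
levels = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60]
for tau in levels:
    print(tau, flag_above(toy_x, toy_y, tau))
```

A higher quantile level leaves fewer points above the fitted line, so it is a stricter call; this is the trade-off that the comparison across levels is probing.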
The Cdc42-interacting protein-4, Trip10 (also known as CIP4), is a multi-domain adaptor protein involved in diverse cellular processes, which functions in a tissue-specific and cell lineage-specific manner. We previously found that Trip10 is highly expressed in estrogen receptor-expressing (ER+) breast cancer cells. Estrogen receptor depletion reduced Trip10 expression by progressively increasing DNA methylation. We hypothesized that Trip10 functions as a tumor suppressor and may be involved in the malignancy of ER-negative (ER-) breast cancer. To test this hypothesis and evaluate whether Trip10 is epigenetically regulated by DNA methylation in other cancers, we evaluated DNA methylation of Trip10 in liver cancer, brain tumor, ovarian cancer, and breast cancer.
We applied methylation-specific polymerase chain reaction and bisulfite sequencing to determine the DNA methylation of Trip10 in various cancer cell lines and tumor specimens. We also overexpressed Trip10 to observe its effect on colony formation and in vivo tumorigenesis.
We found that Trip10 is hypermethylated in brain tumor and breast cancer, but hypomethylated in liver cancer. Overexpressed Trip10 was associated with endogenous Cdc42 and huntingtin in IMR-32 brain tumor cells and CP70 ovarian cancer cells. However, overexpression of Trip10 promoted colony formation in IMR-32 cells and tumorigenesis in mice inoculated with IMR-32 cells, whereas overexpressed Trip10 substantially suppressed colony formation in CP70 cells and tumorigenesis in mice inoculated with CP70 cells.
Trip10 regulates cancer cell growth and death in a cancer type-specific manner. Differential DNA methylation of Trip10 can promote either cell survival or cell death in a cell type-dependent manner.