Monomeric CRISPR-Cas9 nucleases are widely used for targeted genome editing but can induce unwanted off-target mutations with high frequencies. Here we describe dimeric RNA-guided FokI Nucleases (RFNs) that recognize extended sequences and can edit endogenous genes with high efficiencies in human cells. The cleavage activity of an RFN depends strictly on the binding of two guide RNAs (gRNAs) to DNA with a defined spacing and orientation, and RFNs therefore show improved specificities relative to wild-type Cas9 monomers. Importantly, direct comparisons show that RFNs guided by a single gRNA generally induce lower levels of unwanted mutations than matched monomeric Cas9 nickases. In addition, we describe a simple method for expressing multiple gRNAs bearing any 5′ end nucleotide, which gives dimeric RFNs a broad targeting range. RFNs combine the ease of RNA-based targeting with the specificity enhancement inherent to dimerization and are likely to be useful in applications that require highly precise genome editing.
Reverse-engineering gene regulatory networks from expression data is difficult, especially without temporal measurements or interventional experiments. In particular, the causal direction of an edge is generally not statistically identifiable, i.e., cannot be inferred as a statistical parameter, even from an unlimited amount of non-time series observational mRNA expression data. Some additional evidence is required, and high-throughput methylation data can be viewed as a natural multifactorial gene perturbation experiment.
We introduce IDEM (Identifying Direction from Expression and Methylation), a method for identifying the causal direction of edges by combining DNA methylation and mRNA transcription data. We describe the circumstances under which edge directions become identifiable. Experiments with both real and synthetic data demonstrate that the accuracy of IDEM for inferring both edge placement and edge direction in gene regulatory networks is significantly improved relative to other methods.
Reverse-engineering directed gene regulatory networks from static observational data becomes feasible by exploiting the context provided by high-throughput DNA methylation data.
An implementation of the algorithm described is available at http://code.google.com/p/idem/.
Gene regulation; Methylation; Microarrays; Bayesian networks
Estimation of pathogen-specific causes of child diarrhea deaths is needed to guide vaccine development and other prevention strategies. We did a systematic review of articles published between 1990 and 2011 reporting at least one of 13 pathogens in children <5 years of age hospitalized with diarrhea. We included 2011 rotavirus data from the Rotavirus Surveillance Network coordinated by WHO. We excluded studies that were conducted during diarrhea outbreaks, did not discriminate between inpatient and outpatient cases, reported nosocomial infections, were conducted in special populations, or did not use adequate methods, as well as rotavirus studies in countries where the rotavirus vaccine was used. Age-adjusted median proportions for each pathogen were calculated and applied to 712 000 deaths due to diarrhea in children under 5 years for 2011, assuming that those observed among children hospitalized for diarrhea represent those causing child diarrhea deaths. A total of 163 articles and WHO studies done in 31 countries were selected, representing 286 inpatient studies. Studies seeking only one pathogen found higher proportions for some pathogens than studies seeking multiple pathogens (e.g. 39% rotavirus in 180 single-pathogen studies vs. 20% in 24 studies with 5–13 pathogens, p<0.0001). The percentage of episodes for which no pathogen could be identified was estimated to be 34%; the total of all age-adjusted percentages for pathogens and no-pathogen cases was 138%. Adjusting all proportions, including unknowns, to add to 100%, we estimated that rotavirus caused 197 000 [Uncertainty range (UR) 110 000–295 000], enteropathogenic E. coli 79 000 (UR 31 000–146 000), calicivirus 71 000 (UR 39 000–113 000), and enterotoxigenic E. coli 42 000 (UR 20 000–76 000) deaths. Rotavirus, calicivirus, enteropathogenic and enterotoxigenic E. coli cause more than half of all diarrheal deaths in children <5 years in the world.
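The adjustment step in the abstract above (rescaling proportions that total 138% so that they add to 100%, then applying them to the 712 000 deaths) can be illustrated with a minimal sketch. The pathogen names and raw proportions below are hypothetical placeholders chosen only so that they sum to 1.38; they are not the study's age-adjusted medians, and the function names are illustrative.

```python
def rescale_to_unity(proportions):
    """Rescale a dict of proportions so that the values sum to 1.0."""
    total = sum(proportions.values())
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    return {name: value / total for name, value in proportions.items()}


def attribute_deaths(proportions, total_deaths):
    """Apply rescaled proportions to a total number of deaths."""
    scaled = rescale_to_unity(proportions)
    return {name: share * total_deaths for name, share in scaled.items()}


# Hypothetical raw proportions (assumption): they sum to 1.38, mirroring
# the 138% total reported in the abstract, but are not the study's values.
raw = {"pathogen_a": 0.50, "pathogen_b": 0.54, "no_pathogen": 0.34}

# 712 000 is the number of diarrhea deaths stated in the abstract.
deaths = attribute_deaths(raw, 712_000)
for name, count in deaths.items():
    print(f"{name}: {count:,.0f}")
```

Because every category is divided by the same total, the relative ranking of pathogens is unchanged; only the absolute shares shrink so that the attributed deaths sum to the overall total.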
Epigenetic mechanisms integrate genetic and environmental causes of disease. Comprehensive genome-wide analyses of epigenetic modifications have not demonstrated robust association with common diseases. Using Illumina HumanMethylation450 arrays on 354 ACPA-positive rheumatoid arthritis (RA) cases and 337 controls, we identified two clusters within the MHC region whose differential methylation potentially mediates genetic risk for RA. To reduce confounding hampering previous epigenome-wide studies, we corrected for cellular heterogeneity by estimating and adjusting for cell-type proportions and used mediation analysis to filter out associations likely consequential to disease. Four CpGs also showed association between genotype and variance of methylation in addition to mean. The associations for both clusters replicated for at least one CpG (p<0.01), with the rest showing suggestive association, in monocytes in an independent set of 12 cases and 12 controls. Thus, DNA methylation is a potential mediator of genetic risk.
Human cancers nearly ubiquitously harbor epigenetic alterations. While such alterations in epigenetic marks, including DNA methylation, are potentially heritable, they can also be dynamically altered. Given this potential for plasticity, the degree to which epigenetic changes can be subject to selection and act as drivers of neoplasia has been questioned. Here, we carried out genome-scale analyses of DNA methylation alterations in lethal metastatic prostate cancer and created DNA methylation “cityscape” plots to visualize these complex data. We show that somatic DNA methylation alterations, despite showing marked inter-individual heterogeneity among men with lethal metastatic prostate cancer, were maintained across all metastases within the same individual. The overall extent of maintenance in DNA methylation changes was comparable to that of genetic copy number alterations. Regions that were frequently hypermethylated across individuals were markedly enriched for cancer and development/differentiation related genes. Additionally, regions exhibiting high consistency of hypermethylation across metastases within individuals, even if variably hypermethylated across individuals, showed enrichment of cancer-related genes. Interestingly, whereas some regions showed intra-individual metastatic tumor heterogeneity in promoter methylation, such methylation alterations were generally not correlated with gene expression. This was despite a general tendency for promoter methylation patterns to be strongly correlated with gene expression, particularly at regions that were variably methylated across individuals. These findings suggest that DNA methylation alterations have the potential for producing selectable driver events in carcinogenesis and disease progression and highlight the possibility of targeting such epigenome alterations for development of longitudinal markers and therapeutic strategies.
In honeybee societies, distinct caste phenotypes are created from the same genotype, suggesting a role for epigenetics in deriving these behaviorally different phenotypes. We found no differences in DNA methylation between irreversible worker/queen castes, but substantial differences between nurse and forager subcastes. Reverting foragers back to nurses reestablished methylation levels for a majority of genes and provided the first evidence in any organism of reversible epigenetic changes associated with behavior.
Brain cellular heterogeneity may bias cell type specific DNA methylation patterns, influencing findings in psychiatric epigenetic studies. We performed fluorescence activated cell sorting (FACS) of neuronal nuclei and Illumina HM450 DNA methylation profiling in post mortem frontal cortex of 29 major depression cases and 29 matched controls. We identify genomic features and ontologies enriched for cell type specific epigenetic variation. Using the top cell epigenotype specific (CETS) marks, we generated a publicly available R package, “CETS,” capable of quantifying neuronal proportions and generating in silico neuronal profiles from DNA methylation data. We demonstrate a significant overlap in major depression DNA methylation associations between FACS separated and CETS model generated neuronal profiles relative to bulk profiles. CETS derived neuronal proportions correlated significantly with age in the frontal cortex and cerebellum and accounted for epigenetic variation between brain regions. CETS based control of cellular heterogeneity will enable more robust hypothesis testing in the brain.
DNA methylation; neurons; glia; fluorescence activated cell sorting; epigenetics; cellular heterogeneity; microarray; age; brain region
Background: Gestational age at birth strongly predicts neonatal, adolescent and adult morbidity and mortality through mostly unknown mechanisms. Identification of specific genes that are undergoing regulatory change prior to birth, such as through changes in DNA methylation, would increase our understanding of developmental changes occurring during the third trimester and consequences of pre-term birth (PTB).
Methods: We performed a genome-wide analysis of DNA methylation (using microarrays, specifically CHARM 2.0) in 141 newborns collected in Baltimore, MD, using novel statistical methodology to identify genomic regions associated with gestational age at birth. Bisulphite pyrosequencing was used to validate significant differentially methylated regions (DMRs), and real-time PCR was performed to assess functional significance of differential methylation in a subset of newborns.
Results: We identified three DMRs at genome-wide significance levels adjacent to the NFIX, RAPGEF2 and MSRB3 genes. All three regions were validated by pyrosequencing, and RAPGEF2 also showed an inverse correlation between DNA methylation levels and gene expression levels. Although the three DMRs appear very dynamic with gestational age in our newborn sample, adult DNA methylation levels at these regions are stable and of equal or greater magnitude than the oldest neonate, directionally consistent with the gestational age results.
Conclusions: We have identified three differentially methylated regions associated with gestational age at birth. All three nearby genes play important roles in the development of several organs, including skeletal muscle, brain and haematopoietic system. Therefore, they may provide initial insight into the basis of PTB's negative health outcomes. The genome-wide custom DNA methylation array technology and novel statistical methods employed in this study could constitute a model for epidemiologic studies of epigenetic variation.
Epigenetic epidemiology; differentially methylated regions; pre-term birth; gestational age; genome-wide DNA methylation
Comprehensive high-throughput arrays for relative methylation (CHARM) was recently developed as an experimental platform and analytic approach to assess DNA methylation (DNAm) at a genome-wide level. Its initial implementation was for human and mouse. We adapted it for rat and sought to examine DNAm differences across tissues and brain regions in this model organism. We extracted DNA from liver, spleen and three brain regions: cortex, hippocampus and hypothalamus from adult Sprague Dawley rats. DNA was digested with McrBC, and the resulting methyl-depleted fraction was hybridized to the rat CHARM array along with a mock-treated fraction. Differentially methylated regions (DMRs) between tissue types were detected using normalized methylation log-ratios. In validating 24 of the most significant DMRs by bisulfite pyrosequencing, we detected large mean differences in DNAm, ranging from 33–59%, among the most significant DMRs in the across-tissue comparisons. The comparable figures for the hippocampus vs. hypothalamus DMRs were 14–40%, for the cortex vs. hippocampus DMRs, 12–29%, and for the cortex vs. hypothalamus DMRs, 5–35%, with a correlation of r² = 0.92 between the methylation differences in 24 DMRs predicted by CHARM and those validated by bisulfite pyrosequencing. Our adaptation of the CHARM array for the rat genome yielded highly robust results that demonstrate the value of this method in detecting substantial DNAm differences between tissues and across different brain regions. This platform should prove valuable in future studies aimed at examining DNAm differences in particular brain regions of rats exposed to environmental stimuli with potential epigenetic consequences.
epigenetics; DNA methylation; methylation array; genome-wide; rat; brain
We compared bona-fide human induced pluripotent stem cells (iPSC) derived from umbilical cord blood (CB) and neonatal keratinocytes (K). As a consequence of both incomplete erasure of tissue-specific methylation and aberrant de novo methylation, CB-iPSC and K-iPSC are distinct in genome-wide DNA methylation profiles and differentiation potential. Extended passage of some iPSC clones in culture did not improve their epigenetic resemblance to ESC, implying that some human iPSC retain a residual “epigenetic memory” of their tissue of origin.
While genome-wide association studies are ongoing to identify sequence variation influencing susceptibility to major depressive disorder (MDD), epigenetic marks, such as DNA methylation, which can be influenced by environment, might also play a role. Here we present the first genome-wide DNA methylation (DNAm) scan in MDD. We compared 39 postmortem frontal cortex MDD samples to 26 controls. DNA was hybridized to our Comprehensive High-throughput Arrays for Relative Methylation (CHARM) platform, covering 3.5 million CpGs. CHARM identified 224 candidate regions with DNAm differences >10%. These regions are highly enriched for neuronal growth and development genes. Ten of 17 regions for which validation was attempted showed true DNAm differences; the greatest were in PRIMA1, with 12–15% increased DNAm in MDD (p = 0.0002–0.0003), and a concomitant decrease in gene expression. These results must be considered pilot data, however, as we could only test replication in a small number of additional brain samples (n = 16), which showed no significant difference in PRIMA1. Because PRIMA1 anchors acetylcholinesterase in neuronal membranes, decreased expression could result in decreased enzyme function and increased cholinergic transmission, consistent with a role in MDD. We observed decreased immunoreactivity for acetylcholinesterase in MDD brain with increased PRIMA1 DNAm, non-significant at p = 0.08.
While we cannot draw firm conclusions about PRIMA1 DNAm in MDD, the involvement of neuronal development genes across the set showing differential methylation suggests a role for epigenetics in the illness. Further studies using limbic system brain regions might shed additional light on this role.
DNA methylation is a key regulator of gene function in a multitude of both normal and abnormal biological processes, but tools to elucidate its roles on a genome-wide scale are still in their infancy. Methylation sensitive restriction enzymes and microarrays provide a potential high-throughput, low-cost platform to allow methylation profiling. However, accurate absolute methylation estimates have been elusive due to systematic errors and unwanted variability. Previous microarray preprocessing procedures, mostly developed for expression arrays, fail to adequately normalize methylation-related data since they rely on key assumptions that are violated in the case of DNA methylation. We develop a normalization strategy tailored to DNA methylation data and an empirical Bayes percentage methylation estimator that together yield accurate absolute methylation estimates that can be compared across samples. We illustrate the method on data generated to detect methylation differences between tissues and between normal and tumor colon samples.
DNA methylation; Epigenetics; Microarray
Diarrhea is recognized as a leading cause of morbidity and mortality among children under 5 years of age in low- and middle-income countries, yet updated estimates of diarrhea incidence by age for these countries are greatly needed. We conducted a systematic literature review to identify cohort studies that sought to quantify diarrhea incidence among any age group of children 0-59 mo of age.
We used the Expectation-Maximization algorithm as a part of a two-stage regression model to handle diverse age data and overall incidence rate variation by study to generate country specific incidence rates for low- and middle-income countries for 1990 and 2010. We then calculated regional incidence rates and uncertainty ranges using the bootstrap method, and estimated the total number of episodes for children 0-59 mo of age in 1990 and 2010.
We estimate that incidence has declined from 3.4 episodes/child year in 1990 to 2.9 episodes/child year in 2010. As was the case previously, incidence rates are highest among infants 6-11 mo of age, at 4.5 episodes/child year in 2010. Among these 139 countries there were nearly 1.9 billion episodes of childhood diarrhea in 1990 and nearly 1.7 billion episodes in 2010.
Although our results indicate that diarrhea incidence rates may be declining slightly, the total burden on the health of each child due to multiple episodes per year is tremendous and additional funds are needed to improve both prevention and treatment practices in low- and middle-income countries.
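The bootstrap step mentioned in the methods above can be sketched as follows. This is only an illustration of a percentile bootstrap for the mean of a set of rates; it does not reproduce the two-stage regression or the Expectation-Maximization step, and the input rates, the 2.5th/97.5th percentile choice, and the function name are assumptions rather than details taken from the study.

```python
import random
import statistics


def bootstrap_range(values, n_resamples=2000, lower=2.5, upper=97.5, seed=0):
    """Percentile bootstrap range for the mean of `values`.

    Resamples `values` with replacement, records the mean of each
    resample, and returns the requested lower and upper percentiles.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )

    def pick(pct):
        # Nearest-rank position of the requested percentile.
        idx = round(pct / 100 * (n_resamples - 1))
        return means[min(n_resamples - 1, max(0, idx))]

    return pick(lower), pick(upper)


# Hypothetical country-level incidence rates in episodes per child-year
# (assumption: illustrative numbers, not data from the study).
rates = [2.1, 2.8, 3.0, 3.3, 2.6, 3.9, 2.4, 3.1]
low, high = bootstrap_range(rates)
print(f"mean = {statistics.fmean(rates):.2f}, range = ({low:.2f}, {high:.2f})")
```

A percentile bootstrap is used here because it needs no distributional assumption about the rates; other bootstrap variants would change only the `pick` step.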
Diarrhea remains one of the leading causes of morbidity and mortality among children under 5 years of age, but in many low and middle-income countries where vital registration data are lacking, updated estimates with regard to the proportion of deaths attributable to diarrhea are needed.
We conducted a systematic literature review to identify studies reporting diarrhea proportionate mortality for children 1–59 mo of age published between 1980 and 2009. Using the published proportionate mortality estimates and country level covariates we constructed a logistic regression model to estimate country and regional level proportionate mortality and estimated uncertainty bounds using Monte-Carlo simulations.
We identified more than 90 verbal autopsy studies from around the world to contribute data to a single-cause model. We estimated diarrhea proportionate mortality for 84 countries in 6 regions and found that diarrhea accounts for between 10.0% of deaths in the Americas and 31.3% of deaths in the South-east Asian region.
Diarrhea remains a leading cause of death for children 1–59 mo of age. Published literature can be used to create a single-cause mortality disease model to estimate mortality for countries lacking vital registration data.
Normalization has been recognized as a necessary preprocessing step in a variety of high-throughput biotechnologies. A number of normalization methods have been developed specifically for microarrays, some general and others tailored for certain experimental designs. All methods rely on assumptions about data characteristics that are expected to stay constant across samples, although some make it more explicit than others. Most methods make assumptions that certain quantities related to the biological signal of interest stay the same; this is reasonable for many experiments but usually not verifiable. Recently, several platforms have begun to include a large number of negative control probes that nonetheless cover nearly the entire range of the measured signal intensity. Using these probes as a normalization basis makes it possible to normalize without making assumptions about the behavior of the biological signal. We present a subset quantile normalization (SQN) procedure that normalizes based on the distribution of non-specific control features, without restriction on the behavior of specific signals. We illustrate the performance of this method using three different platforms and experimental settings. Compared to two other leading nonlinear normalization procedures, the SQN method preserves more biological variation after normalization while reducing the noise observed on control features. Although the illustration datasets are from microarray experiments, this method is general for all high throughput technologies that include a large set of control features that have constant expectations across samples. It does not require an equal number of features in all samples and tolerates missing data. Supplementary Material is available online at www.liebertonline.com.
DNA arrays; functional genomics; gene chips; gene expression
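The idea of normalizing on control features alone, described in the abstract above, can be illustrated with a simplified sketch. This is not the authors' SQN procedure: it is a plain piecewise-linear quantile mapping in which each sample's control intensities are aligned to the average control distribution and the same mapping is then applied to every feature. It assumes a complete data set with the same features in every sample, which the actual method does not require, and all names and numbers are hypothetical.

```python
from bisect import bisect_right


def _map_through_controls(x, knots, targets):
    """Map x through the piecewise-linear curve defined by (knots, targets).

    Outside the range of the control knots, x is shifted by the offset
    observed at the nearest endpoint instead of being clamped.
    """
    if x <= knots[0]:
        return x + (targets[0] - knots[0])
    if x >= knots[-1]:
        return x + (targets[-1] - knots[-1])
    i = bisect_right(knots, x)  # knots[i - 1] <= x < knots[i]
    x0, x1 = knots[i - 1], knots[i]
    t0, t1 = targets[i - 1], targets[i]
    return t0 + (t1 - t0) * (x - x0) / (x1 - x0)


def control_quantile_normalize(samples, is_control):
    """Normalize samples using only the distribution of control features.

    samples: list of per-sample intensity lists, same feature order.
    is_control: list of booleans marking the control features.
    """
    sorted_controls = [
        sorted(v for v, c in zip(sample, is_control) if c) for sample in samples
    ]
    n_controls = len(sorted_controls[0])
    # Reference distribution: mean of the k-th smallest control across samples.
    reference = [
        sum(sc[k] for sc in sorted_controls) / len(samples)
        for k in range(n_controls)
    ]
    return [
        [_map_through_controls(v, sc, reference) for v in sample]
        for sample, sc in zip(samples, sorted_controls)
    ]


# Hypothetical intensities: sample_b equals sample_a shifted by +1.
sample_a = [1.0, 2.0, 4.0, 3.0, 6.0]
sample_b = [2.0, 3.0, 5.0, 4.0, 7.0]
is_control = [True, True, True, False, False]
norm_a, norm_b = control_quantile_normalize([sample_a, sample_b], is_control)
print(norm_a)
print(norm_b)
```

In this toy case the two samples differ only by a constant technical shift, so aligning the controls brings the non-control features into agreement as well, without any assumption about how the biological signal behaves.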
DNA double strand breaks (DSB) can lead to development of genomic rearrangements, which are hallmarks of cancer. TMPRSS2-ERG gene fusions in prostate cancer (PCa) are among the most common genomic rearrangements observed in human cancer. We show that androgen signaling promotes co-recruitment of androgen receptor (AR) and topoisomerase II beta (TOP2B) to sites of TMPRSS2-ERG genomic breakpoints, triggering recombinogenic TOP2B-mediated DSB. Furthermore, androgen stimulation resulted in de novo production of TMPRSS2-ERG fusion transcripts in a process requiring TOP2B and components of DSB repair machinery. Finally, unlike normal prostate epithelium, prostatic intraepithelial neoplasia (PIN) cells showed strong co-expression of AR and TOP2B. These findings implicate androgen-induced TOP2B-mediated DSB in generating TMPRSS2-ERG rearrangements.
The epigenome consists of non–sequence-based modifications, such as DNA methylation, that are heritable during cell division and that may affect normal phenotypes and predisposition to disease. Here, we have performed an unbiased genome-scale analysis of ~4 million CpG sites in 74 individuals with comprehensive array-based relative methylation (CHARM) analysis. We found 227 regions that showed extreme interindividual variability [variably methylated regions (VMRs)] across the genome, which are enriched for developmental genes based on Gene Ontology analysis. Furthermore, half of these VMRs were stable within individuals over an average of 11 years, and these VMRs defined a personalized epigenomic signature. Four of these VMRs showed covariation with body mass index consistently at two study visits and were located in or near genes previously implicated in regulating body weight or diabetes. This work suggests an epigenetic strategy for identifying patients at risk of common disease.
DNA methylation has been linked to genome regulation and dysregulation in health and disease respectively, and methods for characterizing genomic DNA methylation patterns are rapidly emerging. We have developed/refined methods for enrichment of methylated genomic fragments using the methyl-binding domain of the human MBD2 protein (MBD2-MBD) followed by analysis with high-density tiling microarrays. This MBD-chip approach was used to characterize DNA methylation patterns across all non-repetitive sequences of human chromosomes 21 and 22 at high-resolution in normal and malignant prostate cells.
Examining these data using computational methods that were designed specifically for DNA methylation tiling array data revealed widespread methylation of both gene promoter and non-promoter regions in cancer and normal cells. In addition to identifying several novel cancer hypermethylated 5' gene upstream regions that mediated epigenetic gene silencing, we also found several hypermethylated 3' gene downstream, intragenic and intergenic regions. The hypermethylated intragenic regions were highly enriched for overlap with intron-exon boundaries, suggesting a possible role in regulation of alternative transcriptional start sites, exon usage and/or splicing. The hypermethylated intergenic regions showed significant enrichment for conservation across vertebrate species. A sampling of these newly identified promoter (ADAMTS1 and SCARF2 genes) and non-promoter (downstream or within DSCR9, C21orf57 and HLCS genes) hypermethylated regions was effective in distinguishing malignant from normal prostate tissues and/or cell lines.
Comparison of chromosome-wide DNA methylation patterns in normal and malignant prostate cells revealed significant methylation of gene-proximal and conserved intergenic sequences. Such analyses can be easily extended for genome-wide methylation analysis in health and disease.
DNA methylation; prostate cancer; tiling microarray; epigenetics; methylated DNA binding domain; MBD-chip; ADAMTS1; SCARF2; DSCR9; HLCS
Epigenetic modifications must underlie lineage-specific differentiation as terminally differentiated cells express tissue-specific genes, but their DNA sequence is unchanged. Hematopoiesis provides a well-defined model to study epigenetic modifications during cell-fate decisions, as multipotent progenitors (MPPs) differentiate into progressively restricted myeloid or lymphoid progenitors. While DNA methylation is critical for myeloid versus lymphoid differentiation, as demonstrated by the myeloerythroid bias in Dnmt1 hypomorphs1, a comprehensive DNA methylation map of hematopoietic progenitors, or of any multipotent/oligopotent lineage, does not exist. Here we examined 4.6 million CpG sites throughout the genome for MPPs, common lymphoid progenitors (CLPs), common myeloid progenitors (CMPs), granulocyte/macrophage progenitors (GMPs), and thymocyte progenitors (DN1, DN2, DN3). Dramatic epigenetic plasticity accompanied both lymphoid and myeloid restriction. Myeloid commitment involved less global DNA methylation than lymphoid commitment, supported functionally by myeloid skewing of progenitors following treatment with a DNA methyltransferase inhibitor. Differential DNA methylation correlated with gene expression more strongly at CpG island shores than CpG islands. Many examples of genes and pathways not previously known to be involved in choice between lymphoid/myeloid differentiation have been identified, such as Arl4c and Jdp2. Several transcription factors, including Meis1, were methylated and silenced during differentiation, suggesting a role in maintaining an undifferentiated state. Additionally, epigenetic modification of modifiers of the epigenome appears to be important in hematopoietic differentiation. Our results directly demonstrate that modulation of DNA methylation occurs during lineage-specific differentiation and define a comprehensive map of the methylation and transcriptional changes that accompany myeloid versus lymphoid fate decisions.
Induced pluripotent stem (iPS) cells are derived by epigenetic reprogramming, but their DNA methylation patterns have not yet been analyzed on a genome-wide scale. Here, we find substantial hypermethylation and hypomethylation of cytosine-phosphate-guanine (CpG) island shores in nine human iPS cell lines as compared to their parental fibroblasts. The differentially methylated regions (DMRs) in the reprogrammed cells (denoted R-DMRs) were significantly enriched in tissue-specific (T-DMRs; 2.6-fold, P < 10⁻⁴) and cancer-specific DMRs (C-DMRs; 3.6-fold, P < 10⁻⁴). Notably, even though the iPS cells are derived from fibroblasts, their R-DMRs can distinguish between normal brain, liver and spleen cells and between colon cancer and normal colon cells. Thus, many DMRs are broadly involved in tissue differentiation, epigenetic reprogramming and cancer. We observed colocalization of hypomethylated R-DMRs with hypermethylated C-DMRs and bivalent chromatin marks, and colocalization of hypermethylated R-DMRs with hypomethylated C-DMRs and the absence of bivalent marks, suggesting two mechanisms for epigenetic reprogramming in iPS cells and cancer.
Microarray gene expression time-course experiments provide the opportunity to observe the evolution of transcriptional programs that cells use to respond to internal and external stimuli. Most commonly used methods for identifying differentially expressed genes treat each time point as independent and ignore important correlations, including those within samples and between sampling times. Therefore they do not make full use of the information intrinsic to the data, leading to a loss of power.
We present a flexible random-effects model that takes such correlations into account, improving our ability to detect genes that have sustained differential expression over more than one time point. By modeling the joint distribution of the samples that have been profiled across all time points, we gain sensitivity compared to a marginal analysis that examines each time point in isolation. We assign each gene a probability of differential expression using an empirical Bayes approach that reduces the effective number of parameters to be estimated.
Based on results from theory, simulated data, and application to the genomic data presented here, we show that this method, BETR (Bayesian Estimation of Temporal Regulation), has increased power to detect subtle differential expression in time-series data. The open-source R package betr is available through Bioconductor. BETR has also been incorporated in the freely-available, open-source MeV software tool available from http://www.tm4.org/mev.html.