1.  Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies 
Genome Biology  2010;11(1):R8.
Twelve cDNA libraries from two species of catfish have been sequenced, resulting in the generation of nearly 500,000 ESTs.
Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification.
A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis.
This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from the assembly of all catfish ESTs will greatly benefit the catfish introgression breeding program and whole genome association studies.
PMCID: PMC2847720  PMID: 20096101
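The high-quality SNP criterion in this abstract (a contig with at least four sequences, and a minor allele present in at least two of them) can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the function name, the input format (one uppercase base per EST at a single contig position) and the treatment of non-ACGT characters are assumptions.

```python
from collections import Counter


def is_high_quality_snp(column_bases, min_depth=4, min_minor=2):
    """Return True if one contig position passes the abstract's SNP filter.

    column_bases: iterable of bases, one per EST covering the position
    (hypothetical input format; gaps and ambiguity codes are ignored).
    """
    counts = Counter(b for b in column_bases if b in "ACGT")
    # Need enough sequences and at least two distinct alleles.
    if sum(counts.values()) < min_depth or len(counts) < 2:
        return False
    # The minor allele is taken as the second most common base.
    minor_count = counts.most_common(2)[1][1]
    return minor_count >= min_minor
```

For example, a position covered by the bases `AAGG` passes, while `AAAG` fails because the minor allele is seen only once.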
2.  Differential DNA methylation in discrete developmental stages of the parasitic nematode Trichinella spiralis 
Genome Biology  2012;13(10):R100.
DNA methylation plays an essential role in regulating gene expression under a variety of conditions and it has therefore been hypothesized to underlie the transitions between life cycle stages in parasitic nematodes. So far, however, 5'-cytosine methylation has not been detected during any developmental stage of the nematode Caenorhabditis elegans. Given the new availability of high-resolution methylation detection methods, an investigation of life cycle methylation in a parasitic nematode can now be carried out.
Here, using MethylC-seq, we present the first study to confirm the existence of DNA methylation in the parasitic nematode Trichinella spiralis, and we characterize the methylomes of the three life-cycle stages of this food-borne infectious human pathogen. We observe a drastic increase in DNA methylation during the transition from the new born to mature stage, and we further identify parasitism-related genes that show changes in DNA methylation status between life cycle stages.
Our data contribute to the understanding of the developmental changes that occur in an important human parasite, and raise the possibility that targeting DNA methylation processes may be a useful strategy in developing therapeutics to impede infection. In addition, our conclusion that DNA methylation is a mechanism for life cycle transition in T. spiralis prompts the question of whether this may also be the case in any other metazoans. Finally, our work constitutes the first report, to our knowledge, of DNA methylation in a nematode, prompting a re-evaluation of phyla in which this epigenetic mark was thought to be absent.
PMCID: PMC4053732  PMID: 23075480
3.  MM-ChIP enables integrative analysis of cross-platform and between-laboratory ChIP-chip or ChIP-seq data 
Genome Biology  2011;12(2):R11.
The ChIP-chip and ChIP-seq techniques enable genome-wide mapping of in vivo protein-DNA interactions and chromatin states. The cross-platform and between-laboratory variation poses a challenge to the comparison and integration of results from different ChIP experiments. We describe a novel method, MM-ChIP, which integrates information from cross-platform and between-laboratory ChIP-chip or ChIP-seq datasets. It improves both the sensitivity and the specificity of detecting ChIP-enriched regions, and is a useful meta-analysis tool for driving discoveries from multiple data sources.
PMCID: PMC3188793  PMID: 21284836
4.  Integrated analyses of DNA methylation and hydroxymethylation reveal tumor suppressive roles of ECM1, ATF5, and EOMES in human hepatocellular carcinoma 
Genome Biology  2014;15(12):533.
Differences in 5-hydroxymethylcytosine, 5hmC, distributions may complicate previous observations of abnormal cytosine methylation statuses that are used for the identification of new tumor suppressor gene candidates that are relevant to human hepatocarcinogenesis. The simultaneous detection of 5-methylcytosine and 5-hydroxymethylcytosine is likely to stimulate the discovery of aberrantly methylated genes with increased accuracy in human hepatocellular carcinoma.
Here, we performed ultra-performance liquid chromatography/tandem mass spectrometry and single-base high-throughput sequencing, Hydroxymethylation and Methylation Sensitive Tag sequencing, HMST-seq, to synchronously measure these two modifications in human hepatocellular carcinoma samples. After identification of differentially methylated and hydroxymethylated genes in human hepatocellular carcinoma, we integrate DNA copy-number alterations, as determined using array-based comparative genomic hybridization data, with gene expression to identify genes that are potentially silenced by promoter hypermethylation.
We report a high enrichment of genes with epigenetic aberrations in cancer signaling pathways. Six genes were selected as tumor suppressor gene candidates, among which, ECM1, ATF5 and EOMES are confirmed via siRNA experiments to have potential anti-cancer functions.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0533-9) contains supplementary material, which is available to authorized users.
PMCID: PMC4253612  PMID: 25517360
5.  Genomes of the rice pest brown planthopper and its endosymbionts reveal complex complementary contributions for host adaptation 
Genome Biology  2014;15(12):521.
The brown planthopper, Nilaparvata lugens, the most destructive pest of rice, is a typical monophagous herbivore that feeds exclusively on rice sap and migrates over long distances. Outbreaks have recurred approximately every three years in Asia. It has also been used as a model system for ecological studies and for developing effective pest management. To better understand how a monophagous sap-sucking arthropod herbivore has adapted to its exclusive host selection and to provide insights to improve pest control, we analyzed the genomes of the brown planthopper and its two endosymbionts.
We describe the 1.14 gigabase planthopper draft genome and the genomes of two microbial endosymbionts that permit the planthopper to forage exclusively on rice fields. Only 40.8% of the 27,571 identified Nilaparvata protein coding genes have detectable shared homology with the proteomes of the other 14 arthropods included in this study, reflecting large-scale gene losses including in evolutionarily conserved gene families and biochemical pathways. These unique genomic features are functionally associated with the animal’s exclusive plant host selection. Genes missing from the insect in conserved biochemical pathways that are essential for its survival on the nutritionally imbalanced sap diet are present in the genomes of its microbial endosymbionts, which have evolved to complement the mutualistic nutritional needs of the host.
Our study reveals a series of complex adaptations of the brown planthopper, involving a variety of biological processes, that result in its highly destructive impact on its exclusive host, rice. These findings highlight potential directions for effective pest control of the planthopper.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0521-0) contains supplementary material, which is available to authorized users.
PMCID: PMC4269174  PMID: 25609551
6.  Genome sequencing of the high oil crop sesame provides insight into oil biosynthesis 
Genome Biology  2014;15(2):R39.
Sesame, Sesamum indicum L., is considered the queen of oilseeds for its high oil content and quality, and is grown widely in tropical and subtropical areas as an important source of oil and protein. However, the molecular biology of sesame is largely unexplored.
Here, we report a high-quality genome sequence of sesame assembled de novo with a contig N50 of 52.2 kb and a scaffold N50 of 2.1 Mb, containing an estimated 27,148 genes. The results reveal novel, independent whole genome duplication and the absence of the Toll/interleukin-1 receptor domain in resistance genes. Candidate genes and oil biosynthetic pathways contributing to high oil content were discovered by comparative genomic and transcriptomic analyses. These revealed the expansion of type 1 lipid transfer genes by tandem duplication, the contraction of lipid degradation genes, and the differential expression of essential genes in the triacylglycerol biosynthesis pathway, particularly in the early stage of seed development. Resequencing data in 29 sesame accessions from 12 countries suggested that the high genetic diversity of lipid-related genes might be associated with the wide variation in oil content. Additionally, the results shed light on the pivotal stage of seed development, oil accumulation and potential key genes for sesamin production, an important pharmacological constituent of sesame.
As an important species from the order Lamiales and a high oil crop, the sesame genome will facilitate future research on the evolution of eudicots, as well as the study of lipid biosynthesis and potential genetic improvement of sesame.
PMCID: PMC4053841  PMID: 24576357
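The contig and scaffold N50 values quoted in this abstract follow the standard definition: the length L such that sequences of length L or longer together cover at least half of the total assembly length. A minimal sketch of that calculation is below; the function name and the input (a plain list of sequence lengths) are illustrative and not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")
```

With lengths `[2, 3, 4, 5, 6]` the total is 20, and the two longest sequences (6 and 5) already cover 11, so the N50 is 5.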
7.  Genome-wide patterns of genetic variation in sweet and grain sorghum (Sorghum bicolor) 
Genome Biology  2011;12(11):R114.
Sorghum (Sorghum bicolor) is globally produced as a source of food, feed, fiber and fuel. Grain and sweet sorghums differ in a number of important traits, including stem sugar and juice accumulation, plant height as well as grain and biomass production. The first whole genome sequence of a grain sorghum is available, but additional genome sequences are required to study genome-wide and intraspecific variation for dissecting the genetic basis of these important traits and for tailor-designed breeding of this important C4 crop.
We resequenced two sweet and one grain sorghum inbred lines, and identified a set of nearly 1,500 genes differentiating sweet and grain sorghum. These genes fall into ten major metabolic pathways involved in sugar and starch metabolisms, lignin and coumarin biosynthesis, nucleic acid metabolism, stress responses and DNA damage repair. In addition, we uncovered 1,057,018 SNPs, 99,948 indels of 1 to 10 bp in length and 16,487 presence/absence variations as well as 17,111 copy number variations. The majority of the large-effect SNPs, indels and presence/absence variations resided in the genes containing leucine rich repeats, PPR repeats and disease resistance R genes possessing diverse biological functions or under diversifying selection, but were absent in genes that are essential for life.
This is a first report of the identification of genome-wide patterns of genetic variation in sorghum. High-density SNP and indel markers reported here will be a valuable resource for future gene-phenotype studies and the molecular breeding of this important crop and related species.
PMCID: PMC3334600  PMID: 22104744
8.  Cistrome: an integrative platform for transcriptional regulation studies 
Genome Biology  2011;12(8):R83.
The increasing volume of ChIP-chip and ChIP-seq data being generated creates a challenge for standard, integrative and reproducible bioinformatics data analysis platforms. We developed a web-based application called Cistrome, based on the Galaxy open source framework. In addition to the standard Galaxy functions, Cistrome has 29 ChIP-chip- and ChIP-seq-specific tools in three major categories, from preliminary peak calling and correlation analyses to downstream genome feature association, gene expression analyses, and motif discovery. Cistrome is available at
PMCID: PMC3245621  PMID: 21859476
9.  Model-based Analysis of ChIP-Seq (MACS) 
Genome Biology  2008;9(9):R137.
MACS performs model-based analysis of ChIP-Seq data generated by short read sequencers.
We present Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of ChIP-Seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, and is freely available.
PMCID: PMC2592715  PMID: 18798982
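The "dynamic Poisson" idea in this abstract can be illustrated by scoring a candidate region against a local background rate instead of a single genome-wide one. The sketch below is not the MACS implementation: taking the maximum of the genome-wide rate and several window-based rates is an assumption based on how the method is commonly described, and all names and example numbers are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """Upper tail P(X >= k) for X ~ Poisson(lam), by direct summation.

    Adequate for illustration; loses precision for very small p-values.
    """
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def local_lambda(genome_rate, window_rates):
    """Assumed 'dynamic' rate: the largest of the background estimates."""
    return max([genome_rate, *window_rates])


# Hypothetical expected tag counts for one candidate region.
lam = local_lambda(1.0, [0.5, 3.0])
p_value = poisson_sf(10, lam)
```

Using the larger local rate makes the test more conservative in regions where the control is locally enriched, which is the bias the abstract refers to.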
10.  Overview of BioCreative II gene normalization 
Genome Biology  2008;9(Suppl 2):S3.
The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes (often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated abstracts containing 684 gene identifiers for training, and a blind test set of 262 documents containing 785 identifiers, with a gold standard created by expert annotators. Inter-annotator agreement was measured at over 90%.
Twenty groups submitted one to three runs each, for a total of 54 runs. Three systems achieved F-measures (balanced precision and recall) between 0.80 and 0.81. Combining the system outputs using simple voting schemes and classifiers obtained improved results; the best composite system achieved an F-measure of 0.92 with 10-fold cross-validation. A 'maximum recall' system based on the pooled responses of all participants gave a recall of 0.97 (with precision 0.23), identifying 763 out of 785 identifiers.
Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to human experts, at over 90% agreement. These results show promise as tools to link the literature with biological databases.
PMCID: PMC2559987  PMID: 18834494
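The figures for the 'maximum recall' system can be checked with the usual definitions. The sketch below assumes the balanced F-measure is the harmonic mean of precision and recall, as the abstract's parenthetical suggests; the function name is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recall of the pooled system: 763 of 785 gold-standard identifiers found.
pooled_recall = 763 / 785

# F-measure implied by the reported precision (0.23) and recall (0.97).
pooled_f = f_measure(0.23, 0.97)
```

Here `pooled_recall` rounds to the reported 0.97, and the low precision pulls `pooled_f` down to roughly 0.37, well below the 0.80 to 0.81 achieved by the best individual systems.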
11.  Overview of BioCreative II gene mention recognition 
Genome Biology  2008;9(Suppl 2):S2.
Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used and the results varied with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use of the lowest scoring submissions.
PMCID: PMC2559986  PMID: 18834493
12.  Regulatory module network of basic/helix-loop-helix transcription factors in mouse brain 
Genome Biology  2007;8(11):R244.
A comprehensive regulatory module network of 15 bHLH transcription factors over 150 target genes in mouse brain has been constructed.
The basic/helix-loop-helix (bHLH) proteins are important components of the transcriptional regulatory network, controlling a variety of biological processes, especially the development of the central nervous system. Until now, reports describing the regulatory network of the bHLH transcription factor (TF) family have been scarce. In order to understand the regulatory mechanisms of bHLH TFs in mouse brain, we inferred their regulatory network from genome-wide gene expression profiles with the module networks method.
A regulatory network comprising 15 important bHLH TFs and 153 target genes was constructed. The network was divided into 28 modules based on expression profiles. A regulatory-motif search shows the complexity and diversity of the network. In addition, 26 cooperative bHLH TF pairs were also detected in the network. This cooperation suggests possible physical interactions or genetic regulation between TFs. Interestingly, some TFs in the network regulate more than one module. A novel cross-repression between Neurod6 and Hey2 was identified, which may control various functions in different brain regions. The presence of TF binding sites (TFBSs) in the promoter regions of their target genes validates more than 70% of TF-target gene pairs of the network. Literature mining provides additional support for five modules. More importantly, the regulatory relationships among selected key components are all validated in mutant mice.
Our network is reliable and very informative for understanding the role of bHLH TFs in mouse brain development and function. It provides a framework for future experimental analyses.
PMCID: PMC2258200  PMID: 18021424
13.  Model-based analysis of two-color arrays (MA2C) 
Genome Biology  2007;8(8):R178.
A normalization method based on probe GC content for two-color tiling arrays and an algorithm for detecting peak regions are presented. They are available in a stand-alone Java program.
A novel normalization method based on the GC content of probes is developed for two-color tiling arrays. The proposed method, together with robust estimates of the model parameters, is shown to perform superbly on published data sets. A robust algorithm for detecting peak regions is also formulated and shown to perform well compared to other approaches. The tools have been implemented as a stand-alone Java program called MA2C, which can display various plots of statistical analysis for quality control.
PMCID: PMC2375008  PMID: 17727723
14.  Causes and consequences of crossing-over evidenced via a high-resolution recombinational landscape of the honey bee 
Genome Biology  2015;16(1):15.
Social hymenoptera, the honey bee (Apis mellifera) in particular, have ultra-high crossover rates and a large degree of intra-genomic variation in crossover rates. Aligned with haploid genomics of males, this makes them a potential model for examining the causes and consequences of crossing over. To address why social insects have such high crossing-over rates and the consequences of this, we constructed a high-resolution recombination atlas by sequencing 55 individuals from three colonies with an average marker density of 314 bp/marker.
We find crossing over to be especially high in proximity to genes upregulated in worker brains, but see no evidence for a coupling with immune-related functioning. We detect only a low rate of non-crossover gene conversion, contrary to current evidence. This is in striking contrast to the ultrahigh crossing-over rate, almost double that previously estimated from lower resolution data. We robustly recover the predicted intragenomic correlations between crossing over and both population level diversity and GC content, which could be best explained as indirect and direct consequences of crossing over, respectively.
Our data are consistent with the view that diversification of worker behavior, but not immune function, is a driver of the high crossing-over rate in bees. While we see both high diversity and high GC content associated with high crossing-over rates, our estimate of the low non-crossover rate demonstrates that high non-crossover rates are not a necessary consequence of high recombination rates.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0566-0) contains supplementary material, which is available to authorized users.
PMCID: PMC4305242  PMID: 25651211
15.  Genome of the house fly, Musca domestica L., a global vector of diseases with adaptations to a septic environment 
Genome Biology  2014;15(10):466.
Adult house flies, Musca domestica L., are mechanical vectors of more than 100 devastating diseases that have severe consequences for human and animal health. House fly larvae play a vital role as decomposers of animal wastes, and thus live in intimate association with many animal pathogens.
We have sequenced and analyzed the genome of the house fly using DNA from female flies. The sequenced genome is 691 Mb. Compared with Drosophila melanogaster, the genome contains a rich resource of shared and novel protein coding genes, a significantly higher amount of repetitive elements, and substantial increases in copy number and diversity of both the recognition and effector components of the immune system, consistent with life in a pathogen-rich environment. There are 146 P450 genes, plus 11 pseudogenes, in M. domestica, representing a significant increase relative to D. melanogaster and suggesting the presence of enhanced detoxification in house flies. Relative to D. melanogaster, M. domestica has also evolved an expanded repertoire of chemoreceptors and odorant binding proteins, many associated with gustation.
This represents the first genome sequence of an insect that lives in intimate association with abundant animal pathogens. The house fly genome provides a rich resource for enabling work on innovative methods of insect control, for understanding the mechanisms of insecticide resistance, genetic adaptation to high pathogen loads, and for exploring the basic biology of this important pest. The genome of this species will also serve as a close out-group to Drosophila in comparative genomic studies.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0466-3) contains supplementary material, which is available to authorized users.
PMCID: PMC4195910  PMID: 25315136
16.  FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer 
Genome Biology  2014;15(10):480.
Identification of noncoding drivers from thousands of somatic alterations in a typical tumor is a difficult and unsolved problem. We report a computational framework, FunSeq2, to annotate and prioritize these mutations. The framework combines an adjustable data context integrating large-scale genomics and cancer resources with a streamlined variant-prioritization pipeline. The pipeline has a weighted scoring system combining: inter- and intra-species conservation; loss- and gain-of-function events for transcription-factor binding; enhancer-gene linkages and network centrality; and per-element recurrence across samples. We further highlight putative drivers with information specific to a particular sample, such as differential expression. FunSeq2 is available from
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0480-5) contains supplementary material, which is available to authorized users.
PMCID: PMC4203974  PMID: 25273974
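A weighted scoring system of the kind this abstract describes can be sketched as a sum of per-feature weights followed by a ranking. This is only an illustration of the general idea: the feature names, the weights and the variant identifiers below are hypothetical and are not FunSeq2's actual features or values.

```python
# Hypothetical weights for annotation features a variant may carry.
WEIGHTS = {
    "conserved": 1.0,
    "tf_motif_break": 0.9,
    "enhancer_gene_link": 0.6,
    "network_hub": 0.5,
    "recurrent": 0.8,
}


def score(features, weights=WEIGHTS):
    """Sum the weights of the features observed for one variant."""
    return sum(weights.get(name, 0.0) for name in features)


def prioritize(variants, weights=WEIGHTS):
    """Return variant IDs sorted from highest to lowest score."""
    return sorted(variants, key=lambda v: score(variants[v], weights), reverse=True)


# Hypothetical variants mapped to the features they were annotated with.
variants = {
    "var_a": ["conserved"],
    "var_b": ["conserved", "tf_motif_break", "recurrent"],
    "var_c": [],
}
ranking = prioritize(variants)
```

In this toy input, `var_b` ranks first because it carries the most weighted evidence, and `var_c` ranks last with a score of zero.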
17.  Polygenic in vivo validation of cancer mutations using transposons 
Genome Biology  2014;15(9):455.
The in vivo validation of cancer mutations and genes identified in cancer genomics is resource-intensive because of the low throughput of animal experiments. We describe a mouse model that allows multiple cancer mutations to be validated in each animal line. Animal lines are generated with multiple candidate cancer mutations using transposons. The candidate cancer genes are tagged and randomly expressed in somatic cells, allowing easy identification of the cancer genes involved in the generated tumours. This system presents a useful, generalised and efficient means for animal validation of cancer genes.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0455-6) contains supplementary material, which is available to authorized users.
PMCID: PMC4210617  PMID: 25260652
18.  Diverse modes of genomic alteration in hepatocellular carcinoma 
Genome Biology  2014;15(8):436.
Hepatocellular carcinoma (HCC) is a heterogeneous disease with high mortality rate. Recent genomic studies have identified TP53, AXIN1, and CTNNB1 as the most frequently mutated genes. Lower frequency mutations have been reported in ARID1A, ARID2 and JAK1. In addition, hepatitis B virus (HBV) integrations into the human genome have been associated with HCC.
Here, we deep-sequence 42 HCC patients with a combination of whole genome, exome and transcriptome sequencing to identify the mutational landscape of HCC using a reasonably large discovery cohort. We find frequent mutations in TP53, CTNNB1 and AXIN1, and rare but likely functional mutations in BAP1 and IDH1. Besides frequent hepatitis B virus integrations at TERT, we identify translocations at the boundaries of TERT. A novel deletion is identified in CTNNB1 in a region that is heavily mutated in multiple cancers. We also find multiple high-allelic frequency mutations in the extracellular matrix protein LAMA2. Lower expression levels of LAMA2 correlate with a proliferative signature, and predict poor survival and higher chance of cancer recurrence in HCC patients, suggesting an important role of the extracellular matrix and cell adhesion in tumor progression of a subgroup of HCC patients.
The heterogeneous disease of HCC features diverse modes of genomic alteration. In addition to common point mutations, structural variations and methylation changes, there are several virus-associated changes, including gene disruption or activation, formation of chimeric viral-human transcripts, and DNA copy number changes. Such a multitude of genomic events likely contributes to the heterogeneous nature of HCC.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0436-9) contains supplementary material, which is available to authorized users.
PMCID: PMC4189592  PMID: 25159915
19.  MethylPurify: tumor purity deconvolution and differential methylation detection from single tumor DNA methylomes 
Genome Biology  2014;15(7):419.
We propose a statistical algorithm, MethylPurify, that uses regions with bisulfite reads showing discordant methylation levels to infer tumor purity from tumor samples alone. MethylPurify can identify differentially methylated regions (DMRs) from individual tumor methylome samples, without genomic variation information or prior knowledge from other datasets. In simulations with mixed bisulfite reads from cancer and normal cell lines, MethylPurify correctly inferred tumor purity and identified over 96% of the DMRs. From patient data, MethylPurify gave satisfactory DMR calls from tumor methylome samples alone, and revealed DMRs potentially missed by tumor-to-normal comparison because of tumor heterogeneity.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0419-x) contains supplementary material, which is available to authorized users.
PMCID: PMC4165374  PMID: 25103624
20.  MBASED: allele-specific expression detection in cancer tissues and cell lines 
Genome Biology  2014;15(8):405.
Allele-specific gene expression, ASE, is an important aspect of gene regulation. We developed a novel method, MBASED (meta-analysis based allele-specific expression detection), for ASE detection using RNA-seq data. It aggregates information across multiple single nucleotide variation loci to obtain a gene-level measure of ASE, even when prior phasing information is unavailable. MBASED is capable of one-sample and two-sample analyses and performs well in simulations. We applied MBASED to a panel of cancer cell lines and paired tumor-normal tissue samples, and observed extensive ASE in cancer, but not normal, samples, mainly driven by genomic copy number alterations.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0405-3) contains supplementary material, which is available to authorized users.
PMCID: PMC4165366  PMID: 25315065
21.  Comparative population genomics reveals the domestication history of the peach, Prunus persica, and human influences on perennial fruit crops 
Genome Biology  2014;15(7):415.
Recently, many studies utilizing next generation sequencing have investigated plant evolution and domestication in annual crops. Peach, Prunus persica, is a typical perennial fruit crop that has ornamental and edible varieties. Unlike other fruit crops, cultivated peach includes a large number of phenotypes but few polymorphisms. In this study, we explore the genetic basis of domestication in peach and the influence of humans on its evolution.
We perform large-scale resequencing of 10 wild and 74 cultivated peach varieties, including 9 ornamental, 23 breeding, and 42 landrace lines. We identify 4.6 million SNPs, a large number of which could explain the phenotypic variation in cultivated peach. Population analysis shows a single domestication event, the speciation of P. persica from wild peach. Ornamental and edible peach both belong to P. persica, along with another geographically separated subgroup, Prunus ferganensis.
We identify 147 and 262 genes under edible and ornamental selection, respectively. Some of these genes are associated with important biological features. We perform a population heterozygosity analysis in different plants that indicates that free recombination effects could affect domestication history. By applying artificial selection during the domestication of the peach and facilitating its asexual propagation, humans have caused a sharp decline of the heterozygote ratio of SNPs.
Our analyses enhance our knowledge of the domestication history of perennial fruit crops, and the dataset we generated could be useful for future research on comparative population genomics.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0415-1) contains supplementary material, which is available to authorized users.
PMCID: PMC4174323  PMID: 25079967
22.  Whole-genome bisulfite sequencing of multiple individuals reveals complementary roles of promoter and gene body methylation in transcriptional regulation 
Genome Biology  2014;15(7):408.
DNA methylation is an important type of epigenetic modification involved in gene regulation. Although strong DNA methylation at promoters is widely recognized to be associated with transcriptional repression, many aspects of DNA methylation are not fully understood, including the quantitative relationships between DNA methylation and expression levels, and the individual roles of promoter and gene body methylation.
Here we present an integrated analysis of whole-genome bisulfite sequencing and RNA sequencing data from human samples and cell lines. We find that while promoter methylation inversely correlates with gene expression as generally observed, the repressive effect is clear only on genes with a very high DNA methylation level. By means of statistical modeling, we find that DNA methylation is indicative of the expression class of a gene in general, but gene body methylation is a better indicator than promoter methylation. These findings are general in that a model constructed from a sample or cell line could accurately fit the unseen data from another. We further find that promoter and gene body methylation have minimal redundancy, and either one is sufficient to signify low expression. Finally, we obtain increased modeling power by integrating histone modification data with the DNA methylation data, showing that neither type of information fully subsumes the other.
Our results suggest that DNA methylation outside promoters also plays critical roles in gene regulation. Future studies on gene regulatory mechanisms and disease-associated differential methylation should pay more attention to DNA methylation at gene bodies and other non-promoter regions.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0408-0) contains supplementary material, which is available to authorized users.
PMCID: PMC4189148  PMID: 25074712
23.  Distinct and overlapping control of 5-methylcytosine and 5-hydroxymethylcytosine by the TET proteins in human cancer cells 
Genome Biology  2014;15(6):R81.
The TET family of dioxygenases catalyzes conversion of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), but their involvement in establishing normal 5mC patterns during mammalian development and their contributions to aberrant control of 5mC during cellular transformation remain largely unknown. We depleted TET1, TET2, and TET3 in a pluripotent embryonic carcinoma cell model and examined the impact on genome-wide 5mC, 5hmC, and transcriptional patterns.
TET1 depletion yields widespread reduction of 5hmC, while depletion of TET2 and TET3 reduces 5hmC at a subset of TET1 targets, suggesting functional co-dependence. TET2 or TET3 depletion also causes increased 5hmC, suggesting these proteins play a major role in 5hmC removal. All TETs prevent hypermethylation throughout the genome, a finding dramatically illustrated in CpG island shores, where TET depletion results in prolific hypermethylation. Surprisingly, TETs also promote methylation, as hypomethylation was associated with 5hmC reduction. TET function is highly specific to chromatin environment: 5hmC maintenance by all TETs occurs at polycomb-marked chromatin and genes expressed at moderate levels; 5hmC removal by TET2 is associated with highly transcribed genes enriched for H3K4me3 and H3K36me3. Importantly, genes prone to hypermethylation in cancer become depleted of 5hmC with TET deficiency, suggesting that TETs normally promote 5hmC at these loci. Finally, all three TETs, but especially TET2, are required for 5hmC enrichment at enhancers, a condition necessary for expression of adjacent genes.
These results provide novel insight into the division of labor among TET proteins and reveal important connections between TET activity, the chromatin landscape, and gene expression.
PMCID: PMC4197818  PMID: 24958354
24.  Chronic cocaine-regulated epigenomic changes in mouse nucleus accumbens 
Genome Biology  2014;15(4):R65.
Increasing evidence supports a role for altered gene expression in mediating the lasting effects of cocaine on the brain, and recent work has demonstrated the involvement of chromatin modifications in these alterations. However, all such studies to date have been restricted by their reliance on microarray technologies that have intrinsic limitations.
We use next-generation sequencing methods, RNA-seq and ChIP-seq for RNA polymerase II and several histone methylation marks, to obtain a more complete view of cocaine-induced changes in gene expression and associated adaptations in numerous modes of chromatin regulation in the mouse nucleus accumbens, a key brain reward region. We demonstrate an unexpectedly large number of pre-mRNA splicing alterations in response to repeated cocaine treatment. In addition, we identify combinations of chromatin changes, or signatures, that correlate with cocaine-dependent regulation of gene expression, including those involving pre-mRNA alternative splicing. Through bioinformatic prediction and biological validation, we identify one particular splicing factor, A2BP1 (Rbfox1/Fox-1), which is enriched at genes that display certain chromatin signatures and contributes to drug-induced behavioral abnormalities. Together, this delineation of the cocaine-induced epigenome in the nucleus accumbens reveals several novel modes of regulation by which cocaine alters the brain.
We establish combinatorial chromatin and transcriptional profiles in mouse nucleus accumbens after repeated cocaine treatment. These results serve as an important resource for the field and provide a template for the analysis of other systems to reveal new transcriptional and epigenetic mechanisms of neuronal regulation.
PMCID: PMC4073058  PMID: 24758366
25.  Hypermethylation in the ZBTB20 gene is associated with major depressive disorder 
Genome Biology  2014;15(4):R56.
Although genetic variation is believed to contribute to an individual’s susceptibility to major depressive disorder, genome-wide association studies have not yet identified associations that could explain the full etiology of the disease. Epigenetics is increasingly believed to play a major role in the development of common clinical phenotypes, including major depressive disorder.
Genome-wide MeDIP-Sequencing was carried out on a total of 50 monozygotic twin pairs from the UK and Australia that are discordant for depression. We show that major depressive disorder is associated with significant hypermethylation within the coding region of ZBTB20, and that this association is replicated in an independent cohort of 356 unrelated case-control individuals. The twins with major depressive disorder also show increased global variation in methylation in comparison with their unaffected co-twins. ZBTB20 plays an essential role in the specification of the Cornu Ammonis-1 field identity in the developing hippocampus, a region previously implicated in the development of major depressive disorder.
Our results suggest that aberrant methylation profiles affecting the hippocampus are associated with major depressive disorder and show the potential of the epigenetic twin model in neuro-psychiatric disease.
PMCID: PMC4072999  PMID: 24694013
