1.  Epigenome data release: a participant-centered approach to privacy protection 
Genome Biology  2015;16(1):142.
Large-scale epigenome mapping by the NIH Roadmap Epigenomics Project, the ENCODE Consortium and the International Human Epigenome Consortium (IHEC) produces genome-wide DNA methylation data at one base-pair resolution. We examine how such data can be made open-access while balancing appropriate interpretation and genomic privacy. We propose guidelines for data release that both reduce ambiguity in the interpretation of open-access data and limit immediate access to genetic variation data that are made available through controlled access.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-015-0723-0) contains supplementary material, which is available to authorized users.
PMCID: PMC4504083  PMID: 26185018
2.  A Comparative Encyclopedia of DNA Elements in the Mouse Genome 
Yue, Feng | Cheng, Yong | Breschi, Alessandra | Vierstra, Jeff | Wu, Weisheng | Ryba, Tyrone | Sandstrom, Richard | Ma, Zhihai | Davis, Carrie | Pope, Benjamin D. | Shen, Yin | Pervouchine, Dmitri D. | Djebali, Sarah | Thurman, Bob | Kaul, Rajinder | Rynes, Eric | Kirilusha, Anthony | Marinov, Georgi K. | Williams, Brian A. | Trout, Diane | Amrhein, Henry | Fisher-Aylor, Katherine | Antoshechkin, Igor | DeSalvo, Gilberto | See, Lei-Hoon | Fastuca, Meagan | Drenkow, Jorg | Zaleski, Chris | Dobin, Alex | Prieto, Pablo | Lagarde, Julien | Bussotti, Giovanni | Tanzer, Andrea | Denas, Olgert | Li, Kanwei | Bender, M. A. | Zhang, Miaohua | Byron, Rachel | Groudine, Mark T. | McCleary, David | Pham, Long | Ye, Zhen | Kuan, Samantha | Edsall, Lee | Wu, Yi-Chieh | Rasmussen, Matthew D. | Bansal, Mukul S. | Keller, Cheryl A. | Morrissey, Christapher S. | Mishra, Tejaswini | Jain, Deepti | Dogan, Nergiz | Harris, Robert S. | Cayting, Philip | Kawli, Trupti | Boyle, Alan P. | Euskirchen, Ghia | Kundaje, Anshul | Lin, Shin | Lin, Yiing | Jansen, Camden | Malladi, Venkat S. | Cline, Melissa S. | Erickson, Drew T. | Kirkup, Vanessa M | Learned, Katrina | Sloan, Cricket A. | Rosenbloom, Kate R. | de Sousa, Beatriz Lacerda | Beal, Kathryn | Pignatelli, Miguel | Flicek, Paul | Lian, Jin | Kahveci, Tamer | Lee, Dongwon | Kent, W. James | Santos, Miguel Ramalho | Herrero, Javier | Notredame, Cedric | Johnson, Audra | Vong, Shinny | Lee, Kristen | Bates, Daniel | Neri, Fidencio | Diegel, Morgan | Canfield, Theresa | Sabo, Peter J. | Wilken, Matthew S. | Reh, Thomas A. | Giste, Erika | Shafer, Anthony | Kutyavin, Tanya | Haugen, Eric | Dunn, Douglas | Reynolds, Alex P. | Neph, Shane | Humbert, Richard | Hansen, R. Scott | De Bruijn, Marella | Selleri, Licia | Rudensky, Alexander | Josefowicz, Steven | Samstein, Robert | Eichler, Evan E. | Orkin, Stuart H. 
| Levasseur, Dana | Papayannopoulou, Thalia | Chang, Kai-Hsin | Skoultchi, Arthur | Gosh, Srikanta | Disteche, Christine | Treuting, Piper | Wang, Yanli | Weiss, Mitchell J. | Blobel, Gerd A. | Good, Peter J. | Lowdon, Rebecca F. | Adams, Leslie B. | Zhou, Xiao-Qiao | Pazin, Michael J. | Feingold, Elise A. | Wold, Barbara | Taylor, James | Kellis, Manolis | Mortazavi, Ali | Weissman, Sherman M. | Stamatoyannopoulos, John | Snyder, Michael P. | Guigo, Roderic | Gingeras, Thomas R. | Gilbert, David M. | Hardison, Ross C. | Beer, Michael A. | Ren, Bing
Nature  2014;515(7527):355-364.
As the premier model organism in biomedical research, the laboratory mouse shares the majority of protein-coding genes with humans, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications, and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of other sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
PMCID: PMC4266106  PMID: 25409824
3.  Characterizing neutral genomic diversity and selection signatures in indigenous populations of Moroccan goats (Capra hircus) using WGS data 
Frontiers in Genetics  2015;6:107.
Since the time of their domestication, goats (Capra hircus) have evolved in a large variety of locally adapted populations in response to different human and environmental pressures. In the present era, many indigenous populations are threatened with extinction due to their substitution by cosmopolitan breeds, while they might represent highly valuable genomic resources. It is thus crucial to characterize the neutral and adaptive genetic diversity of indigenous populations. A fine characterization of whole genome variation in farm animals is now possible by using new sequencing technologies. We sequenced the complete genome at 12× coverage of 44 goats geographically representative of the three phenotypically distinct indigenous populations in Morocco. The study of mitochondrial genomes showed a high diversity exclusively restricted to the haplogroup A. The 44 nuclear genomes showed a very high diversity (24 million variants) associated with low linkage disequilibrium. The overall genetic diversity was weakly structured according to geography and phenotypes. When looking for signals of positive selection in each population we identified many candidate genes, several of which gave insights into the metabolic pathways or biological processes involved in the adaptation to local conditions (e.g., panting in warm/desert conditions). This study highlights the interest of WGS data to characterize livestock genomic diversity. It illustrates the valuable genetic richness present in indigenous populations that have to be sustainably managed and may represent valuable genetic resources for the long-term preservation of the species.
PMCID: PMC4387958  PMID: 25904931
Capra hircus; WGS; genomic diversity; population genomics; selection signatures; indigenous populations; Morocco
4.  Transcriptional diversity during lineage commitment of human blood progenitors 
Science (New York, N.Y.)  2014;345(6204):1251033.
Blood cells derive from hematopoietic stem cells through stepwise fating events. To characterize gene expression programs driving lineage choice we sequenced RNA from eight primary human hematopoietic progenitor populations representing the major myeloid commitment stages and the main lymphoid stage. We identify extensive cell-type specific expression changes: 6,711 genes and 10,724 transcripts, enriched in non-protein coding elements at early stages of differentiation. In addition, we discovered 7,881 novel splice junctions and 2,301 differentially used alternative splicing events, enriched in genes involved in regulatory processes. We demonstrate experimentally cell specific isoform usage, identifying NFIB as a regulator of megakaryocyte maturation – the platelet precursor. Our data highlight the complexity of fating events in closely related progenitor populations, the understanding of which is essential for the advancement of transplantation and regenerative medicine.
PMCID: PMC4254742  PMID: 25258084
5.  The Ensembl Regulatory Build 
Genome Biology  2015;16(1):56.
Most genomic variants associated with phenotypic traits or disease do not fall within gene coding regions, but in regulatory regions, rendering their interpretation difficult. We collected public data on epigenetic marks and transcription factor binding in human cell types and used it to construct an intuitive summary of regulatory regions in the human genome. We verified it against independent assays for sensitivity. The Ensembl Regulatory Build will be progressively enriched when more data is made available. It is freely available on the Ensembl browser, from the Ensembl Regulation MySQL database server and in a dedicated track hub.
PMCID: PMC4407537  PMID: 25887522
6.  Gibbon genome and the fast karyotype evolution of small apes 
Carbone, Lucia | Harris, R. Alan | Gnerre, Sante | Veeramah, Krishna R. | Lorente-Galdos, Belen | Huddleston, John | Meyer, Thomas J. | Herrero, Javier | Roos, Christian | Aken, Bronwen | Anaclerio, Fabio | Archidiacono, Nicoletta | Baker, Carl | Barrell, Daniel | Batzer, Mark A. | Beal, Kathryn | Blancher, Antoine | Bohrson, Craig L. | Brameier, Markus | Campbell, Michael S. | Capozzi, Oronzo | Casola, Claudio | Chiatante, Giorgia | Cree, Andrew | Damert, Annette | de Jong, Pieter J. | Dumas, Laura | Fernandez-Callejo, Marcos | Flicek, Paul | Fuchs, Nina V. | Gut, Marta | Gut, Ivo | Hahn, Matthew W. | Hernandez-Rodriguez, Jessica | Hillier, LaDeana W. | Hubley, Robert | Ianc, Bianca | Izsvák, Zsuzsanna | Jablonski, Nina G. | Johnstone, Laurel M. | Karimpour-Fard, Anis | Konkel, Miriam K. | Kostka, Dennis | Lazar, Nathan H. | Lee, Sandra L. | Lewis, Lora R. | Liu, Yue | Locke, Devin P. | Mallick, Swapan | Mendez, Fernando L. | Muffato, Matthieu | Nazareth, Lynne V. | Nevonen, Kimberly A. | O'Bleness, Majesta | Ochis, Cornelia | Odom, Duncan T. | Pollard, Katherine S. | Quilez, Javier | Reich, David | Rocchi, Mariano | Schumann, Gerald G. | Searle, Stephen | Sikela, James M. | Skollar, Gabriella | Smit, Arian | Sonmez, Kemal | Hallers, Boudewijn ten | Terhune, Elizabeth | Thomas, Gregg W.C. | Ullmer, Brygg | Ventura, Mario | Walker, Jerilyn A. | Wall, Jeffrey D. | Walter, Lutz | Ward, Michelle C. | Wheelan, Sarah J. | Whelan, Christopher W. | White, Simon | Wilhelm, Larry J. | Woerner, August E. | Yandell, Mark | Zhu, Baoli | Hammer, Michael F. | Marques-Bonet, Tomas | Eichler, Evan E. | Fulton, Lucinda | Fronick, Catrina | Muzny, Donna M. | Warren, Wesley C. | Worley, Kim C. | Rogers, Jeffrey | Wilson, Richard K. | Gibbs, Richard A.
Nature  2014;513(7517):195-201.
Gibbons are small arboreal apes that display an accelerated rate of evolutionary chromosomal rearrangement and occupy a key node in the primate phylogeny between Old World monkeys and great apes. Here we present the assembly and analysis of a northern white-cheeked gibbon (Nomascus leucogenys) genome. We describe the propensity for a gibbon-specific retrotransposon (LAVA) to insert into chromosome segregation genes and alter transcription by providing a premature termination site, suggesting a possible molecular mechanism for the genome plasticity of the gibbon lineage. We further show that the gibbon genera (Nomascus, Hylobates, Hoolock and Symphalangus) experienced a near-instantaneous radiation ~5 million years ago, coincident with major geographical changes in Southeast Asia that caused cycles of habitat compression and expansion. Finally, we identify signatures of positive selection in genes important for forelimb development (TBX5) and connective tissues (COL1A1) that may have been involved in the adaptation of gibbons to their arboreal habitat.
PMCID: PMC4249732  PMID: 25209798
7.  Random Monoallelic Gene Expression Increases upon Embryonic Stem Cell Differentiation 
Developmental cell  2014;28(4):351-365.
Random autosomal monoallelic gene expression refers to the transcription of a gene from one of two homologous alleles. We assessed the dynamics of monoallelic expression during development through an allele-specific RNA sequencing screen in clonal populations of hybrid mouse embryonic stem cells (ESCs) and neural progenitor cells (NPCs). We identified 67 and 376 inheritable autosomal random monoallelically expressed genes in ESCs and NPCs respectively, a 5.6-fold increase upon differentiation. While DNA methylation and nuclear positioning did not distinguish the active and inactive alleles, specific histone modifications were differentially enriched between the two alleles. Interestingly, expression levels of 8% of the monoallelically expressed genes remained similar between monoallelic and biallelic clones. These results support a model in which random monoallelic expression occurs stochastically during differentiation, and for some genes is compensated for by the cell to maintain the required transcriptional output of these genes.
PMCID: PMC3955261  PMID: 24576421
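The fold increase quoted in this abstract follows directly from the two gene counts it reports. A minimal check, using only the numbers stated above (the variable names are illustrative):

```python
# Counts of random monoallelically expressed genes, as reported in the abstract.
esc_genes = 67    # embryonic stem cells (ESCs)
npc_genes = 376   # neural progenitor cells (NPCs)

# Fold increase upon differentiation from ESCs to NPCs.
fold_increase = npc_genes / esc_genes
print(f"{fold_increase:.1f}-fold")  # prints "5.6-fold"
```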
8.  The Common Marmoset Genome Provides Insight into Primate Biology and Evolution 
Worley, Kim C. | Warren, Wesley C. | Rogers, Jeffrey | Locke, Devin | Muzny, Donna M. | Mardis, Elaine R. | Weinstock, George M. | Tardif, Suzette D. | Aagaard, Kjersti M. | Archidiacono, Nicoletta | Rayan, Nirmala Arul | Batzer, Mark A. | Beal, Kathryn | Brejova, Brona | Capozzi, Oronzo | Capuano, Saverio B. | Casola, Claudio | Chandrabose, Mimi M. | Cree, Andrew | Dao, Marvin Diep | de Jong, Pieter J. | del Rosario, Ricardo Cruz-Herrera | Delehaunty, Kim D. | Dinh, Huyen H. | Eichler, Evan | Fitzgerald, Stephen | Flicek, Paul | Fontenot, Catherine C. | Fowler, R. Gerald | Fronick, Catrina | Fulton, Lucinda A. | Fulton, Robert S. | Gabisi, Ramatu Ayiesha | Gerlach, Daniel | Graves, Tina A. | Gunaratne, Preethi H. | Hahn, Matthew W. | Haig, David | Han, Yi | Harris, R. Alan | Herrero, Javier M. | Hillier, LaDeana W. | Hubley, Robert | Hughes, Jennifer F. | Hume, Jennifer | Jhangiani, Shalini N. | Jorde, Lynn B. | Joshi, Vandita | Karakor, Emre | Konkel, Miriam K. | Kosiol, Carolin | Kovar, Christie L. | Kriventseva, Evgenia V. | Lee, Sandra L. | Lewis, Lora R. | Liu, Yih-shin | Lopez, John | Lopez-Otin, Carlos | Lorente-Galdos, Belen | Mansfield, Keith G. | Marques-Bonet, Tomas | Minx, Patrick | Misceo, Doriana | Moncrieff, J. Scott | Morgan, Margaret B. | Muthuswamy, Raveendran | Nazareth, Lynne V. | Newsham, Irene | Nguyen, Ngoc Bich | Okwuonu, Geoffrey O. | Prabhakar, Shyam | Perales, Lora | Pu, Ling-Ling | Puente, Xose S. | Quesada, Victor | Ranck, Megan C. | Raney, Brian J. | Deiros, David Rio | Rocchi, Mariano | Rodriguez, David | Ross, Corinna | Ruffier, Magali | Ruiz, San Juana | Sajjadian, S. | Santibanez, Jireh | Schrider, Daniel R. | Searle, Steve | Skaletsky, Helen | Soibam, Benjamin | Smit, Arian F. A. | Tennakoon, Jayantha B. | Tomaska, Lubomir | Ullmer, Brygg | Vejnar, Charles E. | Ventura, Mario | Vilella, Albert J. | Vinar, Tomas | Vogel, Jan-Hinnerk | Walker, Jerilyn A. | Wang, Qing | Warner, Crystal M. | Wildman, Derek E. | Witherspoon, David J. 
| Wright, Rita A. | Wu, Yuanqing | Xiao, Weimin | Xing, Jinchuan | Zdobnov, Evgeny M. | Zhu, Baoli | Gibbs, Richard A. | Wilson, Richard K.
Nature genetics  2014;46(8):850-857.
A first analysis of the genome sequence of the common marmoset (Callithrix jacchus), assembled using traditional Sanger methods and Ensembl annotation, has permitted genomic comparison with apes and Old World monkeys and the identification of specific molecular features that may contribute to the unique biology of this diminutive primate. The common marmoset has a rapid reproductive capacity partly due to the prevalence of dizygotic twins. Remarkably, these twins share placental circulation and exchange hematopoietic stem cells in utero, resulting in adults that are hematopoietic chimeras.
We observed positive selection or non-synonymous substitutions for genes encoding growth hormone / insulin-like growth factor (growth pathways), respiratory complex I (metabolic pathways), immunobiology, and proteases (reproductive and immunity pathways). In addition, both protein-coding and microRNA genes related to reproduction exhibit rapid sequence evolution. This New World monkey genome sequence enables significantly increased power for comparative analyses among available primate genomes and facilitates biomedical research application.
PMCID: PMC4138798  PMID: 25038751
9.  Avianbase: a community resource for bird genomics 
Genome Biology  2015;16(1):21.
Giving access to sequence and annotation data for genome assemblies is important because, while facilitating research, it places both assembly and annotation quality under scrutiny, resulting in improvements to both. Therefore we announce Avianbase, a resource for bird genomics, which provides access to data released by the Avian Phylogenomics Consortium.
PMCID: PMC4310197  PMID: 25723810
10.  Enhancer Evolution across 20 Mammalian Species 
Cell  2015;160(3):554-566.
The mammalian radiation has corresponded with rapid changes in noncoding regions of the genome, but we lack a comprehensive understanding of regulatory evolution in mammals. Here, we track the evolution of promoters and enhancers active in liver across 20 mammalian species from six diverse orders by profiling genomic enrichment of H3K27 acetylation and H3K4 trimethylation. We report that rapid evolution of enhancers is a universal feature of mammalian genomes. Most of the recently evolved enhancers arise from ancestral DNA exaptation, rather than lineage-specific expansions of repeat elements. In contrast, almost all liver promoters are partially or fully conserved across these species. Our data further reveal that recently evolved enhancers can be associated with genes under positive selection, demonstrating the power of this approach for annotating regulatory adaptations in genomic sequences. These results provide important insight into the functional genetics underpinning mammalian regulatory evolution.
• Rapid enhancer and slow promoter evolution across genomes of 20 mammalian species
• Enhancers are rarely conserved across these mammals
• Recently evolved enhancers dominate mammalian regulatory landscapes
• Unbiased mapping links candidate enhancers with lineage-specific positive selection
Comparative functional genomic analysis in 20 mammalian species reveals distinct features for the evolution of enhancers, in comparison to those of promoters, across 180 million years.
PMCID: PMC4313353  PMID: 25635462
11.  Extending reference assembly models 
Genome Biology  2015;16(1):13.
The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.
PMCID: PMC4305238  PMID: 25651527
12.  The Sheep Genome Illuminates Biology of the Rumen and Lipid Metabolism 
Science (New York, N.Y.)  2014;344(6188):1168-1173.
Sheep (Ovis aries) are a major source of meat, milk and fiber in the form of wool, and represent a distinct class of animals that have a specialized digestive organ, the rumen, which carries out the initial digestion of plant material. We have developed and analyzed a high quality reference sheep genome and transcriptomes from 40 different tissues. We identified highly expressed genes encoding keratin cross-linking proteins associated with rumen evolution. We also identified genes involved in lipid metabolism that had been amplified and/or had altered tissue expression patterns. This may be in response to changes in the barrier lipids of the skin, an interaction between lipid metabolism and wool synthesis, and an increased role of volatile fatty acids in ruminants, compared to non-ruminant animals.
PMCID: PMC4157056  PMID: 24904168
13.  The IPD and IMGT/HLA database: allele variant databases 
Nucleic Acids Research  2014;43(Database issue):D423-D431.
The Immuno Polymorphism Database (IPD) was developed to provide a centralized system for the study of polymorphism in genes of the immune system. Through the IPD project we have established a central platform for the curation and publication of locus-specific databases involved either directly or related to the function of the Major Histocompatibility Complex in a number of different species. We have collaborated with specialist groups or nomenclature committees that curate the individual sections before they are submitted to IPD for online publication. IPD consists of five core databases, with the IMGT/HLA Database as the primary database. Through the work of the various nomenclature committees, the HLA Informatics Group and in collaboration with the European Bioinformatics Institute we are able to provide public access to this data through the website The IPD project continues to develop with new tools being added to address scientific developments, such as Next Generation Sequencing, and to address user feedback and requests. Regular updates to the website ensure that new and confirmatory sequences are dispersed to the immunogenetics community, and the wider research and clinical communities.
PMCID: PMC4383959  PMID: 25414341
14.  Ensembl 2015 
Nucleic Acids Research  2014;43(Database issue):D662-D669.
Ensembl ( is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site ( Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (, which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page ( under an Apache 2.0 open source license.
PMCID: PMC4383879  PMID: 25352552
15.  Computational approaches to interpreting genomic sequence variation 
Genome Medicine  2014;6(10):87.
Identifying sequence variants that play a mechanistic role in human disease and other phenotypes is a fundamental goal in human genetics and will be important in translating the results of variation studies. Experimental validation to confirm that a variant causes the biochemical changes responsible for a given disease or phenotype is considered the gold standard, but this cannot currently be applied to the 3 million or so variants expected in an individual genome. This has prompted the development of a wide variety of computational approaches that use several different sources of information to identify functional variation. Here, we review and assess the limitations of computational techniques for categorizing variants according to functional classes, prioritizing variants for experimental follow-up and generating hypotheses about the possible molecular mechanisms to inform downstream experiments. We discuss the main current bioinformatics approaches to identifying functional variation, including widely used algorithms for coding variation such as SIFT and PolyPhen and also novel techniques for interpreting variation across the genome.
PMCID: PMC4254438  PMID: 25473426
16.  Multi-species, multi-transcription factor binding highlights conserved control of tissue-specific biological pathways 
eLife  2014;3:e02626.
As exome sequencing gives way to genome sequencing, the need to interpret the function of regulatory DNA becomes increasingly important. To test whether evolutionary conservation of cis-regulatory modules (CRMs) gives insight into human gene regulation, we determined transcription factor (TF) binding locations of four liver-essential TFs in liver tissue from human, macaque, mouse, rat, and dog. Approximately two thirds of the TF-bound regions fell into CRMs. Less than half of the human CRMs were found as a CRM in the orthologous region of a second species. Shared CRMs were associated with liver pathways and disease loci identified by genome-wide association studies. Recurrent rare human disease-causing mutations at the promoters of several blood coagulation and lipid metabolism genes were also identified within CRMs shared in multiple species. This suggests that multi-species analyses of experimentally determined combinatorial TF binding will help identify genomic regions critical for tissue-specific gene control.
eLife digest
Stretches of DNA called cis-regulatory modules (or CRMs for short) could help researchers to identify the regions of DNA that are most important for controlling genes. CRMs are regions where multiple transcription factors—proteins that control when and how genes are expressed—bind to DNA. As important biological pathways are often regulated by more than one transcription factor, CRMs are therefore a good target when looking for DNA regions that, if mutated, are likely to cause disease.
If a stretch of DNA performs an important role, it is often conserved throughout evolution. This is often observed for genes that make proteins. Indeed, DNA regions that specify critical amino acids that make up proteins are often conserved across distantly related species. However, unlike the changes made to the amino acid encoding parts of genes, it is currently a challenge to predict which changes in the rest of the genome will affect gene expression.
One reason for this challenge is that transcription factor binding sites are rapidly evolving. This rapid evolution means that strictly comparing DNA sequences between species may fail to identify where transcription factors like to bind in the genome. Numerous experimental efforts have therefore been made to map these sites. These have revealed that there are a huge number of regions in the human genome that can bind transcription factors: hundreds of thousands of sites, far more than there are genes. For this reason, there is a great interest in revealing which of these regulatory regions are critical for maintaining normal levels and timings of gene expression.
Ballester et al. compared the binding sites of four transcription factors responsible for regulating liver function in humans, macaques, mice, rats, and dogs. About two-thirds of these binding sites were found in CRMs. Less than half of the CRMs in humans were also CRMs in another species—but Ballester et al. found that these shared CRMs are predominantly in charge of regulating the essential biological pathways that allow the liver to function correctly. In addition, Ballester et al. identified several examples of disease-causing DNA mutations in shared CRMs that affected the expression of genes that make up pathways such as the blood clotting cascade. Genome-wide association studies also uncovered common variants for liver-related traits that were enriched for the CRMs found in more than one species, further supporting their importance.
As transcription factors work in different ways in different tissues, further studies are now required to expand these observations to organs other than the liver. Future work is also needed to investigate the function of thousands of conserved CRMs whose role in liver gene regulation remains unknown.
PMCID: PMC4359374  PMID: 25279814
cis regulatory module; transcription factors; molecular evolution; macaque; dog; liver; human; mouse; rat; other
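The digest describes CRMs as regions where multiple transcription factors bind. One simple way to make that idea concrete is to merge overlapping binding intervals and keep the merged regions bound by at least two distinct factors. The sketch below is an illustration under that assumption only; it is not the procedure used in the paper, and the function name `call_crms`, the factor labels, and the coordinates are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def call_crms(
    peaks: Dict[str, List[Interval]], min_tfs: int = 2
) -> List[Tuple[int, int, FrozenSet[str]]]:
    """Merge overlapping TF binding intervals; keep clusters with >= min_tfs factors."""
    # Flatten to (start, end, factor) and sort by start coordinate.
    tagged = sorted((s, e, tf) for tf, ivs in peaks.items() for s, e in ivs)

    clusters: List[list] = []  # each entry: [start, end, set of factor names]
    for start, end, tf in tagged:
        if clusters and start < clusters[-1][1]:
            # Overlaps the current cluster: extend it and record the factor.
            clusters[-1][1] = max(clusters[-1][1], end)
            clusters[-1][2].add(tf)
        else:
            clusters.append([start, end, {tf}])

    return [(s, e, frozenset(tfs)) for s, e, tfs in clusters if len(tfs) >= min_tfs]


# Hypothetical binding intervals for three factors.
example_peaks = {
    "TF_A": [(100, 200), (500, 600)],
    "TF_B": [(150, 250)],
    "TF_C": [(900, 950)],
}
crms = call_crms(example_peaks)
print(crms)
```

With these toy intervals only the region spanning 100 to 250 qualifies, because it is the only merged region bound by two different factors. Real analyses would also have to handle multiple chromosomes and peak-calling thresholds, which this sketch ignores.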
17.  Global identification of Smad2 and Eomesodermin targets in zebrafish identifies a conserved transcriptional network in mesendoderm and a novel role for Eomesodermin in repression of ectodermal gene expression 
BMC Biology  2014;12:81.
Nodal signalling is an absolute requirement for normal mesoderm and endoderm formation in vertebrate embryos, yet the transcriptional networks acting directly downstream of Nodal and the extent to which they are conserved is largely unexplored, particularly in vivo. Eomesodermin also plays a role in patterning mesoderm and endoderm in vertebrates, but its mechanisms of action and how it interacts with the Nodal signalling pathway are still unclear.
Using a combination of expression analysis and chromatin immunoprecipitation with deep sequencing (ChIP-seq) we identify direct targets of Smad2, the effector of Nodal signalling in blastula stage zebrafish embryos, including many novel target genes. Through comparison of these data with published ChIP-seq data in human, mouse and Xenopus we show that the transcriptional network driven by Smad2 in mesoderm and endoderm is conserved in these vertebrate species. We also show that Smad2 and zebrafish Eomesodermin a (Eomesa) bind common genomic regions proximal to genes involved in mesoderm and endoderm formation, suggesting Eomesa forms a general component of the Smad2 signalling complex in zebrafish. Combinatorial perturbation of Eomesa and Smad2-interacting factor Foxh1 results in loss of both mesoderm and endoderm markers, confirming the role of Eomesa in endoderm formation and its functional interaction with Foxh1 for correct Nodal signalling. Finally, we uncover a novel role for Eomesa in repressing ectodermal genes in the early blastula.
Our data demonstrate that evolutionarily conserved developmental functions of Nodal signalling occur through maintenance of the transcriptional network directed by Smad2. This network is modulated by Eomesa in zebrafish which acts to promote mesoderm and endoderm formation in combination with Nodal signalling, whilst Eomesa also opposes ectoderm gene expression. Eomesa, therefore, regulates the formation of all three germ layers in the early zebrafish embryo.
Electronic supplementary material
The online version of this article (doi:10.1186/s12915-014-0081-5) contains supplementary material, which is available to authorized users.
PMCID: PMC4206766  PMID: 25277163
Nodal; Smad2; Eomesodermin; Foxh1; Neural; Transcriptional regulation
18.  Dynamics, mechanisms, and functional implications of transcription factor binding evolution in metazoans 
Nature reviews. Genetics  2014;15(4):221-233.
Transcription factor binding differences can contribute to organismal evolution by altering downstream gene expression programmes. Recent genome-wide studies in Drosophila and mammals have revealed common quantitative and combinatorial properties of in vivo DNA-binding, as well as significant differences in the rate and mechanisms of metazoan transcription factor binding evolution. Here, we review the recently-discovered, rapid re-wiring of in vivo transcription factor binding between related metazoan species and summarize general principles underlying the observed patterns of evolution. We then consider what might explain genome evolution differences between metazoan phyla, and outline the conceptual and technological challenges facing the field.
PMCID: PMC4175440  PMID: 24590227
19.  The Ensembl REST API: Ensembl Data for Any Language 
Bioinformatics  2014;31(1):143-145.
Motivation: We present a Web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA while minimizing client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor tool permitting large-scale programmatic variant analysis independent of any specific programming language.
Availability and implementation: The Ensembl REST API can be accessed at and source code is freely available under an Apache 2.0 license from
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC4271150  PMID: 25236461
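As a rough illustration of the language-independent access this abstract describes, the sketch below requests JSON and FASTA from a REST server using only the Python standard library. The server address is elided in the listing above, so `https://rest.ensembl.org` is an assumption here, as are the two endpoint paths and the helper names `build_request`, `fetch`, and `demo`; consult the service's own documentation before relying on them.

```python
import json
import urllib.request

# Assumption: public Ensembl REST server (the URL is elided in the abstract above).
SERVER = "https://rest.ensembl.org"


def build_request(endpoint: str, content_type: str) -> urllib.request.Request:
    """Build a GET request that asks the server for a given response format."""
    return urllib.request.Request(
        SERVER + endpoint,
        headers={"Content-Type": content_type},
    )


def fetch(endpoint: str, content_type: str = "application/json", timeout: int = 30):
    """Fetch an endpoint; parse JSON responses, return other formats as text."""
    request = build_request(endpoint, content_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if content_type == "application/json" else body


def demo() -> None:
    """Needs network access; endpoint paths are assumed, not taken from the abstract."""
    gene = fetch("/lookup/symbol/homo_sapiens/BRCA2")  # JSON record for a gene symbol
    fasta = fetch(f"/sequence/id/{gene['id']}", "text/x-fasta")  # FASTA text
    print(gene["id"])
    print(fasta.splitlines()[0])  # FASTA header line only
```

Calling `demo()` performs the two requests. Keeping the request construction in `build_request` separates the format negotiation (the `Content-Type` header) from the network call, so the same helper serves both the JSON and FASTA cases.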
20.  Characterizing Genetic Variants for Clinical Action 
Genome-wide association studies, DNA sequencing studies, and other genomic studies are finding an increasing number of genetic variants associated with clinical phenotypes that may be useful in developing diagnostic, preventive, and treatment strategies for individual patients. However, few common variants have been integrated into routine clinical practice. The reasons for this are several, but two of the most significant are limited evidence about the clinical implications of the variants and a lack of a comprehensive knowledge base that captures genetic variants, their phenotypic associations, and other pertinent phenotypic information that is openly accessible to clinical groups attempting to interpret sequencing data. As the field of medicine begins to incorporate genome-scale analysis into clinical care, approaches need to be developed for collecting and characterizing data on the clinical implications of variants, developing consensus on their actionability, and making this information available for clinical use. The National Human Genome Research Institute (NHGRI) and the Wellcome Trust thus convened a workshop to consider the processes and resources needed to: 1) identify clinically valid genetic variants; 2) decide whether they are actionable and what the action should be; and 3) provide this information for clinical use. This commentary outlines the key discussion points and recommendations from the workshop.
PMCID: PMC4158437  PMID: 24634402
genomic medicine; clinical actionability; database; electronic health records (EHR); pharmacogenomics; DNA sequencing
21.  Computational approaches to identify functional genetic variants in cancer genomes 
Nature methods  2013;10(8):723-729.
The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor, but only a minority drive tumor progression. We present the result of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype.
PMCID: PMC3919555  PMID: 23900255
22.  Accessing data from the International Mouse Phenotyping Consortium: state of the art and future plans 
The International Mouse Phenotyping Consortium (IMPC) will reveal the pleiotropic functions of every gene in the mouse genome and uncover the wider role of genetic loci within diverse biological systems. Comprehensive informatics solutions are vital to ensuring that this vast array of data is captured in a standardised manner and made accessible to the scientific community for interrogation and analysis. Here we review the existing EuroPhenome and WTSI phenotype informatics systems and the IKMC portal, and present plans for extending these systems and lessons learned to the development of a robust IMPC informatics infrastructure.
PMCID: PMC4106044  PMID: 22991088
23.  The draft genomes of soft-shell turtle and green sea turtle yield insights into the development and evolution of the turtle-specific body plan 
Nature genetics  2013;45(6):701-706.
The unique anatomical features of turtles have raised unanswered questions about the origin of their unique body plan. We generated and analyzed draft genomes of the soft-shell turtle (Pelodiscus sinensis) and the green sea turtle (Chelonia mydas); our results indicated the close relationship of the turtles to the bird-crocodilian lineage, from which they split ~267.9–248.3 million years ago (Upper Permian to Triassic). We also found extensive expansion of olfactory receptor genes in these turtles. Embryonic gene expression analysis identified an hourglass-like divergence of turtle and chicken embryogenesis, with maximal conservation around the vertebrate phylotypic period, rather than at later stages that show the amniote-common pattern. Wnt5a expression was found in the growth zone of the dorsal shell, supporting the possible co-option of limb-associated Wnt signaling in the acquisition of this turtle-specific novelty. Our results suggest that turtle evolution was accompanied by an unexpectedly conservative vertebrate phylotypic period, followed by turtle-specific repatterning of development to yield the novel structure of the shell.
PMCID: PMC4000948  PMID: 23624526
24.  Transcriptome and genome sequencing uncovers functional variation in humans 
Nature  2013;501(7468):506-511.
Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of mRNA and miRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project – the first uniformly processed RNA-seq data from multiple human populations with high-quality genome sequences. We discovered extremely widespread genetic variation affecting regulation of the majority of genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on cellular mechanisms of regulatory and loss-of-function variation, and allowed us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.
PMCID: PMC3918453  PMID: 24037378
25.  Integrative Annotation of Variants from 1092 Humans: Application to Cancer Genomics 
Science (New York, N.Y.)  2013;342(6154):1235587.
Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations (“ultrasensitive”) and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, “motif-breakers”). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.
PMCID: PMC3947637  PMID: 24092746
