1.  Toward community standards in the quest for orthologs 
Bioinformatics  2012;28(6):900-904.
The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet the development and application of orthology inference methods are hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second ‘Quest for Orthologs’ meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.
Contact: dessimoz@ebi.ac.uk
doi:10.1093/bioinformatics/bts050
PMCID: PMC3307119  PMID: 22332236
2.  Genome-wide comparative analysis reveals human-mouse regulatory landscape and evolution 
BMC Genomics  2015;16(1):87.
Background
Because species-specific gene expression is driven by species-specific regulation, understanding the relationship between sequence and function of the regulatory regions in different species will help elucidate how differences among species arise. Despite active experimental and computational research, relationships among sequence, conservation, and function are still poorly understood.
Results
We compared transcription factor occupied segments (TFos) for 116 human and 35 mouse TFs in 546 human and 125 mouse cell types and tissues from the Human and the Mouse ENCODE projects. We based the map between human and mouse TFos on a one-to-one nucleotide cross-species mapper, bnMapper, that utilizes whole genome alignments (WGA).
Our analysis shows that TFos are under evolutionary constraint, but a substantial portion (25.1% of mouse and 25.85% of human on average) of the TFos does not have a homologous sequence on the other species; this portion varies among cell types and TFs. Furthermore, 47.67% and 57.01% of the homologous TFos sequence shows binding activity on the other species for human and mouse respectively. However, 79.87% and 69.22% is repurposed such that it binds the same TF in different cells or different TFs in the same cells. Remarkably, within the set of repurposed TFos, the corresponding genome regions in the other species are preferred locations of novel TFos. These events suggest exaptation of some functional regulatory sequences into new function.
Despite TFos repurposing, we did not find substantial changes in their predicted target genes, suggesting that CRMs buffer evolutionary events allowing little or no change in the TFos – target gene associations. Thus, the small portion of TFos with strictly conserved occupancy underestimates the degree of conservation of regulatory interactions.
Conclusion
We mapped regulatory sequences from an extensive number of TFs and cell types between human and mouse using WGA. A comparative analysis of this correspondence unveiled the extent of the shared regulatory sequence across TFs and cell types under study. Importantly, a large part of the shared regulatory sequence is repurposed on the other species. This sequence, fueled by turnover events, provides a strong case for exaptation in regulatory elements.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1245-6) contains supplementary material, which is available to authorized users.
doi:10.1186/s12864-015-1245-6
PMCID: PMC4333152
Mouse ENCODE; Regulatory sequences; Comparative genomics
3.  The Common Marmoset Genome Provides Insight into Primate Biology and Evolution 
Worley, Kim C. | Warren, Wesley C. | Rogers, Jeffrey | Locke, Devin | Muzny, Donna M. | Mardis, Elaine R. | Weinstock, George M. | Tardif, Suzette D. | Aagaard, Kjersti M. | Archidiacono, Nicoletta | Rayan, Nirmala Arul | Batzer, Mark A. | Beal, Kathryn | Brejova, Brona | Capozzi, Oronzo | Capuano, Saverio B. | Casola, Claudio | Chandrabose, Mimi M. | Cree, Andrew | Dao, Marvin Diep | de Jong, Pieter J. | del Rosario, Ricardo Cruz-Herrera | Delehaunty, Kim D. | Dinh, Huyen H. | Eichler, Evan | Fitzgerald, Stephen | Flicek, Paul | Fontenot, Catherine C. | Fowler, R. Gerald | Fronick, Catrina | Fulton, Lucinda A. | Fulton, Robert S. | Gabisi, Ramatu Ayiesha | Gerlach, Daniel | Graves, Tina A. | Gunaratne, Preethi H. | Hahn, Matthew W. | Haig, David | Han, Yi | Harris, R. Alan | Herrero, Javier M. | Hillier, LaDeana W. | Hubley, Robert | Hughes, Jennifer F. | Hume, Jennifer | Jhangiani, Shalini N. | Jorde, Lynn B. | Joshi, Vandita | Karakor, Emre | Konkel, Miriam K. | Kosiol, Carolin | Kovar, Christie L. | Kriventseva, Evgenia V. | Lee, Sandra L. | Lewis, Lora R. | Liu, Yih-shin | Lopez, John | Lopez-Otin, Carlos | Lorente-Galdos, Belen | Mansfield, Keith G. | Marques-Bonet, Tomas | Minx, Patrick | Misceo, Doriana | Moncrieff, J. Scott | Morgan, Margaret B. | Muthuswamy, Raveendran | Nazareth, Lynne V. | Newsham, Irene | Nguyen, Ngoc Bich | Okwuonu, Geoffrey O. | Prabhakar, Shyam | Perales, Lora | Pu, Ling-Ling | Puente, Xose S. | Quesada, Victor | Ranck, Megan C. | Raney, Brian J. | Deiros, David Rio | Rocchi, Mariano | Rodriguez, David | Ross, Corinna | Ruffier, Magali | Ruiz, San Juana | Sajjadian, S. | Santibanez, Jireh | Schrider, Daniel R. | Searle, Steve | Skaletsky, Helen | Soibam, Benjamin | Smit, Arian F. A. | Tennakoon, Jayantha B. | Tomaska, Lubomir | Ullmer, Brygg | Vejnar, Charles E. | Ventura, Mario | Vilella, Albert J. | Vinar, Tomas | Vogel, Jan-Hinnerk | Walker, Jerilyn A. | Wang, Qing | Warner, Crystal M. | Wildman, Derek E. | Witherspoon, David J. | Wright, Rita A. | Wu, Yuanqing | Xiao, Weimin | Xing, Jinchuan | Zdobnov, Evgeny M. | Zhu, Baoli | Gibbs, Richard A. | Wilson, Richard K.
Nature genetics  2014;46(8):850-857.
A first analysis of the genome sequence of the common marmoset (Callithrix jacchus), assembled using traditional Sanger methods and Ensembl annotation, has permitted genomic comparison with apes and Old World monkeys and the identification of specific molecular features that may contribute to the unique biology of this diminutive primate. The common marmoset has a rapid reproductive capacity partly due to prevalence of dizygotic twins. Remarkably, these twins share placental circulation and exchange hematopoietic stem cells in utero, resulting in adults that are hematopoietic chimeras.
We observed positive selection or non-synonymous substitutions for genes encoding growth hormone / insulin-like growth factor (growth pathways), respiratory complex I (metabolic pathways), immunobiology, and proteases (reproductive and immunity pathways). In addition, both protein-coding and microRNA genes related to reproduction exhibit rapid sequence evolution. This New World monkey genome sequence enables significantly increased power for comparative analyses among available primate genomes and facilitates biomedical research application.
doi:10.1038/ng.3042
PMCID: PMC4138798  PMID: 25038751
4.  Extending reference assembly models 
Genome Biology  2015;16(1):13.
The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.
doi:10.1186/s13059-015-0587-3
PMCID: PMC4305238  PMID: 25651527
5.  The duck genome and transcriptome provide insight into an avian influenza virus reservoir species 
Nature genetics  2013;45(7):776-783.
The duck (Anas platyrhynchos) is one of the principal natural hosts of influenza A viruses. We present the duck genome sequence and perform deep transcriptome analyses to investigate immune-related genes. Our data indicate that the duck possesses a contractive immune gene repertoire, as in chicken and zebra finch, and this repertoire has been shaped through lineage-specific duplications. We identify genes that are responsive to influenza A viruses using the lung transcriptomes of control ducks and ones that were infected with either a highly pathogenic (A/duck/Hubei/49/05) or a weakly pathogenic (A/goose/Hubei/65/05) H5N1 virus. Further, we show how the duck’s defense mechanisms against influenza infection have been optimized through the diversification of its β-defensin and butyrophilin-like repertoires. These analyses, in combination with the genomic and transcriptomic data, provide a resource for characterizing the interaction between host and influenza viruses.
doi:10.1038/ng.2657
PMCID: PMC4003391  PMID: 23749191
6.  The draft genomes of soft–shell turtle and green sea turtle yield insights into the development and evolution of the turtle–specific body plan 
Nature genetics  2013;45(6):701-706.
The unique anatomical features of turtles have raised unanswered questions about the origin of their unique body plan. We generated and analyzed draft genomes of the soft-shell turtle (Pelodiscus sinensis) and the green sea turtle (Chelonia mydas); our results indicated the close relationship of the turtles to the bird-crocodilian lineage, from which they split ~267.9–248.3 million years ago (Upper Permian to Triassic). We also found extensive expansion of olfactory receptor genes in these turtles. Embryonic gene expression analysis identified an hourglass-like divergence of turtle and chicken embryogenesis, with maximal conservation around the vertebrate phylotypic period, rather than at later stages that show the amniote-common pattern. Wnt5a expression was found in the growth zone of the dorsal shell, supporting the possible co-option of limb-associated Wnt signaling in the acquisition of this turtle-specific novelty. Our results suggest that turtle evolution was accompanied by an unexpectedly conservative vertebrate phylotypic period, followed by turtle-specific repatterning of development to yield the novel structure of the shell.
doi:10.1038/ng.2615
PMCID: PMC4000948  PMID: 23624526
7.  The zebrafish reference genome sequence and its relationship to the human genome 
Howe, Kerstin | Clark, Matthew D. | Torroja, Carlos F. | Torrance, James | Berthelot, Camille | Muffato, Matthieu | Collins, John E. | Humphray, Sean | McLaren, Karen | Matthews, Lucy | McLaren, Stuart | Sealy, Ian | Caccamo, Mario | Churcher, Carol | Scott, Carol | Barrett, Jeffrey C. | Koch, Romke | Rauch, Gerd-Jörg | White, Simon | Chow, William | Kilian, Britt | Quintais, Leonor T. | Guerra-Assunção, José A. | Zhou, Yi | Gu, Yong | Yen, Jennifer | Vogel, Jan-Hinnerk | Eyre, Tina | Redmond, Seth | Banerjee, Ruby | Chi, Jianxiang | Fu, Beiyuan | Langley, Elizabeth | Maguire, Sean F. | Laird, Gavin K. | Lloyd, David | Kenyon, Emma | Donaldson, Sarah | Sehra, Harminder | Almeida-King, Jeff | Loveland, Jane | Trevanion, Stephen | Jones, Matt | Quail, Mike | Willey, Dave | Hunt, Adrienne | Burton, John | Sims, Sarah | McLay, Kirsten | Plumb, Bob | Davis, Joy | Clee, Chris | Oliver, Karen | Clark, Richard | Riddle, Clare | Eliott, David | Threadgold, Glen | Harden, Glenn | Ware, Darren | Mortimer, Beverly | Kerry, Giselle | Heath, Paul | Phillimore, Benjamin | Tracey, Alan | Corby, Nicole | Dunn, Matthew | Johnson, Christopher | Wood, Jonathan | Clark, Susan | Pelan, Sarah | Griffiths, Guy | Smith, Michelle | Glithero, Rebecca | Howden, Philip | Barker, Nicholas | Stevens, Christopher | Harley, Joanna | Holt, Karen | Panagiotidis, Georgios | Lovell, Jamieson | Beasley, Helen | Henderson, Carl | Gordon, Daria | Auger, Katherine | Wright, Deborah | Collins, Joanna | Raisen, Claire | Dyer, Lauren | Leung, Kenric | Robertson, Lauren | Ambridge, Kirsty | Leongamornlert, Daniel | McGuire, Sarah | Gilderthorp, Ruth | Griffiths, Coline | Manthravadi, Deepa | Nichol, Sarah | Barker, Gary | Whitehead, Siobhan | Kay, Michael | Brown, Jacqueline | Murnane, Clare | Gray, Emma | Humphries, Matthew | Sycamore, Neil | Barker, Darren | Saunders, David | Wallis, Justene | Babbage, Anne | Hammond, Sian | Mashreghi-Mohammadi, Maryam | Barr, Lucy | Martin, Sancha | Wray, Paul | Ellington, Andrew | Matthews, Nicholas | Ellwood, Matthew | Woodmansey, Rebecca | Clark, Graham | Cooper, James | Tromans, Anthony | Grafham, Darren | Skuce, Carl | Pandian, Richard | Andrews, Robert | Harrison, Elliot | Kimberley, Andrew | Garnett, Jane | Fosker, Nigel | Hall, Rebekah | Garner, Patrick | Kelly, Daniel | Bird, Christine | Palmer, Sophie | Gehring, Ines | Berger, Andrea | Dooley, Christopher M. | Ersan-Ürün, Zübeyde | Eser, Cigdem | Geiger, Horst | Geisler, Maria | Karotki, Lena | Kirn, Anette | Konantz, Judith | Konantz, Martina | Oberländer, Martina | Rudolph-Geiger, Silke | Teucke, Mathias | Osoegawa, Kazutoyo | Zhu, Baoli | Rapp, Amanda | Widaa, Sara | Langford, Cordelia | Yang, Fengtang | Carter, Nigel P. | Harrow, Jennifer | Ning, Zemin | Herrero, Javier | Searle, Steve M. J. | Enright, Anton | Geisler, Robert | Plasterk, Ronald H. A. | Lee, Charles | Westerfield, Monte | de Jong, Pieter J. | Zon, Leonard I. | Postlethwait, John H. | Nüsslein-Volhard, Christiane | Hubbard, Tim J. P. | Crollius, Hugues Roest | Rogers, Jane | Stemple, Derek L.
Nature  2013;496(7446):498-503.
Zebrafish have become a popular organism for the study of vertebrate gene function [1,2]. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and, increasingly, the study of human genetic disease [3–5]. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes [6], the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
doi:10.1038/nature12111
PMCID: PMC3703927  PMID: 23594743
8.  Integrative Annotation of Variants from 1092 Humans: Application to Cancer Genomics 
Science (New York, N.Y.)  2013;342(6154):1235587.
Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations (“ultrasensitive”) and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, “motif-breakers”). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.
doi:10.1126/science.1235587
PMCID: PMC3947637  PMID: 24092746
9.  Ensembl 2014 
Nucleic Acids Research  2013;42(Database issue):D749-D755.
Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.
doi:10.1093/nar/gkt1196
PMCID: PMC3964975  PMID: 24316576
10.  Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution 
Nature genetics  2013;45(4):415-421e2.
Lampreys are representatives of an ancient vertebrate lineage that diverged from our own ~500 million years ago. By virtue of this deeply shared ancestry, the sea lamprey (P. marinus) genome is uniquely poised to provide insight into the ancestry of vertebrate genomes and the underlying principles of vertebrate biology. Here, we present the first lamprey whole-genome sequence and assembly. We note challenges faced owing to its high content of repetitive elements and GC bases, as well as the absence of broad-scale sequence information from closely related species. Analyses of the assembly indicate that two whole-genome duplications likely occurred before the divergence of ancestral lamprey and gnathostome lineages. Moreover, the results help define key evolutionary events within vertebrate lineages, including the origin of myelin-associated proteins and the development of appendages. The lamprey genome provides an important resource for reconstructing vertebrate origins and the evolutionary events that have shaped the genomes of extant organisms.
doi:10.1038/ng.2568
PMCID: PMC3709584  PMID: 23435085
11.  Stem cells isolated from adipose tissue of obese patients show changes in their transcriptomic profile that indicate loss in stemcellness and increased commitment to an adipocyte-like phenotype 
BMC Genomics  2013;14:625.
Background
Adipose tissue is an endocrine regulator and, when its excessive accumulation induces obesity, a risk factor for atherosclerosis and cardiovascular disease. Although adipose tissue is also a reservoir for stem cells (ASC), their function and “stemcellness” have been questioned. Our aim was to investigate the mechanisms by which obesity affects subcutaneous white adipose tissue (WAT) stem cells.
Results
Transcriptomics, in silico analysis, real-time polymerase chain reaction (PCR) and western blots were performed on isolated stem cells from subcutaneous abdominal WAT of morbidly obese patients (ASCmo) and of non-obese individuals (ASCn). ASCmo and ASCn gene expression clustered separately from each other. ASCmo showed downregulation of “stemness” genes and upregulation of adipogenic and inflammatory genes with respect to ASCn. Moreover, the application of bioinformatics and Ingenuity Pathway Analysis (IPA) showed that the transcription factor Smad3 was tentatively affected in obese ASCmo. Validation of this target confirmed a significantly reduced Smad3 nuclear translocation in the isolated ASCmo.
Conclusions
The transcriptomic profile of the stem cell reservoir in obese subcutaneous WAT is highly modified, with significant changes in genes regulating stemcellness, lineage commitment and inflammation. In addition to body mass index, cardiovascular risk factor clustering further affects the ASC transcriptomic profile, inducing loss of multipotency and, hence, capacity for tissue repair. In summary, the stem cells in the subcutaneous WAT niche of obese patients are already committed to adipocyte differentiation and show upregulated inflammatory gene expression associated with their loss of stemcellness.
doi:10.1186/1471-2164-14-625
PMCID: PMC3848661  PMID: 24040759
Human adipose-derived stem cells; Subcutaneous adipose tissue; Cardiovascular risk factors; Transcriptome; Inflammatory genes
12.  Ensembl 2013 
Nucleic Acids Research  2012;41(Database issue):D48-D55.
The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidence-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species; and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.
doi:10.1093/nar/gks1236
PMCID: PMC3531136  PMID: 23203987
13.  Insights into hominid evolution from the gorilla genome sequence 
Nature  2012;483(7388):169-175.
Summary
Gorillas are humans’ closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago (Mya). In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.
doi:10.1038/nature10842
PMCID: PMC3303130  PMID: 22398555
14.  Analysis of variation at transcription factor binding sites in Drosophila and humans 
Genome Biology  2012;13(9):R49.
Background
Advances in sequencing technology have boosted population genomics and made it possible to map the positions of transcription factor binding sites (TFBSs) with high precision. Here we investigate TFBS variability by combining transcription factor binding maps generated by ENCODE, modENCODE, our previously published data and other sources with genomic variation data for human individuals and Drosophila isogenic lines.
Results
We introduce a metric of TFBS variability that takes into account changes in motif match associated with mutation and makes it possible to investigate TFBS functional constraints instance-by-instance as well as in sets that share common biological properties. We also take advantage of the emerging per-individual transcription factor binding data to show evidence that TFBS mutations, particularly at evolutionarily conserved sites, can be efficiently buffered to ensure coherent levels of transcription factor binding.
Conclusions
Our analyses provide insights into the relationship between individual and interspecies variation and show evidence for the functional buffering of TFBS mutations in both humans and flies. In a broad perspective, these results demonstrate the potential of combining functional genomics and population genetics approaches for understanding gene regulation.
doi:10.1186/gb-2012-13-9-r49
PMCID: PMC3491393  PMID: 22950968
15.  A high-resolution map of human evolutionary constraint using 29 mammals 
Nature  2011;478(7370):476-482.
Comparison of related genomes has emerged as a powerful lens for genome interpretation. Here, we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and report constrained elements covering ~4.2% of the genome. We use evolutionary signatures and comparison with experimental datasets to suggest candidate functions for ~60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events, and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements, and ~1,000 primate- and human-accelerated elements. Overlap with disease-associated variants suggests our findings will be relevant for studies of human biology and health.
doi:10.1038/nature10530
PMCID: PMC3207357  PMID: 21993624
16.  Considerations for the inclusion of 2x mammalian genomes in phylogenetic analyses 
Genome Biology  2011;12(2):401.
A response to “2x genomes - depth does matter” by MC Milinkovitch, R Helaers, E Depiereux, AC Tzika and T Gabaldón. Genome Biol 2010, 11:R16.
doi:10.1186/gb-2011-12-2-401
PMCID: PMC3188792  PMID: 21320298
17.  Ensembl 2012 
Nucleic Acids Research  2011;40(Database issue):D84-D90.
The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.
doi:10.1093/nar/gkr991
PMCID: PMC3245178  PMID: 22086963
18.  Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development 
Renfree, Marilyn B | Papenfuss, Anthony T | Deakin, Janine E | Lindsay, James | Heider, Thomas | Belov, Katherine | Rens, Willem | Waters, Paul D | Pharo, Elizabeth A | Shaw, Geoff | Wong, Emily SW | Lefèvre, Christophe M | Nicholas, Kevin R | Kuroki, Yoko | Wakefield, Matthew J | Zenger, Kyall R | Wang, Chenwei | Ferguson-Smith, Malcolm | Nicholas, Frank W | Hickford, Danielle | Yu, Hongshi | Short, Kirsty R | Siddle, Hannah V | Frankenberg, Stephen R | Chew, Keng Yih | Menzies, Brandon R | Stringer, Jessica M | Suzuki, Shunsuke | Hore, Timothy A | Delbridge, Margaret L | Mohammadi, Amir | Schneider, Nanette Y | Hu, Yanqiu | O'Hara, William | Al Nadaf, Shafagh | Wu, Chen | Feng, Zhi-Ping | Cocks, Benjamin G | Wang, Jianghui | Flicek, Paul | Searle, Stephen MJ | Fairley, Susan | Beal, Kathryn | Herrero, Javier | Carone, Dawn M | Suzuki, Yutaka | Sugano, Sumio | Toyoda, Atsushi | Sakaki, Yoshiyuki | Kondo, Shinji | Nishida, Yuichiro | Tatsumoto, Shoji | Mandiou, Ion | Hsu, Arthur | McColl, Kaighin A | Lansdell, Benjamin | Weinstock, George | Kuczek, Elizabeth | McGrath, Annette | Wilson, Peter | Men, Artem | Hazar-Rethinam, Mehlika | Hall, Allison | Davis, John | Wood, David | Williams, Sarah | Sundaravadanam, Yogi | Muzny, Donna M | Jhangiani, Shalini N | Lewis, Lora R | Morgan, Margaret B | Okwuonu, Geoffrey O | Ruiz, San Juana | Santibanez, Jireh | Nazareth, Lynne | Cree, Andrew | Fowler, Gerald | Kovar, Christie L | Dinh, Huyen H | Joshi, Vandita | Jing, Chyn | Lara, Fremiet | Thornton, Rebecca | Chen, Lei | Deng, Jixin | Liu, Yue | Shen, Joshua Y | Song, Xing-Zhi | Edson, Janette | Troon, Carmen | Thomas, Daniel | Stephens, Amber | Yapa, Lankesha | Levchenko, Tanya | Gibbs, Richard A | Cooper, Desmond W | Speed, Terence P | Fujiyama, Asao | M Graves, Jennifer A | O'Neill, Rachel J | Pask, Andrew J | Forrest, Susan M | Worley, Kim C
Genome Biology  2011;12(8):R81.
Background
We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development.
Results
The genome has been sequenced to 2 × coverage using Sanger sequencing, enhanced with additional next generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements.
Conclusions
Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution.
doi:10.1186/gb-2011-12-8-r81
PMCID: PMC3277949  PMID: 21854559
19.  Comparative and demographic analysis of orangutan genomes 
Locke, Devin P. | Hillier, LaDeana W. | Warren, Wesley C. | Worley, Kim C. | Nazareth, Lynne V. | Muzny, Donna M. | Yang, Shiaw-Pyng | Wang, Zhengyuan | Chinwalla, Asif T. | Minx, Pat | Mitreva, Makedonka | Cook, Lisa | Delehaunty, Kim D. | Fronick, Catrina | Schmidt, Heather | Fulton, Lucinda A. | Fulton, Robert S. | Nelson, Joanne O. | Magrini, Vincent | Pohl, Craig | Graves, Tina A. | Markovic, Chris | Cree, Andy | Dinh, Huyen H. | Hume, Jennifer | Kovar, Christie L. | Fowler, Gerald R. | Lunter, Gerton | Meader, Stephen | Heger, Andreas | Ponting, Chris P. | Marques-Bonet, Tomas | Alkan, Can | Chen, Lin | Cheng, Ze | Kidd, Jeffrey M. | Eichler, Evan E. | White, Simon | Searle, Stephen | Vilella, Albert J. | Chen, Yuan | Flicek, Paul | Ma, Jian | Raney, Brian | Suh, Bernard | Burhans, Richard | Herrero, Javier | Haussler, David | Faria, Rui | Fernando, Olga | Darré, Fleur | Farré, Domènec | Gazave, Elodie | Oliva, Meritxell | Navarro, Arcadi | Roberto, Roberta | Capozzi, Oronzo | Archidiacono, Nicoletta | Valle, Giuliano Della | Purgato, Stefania | Rocchi, Mariano | Konkel, Miriam K. | Walker, Jerilyn A. | Ullmer, Brygg | Batzer, Mark A. | Smit, Arian F. A. | Hubley, Robert | Casola, Claudio | Schrider, Daniel R. | Hahn, Matthew W. | Quesada, Victor | Puente, Xose S. | Ordoñez, Gonzalo R. | López-Otín, Carlos | Vinar, Tomas | Brejova, Brona | Ratan, Aakrosh | Harris, Robert S. | Miller, Webb | Kosiol, Carolin | Lawson, Heather A. | Taliwal, Vikas | Martins, André L. | Siepel, Adam | RoyChoudhury, Arindam | Ma, Xin | Degenhardt, Jeremiah | Bustamante, Carlos D. | Gutenkunst, Ryan N. | Mailund, Thomas | Dutheil, Julien Y. | Hobolth, Asger | Schierup, Mikkel H. | Chemnick, Leona | Ryder, Oliver A. | Yoshinaga, Yuko | de Jong, Pieter J. | Weinstock, George M. | Rogers, Jeffrey | Mardis, Elaine R. | Gibbs, Richard A. | Wilson, Richard K.
Nature  2011;469(7331):529-533.
“Orangutan” is derived from the Malay term “man of the forest” and aptly describes the Southeast Asian great apes native to Sumatra and Borneo. The orangutan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orangutan draft genome assembly and short read sequence data from five Sumatran and five Bornean orangutan genomes. Our analyses reveal that, compared to other primates, the orangutan genome has many unique features. Structural evolution of the orangutan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe the first primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orangutan genome structure. Orangutans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400k years ago (ya), is more recent than most previous studies and underscores the complexity of the orangutan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (Ne) expanded exponentially relative to the ancestral Ne after the split, while Bornean Ne declined over the same period.
Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.
doi:10.1038/nature09687
PMCID: PMC3060778  PMID: 21270892
20.  Ensembl 2011 
Nucleic Acids Research  2010;39(Database issue):D800-D806.
The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.
doi:10.1093/nar/gkq1064
PMCID: PMC3013672  PMID: 21045057
21.  Multi-Platform Next-Generation Sequencing of the Domestic Turkey (Meleagris gallopavo): Genome Assembly and Analysis 
PLoS Biology  2010;8(9):e1000475.
The combined application of next-generation sequencing platforms has provided an economical approach to unlocking the potential of the turkey genome.
A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.
Author Summary
In contrast to the compact sequence of viruses and bacteria, determining the complete genome sequence of complex vertebrate genomes can be a daunting task. With the advent of “next-generation” sequencing platforms, it is now possible to rapidly sequence and assemble a vertebrate genome, especially for species for which genomic resources (genetic maps and markers) are currently available. We used a combination of two next-generation sequencing platforms, Roche 454 and Illumina GAII, and unique assembly tools to sequence the genome of the agriculturally important turkey, Meleagris gallopavo. Our draft assembly comprises approximately 1.1 gigabases, of which 917 megabases are assigned to specific chromosomes. Comparisons of the turkey genome sequence with those of the chicken, Gallus gallus, and the zebra finch, Taeniopygia guttata, provide insights into the evolution of the avian lineage. This genome sequence will facilitate discovery of agriculturally important genetic variants.
doi:10.1371/journal.pbio.1000475
PMCID: PMC2935454  PMID: 20838655
22.  eHive: An Artificial Intelligence workflow system for genomic analysis 
BMC Bioinformatics  2010;11:240.
Background
The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future.
Results
We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios.
Conclusions
eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
doi:10.1186/1471-2105-11-240
PMCID: PMC2885371  PMID: 20459813
23.  Ensembl’s 10th year 
Nucleic Acids Research  2009;38(Database issue):D557-D562.
Ensembl (http://www.ensembl.org) integrates genomic information for a comprehensive set of chordate genomes with a particular focus on resources for human, mouse, rat, zebrafish and other high-value sequenced genomes. We provide complete gene annotations for all supported species in addition to specific resources that target genome variation, function and evolution. Ensembl data is accessible in a variety of formats including via our genome browser, API and BioMart. This year marks the tenth anniversary of Ensembl and in that time the project has grown with advances in genome technology. As of release 56 (September 2009), Ensembl supports 51 species including marmoset, pig, zebra finch, lizard, gorilla and wallaby, which were added in the past year. Major additions and improvements to Ensembl since our previous report include the incorporation of the human GRCh37 assembly, enhanced visualisation and data-mining options for the Ensembl regulatory features and continued development of our software infrastructure.
doi:10.1093/nar/gkp972
PMCID: PMC2808936  PMID: 19906699
24.  A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis 
Nature biotechnology  2008;26(7):779-785.
DNA methylation is an indispensable epigenetic modification of mammalian genomes. Consequently, there is great interest in strategies for genome-wide/whole-genome DNA methylation analysis, and immunoprecipitation-based methods have proven to be a powerful option. Such methods are rapidly shifting the bottleneck from data generation to data analysis, necessitating the development of better analytical tools. Until now, a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling has been the inability to estimate absolute methylation levels. Here we report the development of a novel cross-platform algorithm – Bayesian Tool for Methylation Analysis (Batman) – for analyzing Methylated DNA Immunoprecipitation (MeDIP) profiles generated using arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). The latter is an approach we have developed to elucidate the first high-resolution whole-genome DNA methylation profile (DNA methylome) of any mammalian genome. MeDIP-seq/MeDIP-chip combined with Batman represent robust, quantitative, and cost-effective functional genomic strategies for elucidating the function of DNA methylation.
doi:10.1038/nbt1414
PMCID: PMC2644410  PMID: 18612301
