1.  Analysis of the African coelacanth genome sheds light on tetrapod evolution 
Amemiya, Chris T. | Alföldi, Jessica | Lee, Alison P. | Fan, Shaohua | Philippe, Hervé | MacCallum, Iain | Braasch, Ingo | Manousaki, Tereza | Schneider, Igor | Rohner, Nicolas | Organ, Chris | Chalopin, Domitille | Smith, Jeramiah J. | Robinson, Mark | Dorrington, Rosemary A. | Gerdol, Marco | Aken, Bronwen | Biscotti, Maria Assunta | Barucca, Marco | Baurain, Denis | Berlin, Aaron M. | Blatch, Gregory L. | Buonocore, Francesco | Burmester, Thorsten | Campbell, Michael S. | Canapa, Adriana | Cannon, John P. | Christoffels, Alan | De Moro, Gianluca | Edkins, Adrienne L. | Fan, Lin | Fausto, Anna Maria | Feiner, Nathalie | Forconi, Mariko | Gamieldien, Junaid | Gnerre, Sante | Gnirke, Andreas | Goldstone, Jared V. | Haerty, Wilfried | Hahn, Mark E. | Hesse, Uljana | Hoffmann, Steve | Johnson, Jeremy | Karchner, Sibel I. | Kuraku, Shigehiro | Lara, Marcia | Levin, Joshua Z. | Litman, Gary W. | Mauceli, Evan | Miyake, Tsutomu | Mueller, M. Gail | Nelson, David R. | Nitsche, Anne | Olmo, Ettore | Ota, Tatsuya | Pallavicini, Alberto | Panji, Sumir | Picone, Barbara | Ponting, Chris P. | Prohaska, Sonja J. | Przybylski, Dariusz | Saha, Nil Ratan | Ravi, Vydianathan | Ribeiro, Filipe J. | Sauka-Spengler, Tatjana | Scapigliati, Giuseppe | Searle, Stephen M. J. | Sharpe, Ted | Simakov, Oleg | Stadler, Peter F. | Stegeman, John J. | Sumiyama, Kenta | Tabbaa, Diana | Tafer, Hakim | Turner-Maier, Jason | van Heusden, Peter | White, Simon | Williams, Louise | Yandell, Mark | Brinkmann, Henner | Volff, Jean-Nicolas | Tabin, Clifford J. | Shubin, Neil | Schartl, Manfred | Jaffe, David | Postlethwait, John H. | Venkatesh, Byrappa | Di Palma, Federica | Lander, Eric S. | Meyer, Axel | Lindblad-Toh, Kerstin
Nature  2013;496(7445):311-316.
It was a zoological sensation when a living specimen of the coelacanth was first discovered in 1938, as this lineage of lobe-finned fish was thought to have gone extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain, and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues demonstrate the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.
PMCID: PMC3633110  PMID: 23598338
2.  Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species 
Bradnam, Keith R | Fass, Joseph N | Alexandrov, Anton | Baranay, Paul | Bechner, Michael | Birol, Inanç | Boisvert, Sébastien | Chapman, Jarrod A | Chapuis, Guillaume | Chikhi, Rayan | Chitsaz, Hamidreza | Chou, Wen-Chi | Corbeil, Jacques | Del Fabbro, Cristian | Docking, T Roderick | Durbin, Richard | Earl, Dent | Emrich, Scott | Fedotov, Pavel | Fonseca, Nuno A | Ganapathy, Ganeshkumar | Gibbs, Richard A | Gnerre, Sante | Godzaridis, Élénie | Goldstein, Steve | Haimel, Matthias | Hall, Giles | Haussler, David | Hiatt, Joseph B | Ho, Isaac Y | Howard, Jason | Hunt, Martin | Jackman, Shaun D | Jaffe, David B | Jarvis, Erich D | Jiang, Huaiyang | Kazakov, Sergey | Kersey, Paul J | Kitzman, Jacob O | Knight, James R | Koren, Sergey | Lam, Tak-Wah | Lavenier, Dominique | Laviolette, François | Li, Yingrui | Li, Zhenyu | Liu, Binghang | Liu, Yue | Luo, Ruibang | MacCallum, Iain | MacManes, Matthew D | Maillet, Nicolas | Melnikov, Sergey | Naquin, Delphine | Ning, Zemin | Otto, Thomas D | Paten, Benedict | Paulo, Octávio S | Phillippy, Adam M | Pina-Martins, Francisco | Place, Michael | Przybylski, Dariusz | Qin, Xiang | Qu, Carson | Ribeiro, Filipe J | Richards, Stephen | Rokhsar, Daniel S | Ruby, J Graham | Scalabrin, Simone | Schatz, Michael C | Schwartz, David C | Sergushichev, Alexey | Sharpe, Ted | Shaw, Timothy I | Shendure, Jay | Shi, Yujian | Simpson, Jared T | Song, Henry | Tsarev, Fedor | Vezzi, Francesco | Vicedomini, Riccardo | Vieira, Bruno M | Wang, Jun | Worley, Kim C | Yin, Shuangye | Yiu, Siu-Ming | Yuan, Jianying | Zhang, Guojie | Zhang, Hao | Zhou, Shiguo | Korf, Ian F
GigaScience  2013;2:10.
The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly.
In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies.
Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
PMCID: PMC3844414  PMID: 23870653
Genome assembly; N50; Scaffolds; Assessment; Heterozygosity; COMPASS
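N50, listed among the keywords above, is commonly defined as the length of the sequence at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly size. The following is a minimal sketch of that common definition only; the function name and the toy scaffold lengths are illustrative assumptions, and the abstract does not say which variant of the statistic the Assemblathon 2 evaluation used.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Common textbook definition (an assumption here, not taken from the
    paper): sort lengths from longest to shortest and return the length
    at which the cumulative sum first reaches half of the total.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths, for illustration only.
scaffolds = [100, 80, 60, 40, 20]
print(n50(scaffolds))  # prints 80
```

With these toy lengths the total is 300, and the two longest scaffolds (100 + 80 = 180) are the first to cover at least 150, so the sketch reports 80.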
3.  A direct characterization of human mutation based on microsatellites 
Nature genetics  2012;44(10):1161-1165.
Mutations are the raw material of evolution, but have been difficult to study directly. We report the largest study of new mutations to date: 2,058 germline changes discovered by analyzing 85,289 Icelanders at 2,477 microsatellites. The paternal-to-maternal mutation rate ratio is 3.3, and the rate in fathers doubles from age 20 to 58 whereas there is no association with age in mothers. Longer microsatellite alleles are more mutagenic and tend to decrease in length, whereas the opposite is seen for shorter alleles. We use these empirical observations to build a model that we apply to individuals for whom we have both genome sequence and microsatellite data, allowing us to estimate key parameters of evolution without calibration to the fossil record. We infer that the sequence mutation rate is 1.4–2.3×10⁻⁸ per base pair per generation (90% credible interval), and that human-chimpanzee speciation occurred 3.7–6.6 million years ago.
PMCID: PMC3459271  PMID: 22922873
4.  De novo assembly of highly diverse viral populations 
BMC Genomics  2012;13:475.
Extensive genetic diversity in viral populations within infected hosts and the divergence of variants from existing reference genomes impede the analysis of deep viral sequencing data. A de novo population consensus assembly is valuable both as a single linear representation of the population and as a backbone on which intra-host variants can be accurately mapped. The availability of consensus assemblies and robustly mapped variants are crucial to the genetic study of viral disease progression, transmission dynamics, and viral evolution. Existing de novo assembly techniques fail to robustly assemble ultra-deep sequence data from genetically heterogeneous populations such as viruses into full-length genomes due to the presence of extensive genetic variability, contaminants, and variable sequence coverage.
We present VICUNA, a de novo assembly algorithm suitable for generating consensus assemblies from genetically heterogeneous populations. We demonstrate its effectiveness on Dengue, Human Immunodeficiency and West Nile viral populations, representing a range of intra-host diversity. Compared to state-of-the-art assemblers designed for haploid or diploid systems, VICUNA recovers full-length consensus and captures insertion/deletion polymorphisms in diverse samples. Final assemblies maintain a high base calling accuracy. The VICUNA program is publicly available at: viral-genomics-analysis-software.
We developed VICUNA, a publicly available software tool, that enables consensus assembly of ultra-deep sequence derived from diverse viral populations. While VICUNA was developed for the analysis of viral populations, its application to other heterogeneous sequence data sets such as metagenomic or tumor cell population samples may prove beneficial in these fields of research.
PMCID: PMC3469330  PMID: 22974120
5.  A high-resolution map of human evolutionary constraint using 29 mammals 
Nature  2011;478(7370):476-482.
Comparison of related genomes has emerged as a powerful lens for genome interpretation. Here, we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and report constrained elements covering ~4.2% of the genome. We use evolutionary signatures and comparison with experimental datasets to suggest candidate functions for ~60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events, and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements, and ~1,000 primate- and human-accelerated elements. Overlap with disease-associated variants suggests our findings will be relevant for studies of human biology and health.
PMCID: PMC3207357  PMID: 21993624
7.  Whole Genome Deep Sequencing of HIV-1 Reveals the Impact of Early Minor Variants Upon Immune Recognition During Acute Infection 
PLoS Pathogens  2012;8(3):e1002529.
Deep sequencing technologies have the potential to transform the study of highly variable viral pathogens by providing a rapid and cost-effective approach to sensitively characterize rapidly evolving viral quasispecies. Here, we report on a high-throughput whole HIV-1 genome deep sequencing platform that combines 454 pyrosequencing with novel assembly and variant detection algorithms. In one subject we combined these genetic data with detailed immunological analyses to comprehensively evaluate viral evolution and immune escape during the acute phase of HIV-1 infection. The majority of early, low frequency mutations represented viral adaptation to host CD8+ T cell responses, evidence of strong immune selection pressure occurring during the early decline from peak viremia. CD8+ T cell responses capable of recognizing these low frequency escape variants coincided with the selection and evolution of more effective secondary HLA-anchor escape mutations. Frequent, and in some cases rapid, reversion of transmitted mutations was also observed across the viral genome. When located within restricted CD8 epitopes these low frequency reverting mutations were sufficient to prime de novo responses to these epitopes, again illustrating the capacity of the immune response to recognize and respond to low frequency variants. More importantly, rapid viral escape from the most immunodominant CD8+ T cell responses coincided with plateauing of the initial viral load decline in this subject, suggestive of a potential link between maintenance of effective, dominant CD8 responses and the degree of early viremia reduction. We conclude that the early control of HIV-1 replication by immunodominant CD8+ T cell responses may be substantially influenced by rapid, low frequency viral adaptations not detected by conventional sequencing approaches, which warrants further investigation. 
These data support the critical need for vaccine-induced CD8+ T cell responses to target more highly constrained regions of the virus in order to ensure the maintenance of immunodominant CD8 responses and the sustained decline of early viremia.
Author Summary
The ability of HIV-1 and other highly variable pathogens to rapidly mutate to escape vaccine-induced immune responses represents a major hurdle to the development of effective vaccines to these highly persistent pathogens. Application of next-generation or deep sequencing technologies to the study of host pathogens could significantly improve our understanding of the mechanisms by which these pathogens subvert host immunity, and aid in the development of novel vaccines and therapeutics. Here, we developed a 454 deep sequencing approach to enable the sensitive detection of low-frequency viral variants across the entire HIV-1 genome. When applied to the acute phase of HIV-1 infection we observed that the majority of early, low frequency mutations represented viral adaptations to host cellular immune responses, evidence of strong host immunity developing during the early decline of peak viral load. Rapid viral escape from the most dominant immune responses however correlated with loss of this initial viral control, suggestive of the importance of mounting immune responses against more conserved regions of the virus. These data provide a greater understanding of the early evolutionary events subverting the ability of host immune responses to control early HIV-1 replication, yielding important insight into the design of more effective vaccine strategies.
PMCID: PMC3297584  PMID: 22412369
9.  ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads 
Genome Biology  2009;10(10):R103.
Allpaths2, a method for accurately assembling small genomes with high continuity using short paired reads.
We demonstrate that genome sequences approaching finished quality can be generated from short paired reads. Using 36 base (fragment) and 26 base (jumping) reads from five microbial genomes of varied GC composition and sizes up to 40 Mb, ALLPATHS2 generated assemblies with long, accurate contigs and scaffolds. Velvet and EULER-SR were less accurate. For example, for Escherichia coli, the fraction of 10-kb stretches that were perfect was 99.8% (ALLPATHS2), 68.7% (Velvet), and 42.1% (EULER-SR).
PMCID: PMC2784318  PMID: 19796385
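The accuracy figures quoted in the ALLPATHS 2 abstract are fractions of 10-kb stretches that were perfect. One way such a number could be computed is sketched below. It is an illustration under stated assumptions, not the authors' procedure: the use of non-overlapping windows, the function name, and the input format (a list of 0-based error positions from some prior comparison against a reference) are all hypothetical.

```python
def perfect_window_fraction(assembly_length, error_positions, window=10_000):
    """Fraction of non-overlapping windows that contain no error.

    Assumptions (hypothetical, not from the paper): windows are
    non-overlapping, a trailing partial window is ignored, and
    error_positions holds 0-based coordinates of mismatches or indels.
    """
    n_windows = assembly_length // window
    if n_windows == 0:
        raise ValueError("assembly is shorter than one window")
    # Collect the index of every full window hit by at least one error.
    bad_windows = {
        pos // window for pos in error_positions if pos // window < n_windows
    }
    return (n_windows - len(bad_windows)) / n_windows


# Toy example: a 50-kb assembly with errors in the first two windows.
print(perfect_window_fraction(50_000, [5, 12_000, 12_500]))  # prints 0.6
```

In the toy example two of five windows contain an error, so three of five (0.6) are counted as perfect.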
10.  Assisted assembly: how to improve a de novo genome assembly by using related species 
Genome Biology  2009;10(8):R88.
A method is described for improving low sequence coverage genome assemblies.
We describe a new assembly algorithm, where a genome assembly with low sequence coverage, either throughout the genome or locally, due to cloning bias, is considerably improved through an assisting process via a related genome. We show that the information provided by aligning the whole-genome shotgun reads of the target against a reference genome can be used to substantially improve the quality of the resulting assembly.
PMCID: PMC2745769  PMID: 19712469
11.  Closing gaps in the human genome using sequencing by synthesis 
Genome Biology  2009;10(6):R60.
A novel method for closing non-structural gaps in the human genome assembly using 454 sequencing is presented here.
The most recent release of the finished human genome contains 260 euchromatic gaps (excluding chromosome Y). Recent work has helped explain a large number of these unresolved regions as 'structural' in nature. Another class of gaps is likely to be refractory to clone-based approaches, and cannot be approached in ways previously described. We present an approach for closing these gaps using 454 sequencing. As a proof of principle, we closed all three remaining non-structural gaps in chromosome 15.
PMCID: PMC2718494  PMID: 19490611
12.  DNA sequence of human chromosome 17 and analysis of rearrangement in the human lineage 
Zody, Michael C. | Garber, Manuel | Adams, David J. | Sharpe, Ted | Harrow, Jennifer | Lupski, James R. | Nicholson, Christine | Searle, Steven M. | Wilming, Laurens | Young, Sarah K. | Abouelleil, Amr | Allen, Nicole R. | Bi, Weimin | Bloom, Toby | Borowsky, Mark L. | Bugalter, Boris E. | Butler, Jonathan | Chang, Jean L. | Chen, Chao-Kung | Cook, April | Corum, Benjamin | Cuomo, Christina A. | de Jong, Pieter J. | DeCaprio, David | Dewar, Ken | FitzGerald, Michael | Gilbert, James | Gibson, Richard | Gnerre, Sante | Goldstein, Steven | Grafham, Darren V. | Grocock, Russell | Hafez, Nabil | Hagopian, Daniel S. | Hart, Elizabeth | Norman, Catherine Hosage | Humphray, Sean | Jaffe, David B. | Jones, Matt | Kamal, Michael | Khodiyar, Varsha K. | LaButti, Kurt | Laird, Gavin | Lehoczky, Jessica | Liu, Xiaohong | Lokyitsang, Tashi | Loveland, Jane | Lui, Annie | Macdonald, Pendexter | Major, John E. | Matthews, Lucy | Mauceli, Evan | McCarroll, Steven A. | Mihalev, Atanas H. | Mudge, Jonathan | Nguyen, Cindy | Nicol, Robert | O'Leary, Sinéad B. | Osoegawa, Kazutoyo | Schwartz, David C. | Shaw-Smith, Charles | Stankiewicz, Pawel | Steward, Charles | Swarbreck, David | Venkataraman, Vijay | Whittaker, Charles A. | Yang, Xiaoping | Zimmer, Andrew R. | Bradley, Allan | Hubbard, Tim | Birren, Bruce W. | Rogers, Jane | Lander, Eric S. | Nusbaum, Chad
Nature  2006;440(7087):1045-1049.
Chromosome 17 is unusual among the human chromosomes in many respects. It is the largest human autosome with orthology to only a single mouse chromosome [1], mapping entirely to the distal half of mouse chromosome 11. Chromosome 17 is rich in protein-coding genes, having the second highest gene density in the genome [2,3]. It is also enriched in segmental duplications, ranking third in density among the autosomes [4]. Here we report a finished sequence for human chromosome 17, as well as a structural comparison with the finished sequence for mouse chromosome 11, the first finished mouse chromosome. Comparison of the orthologous regions reveals striking differences. In contrast to the typical pattern seen in mammalian evolution [5,6], the human sequence has undergone extensive intrachromosomal rearrangement, whereas the mouse sequence has been remarkably stable. Moreover, although the human sequence has a high density of segmental duplication, the mouse sequence has a very low density. Notably, these segmental duplications correspond closely to the sites of structural rearrangement, demonstrating a link between duplication and rearrangement. Examination of the main classes of duplicated segments provides insight into the dynamics underlying expansion of chromosome-specific, low-copy repeats in the human genome.
PMCID: PMC2610434  PMID: 16625196
13.  Analysis of Chimpanzee History Based on Genome Sequence Alignments 
PLoS Genetics  2008;4(4):e1000057.
Population geneticists often study small numbers of carefully chosen loci, but it has become possible to obtain orders of magnitude more data from overlaps of genome sequences. Here, we generate tens of millions of base pairs of multiple sequence alignments from combinations of three western chimpanzees, three central chimpanzees, an eastern chimpanzee, a bonobo, a human, an orangutan, and a macaque. Analysis provides a more precise understanding of demographic history than was previously available. We show that bonobos and common chimpanzees were separated ∼1,290,000 years ago, western and other common chimpanzees ∼510,000 years ago, and eastern and central chimpanzees at least 50,000 years ago. We infer that the central chimpanzee population size increased by at least a factor of 4 since its separation from western chimpanzees, while the western chimpanzee effective population size decreased. Surprisingly, in about one percent of the genome, the genetic relationships between humans, chimpanzees, and bonobos appear to be different from the species relationships. We used PCR-based resequencing to confirm 11 regions where chimpanzees and bonobos are not most closely related. Study of such loci should provide information about the period of time 5–7 million years ago when the ancestors of humans separated from those of the chimpanzees.
Author Summary
Studies of population history traditionally examine a small number of genetic regions in many individuals; however, with genome sequencing technologies it is possible to assemble data sets with thousands more aligned sequences albeit in fewer individuals. To explore whether such data can provide useful insights about population history, we assembled large-scale data sets consisting of overlaps of random genome sequencing reads from chimpanzees and bonobos. Analysis of these data finds that bonobos and chimpanzees split from each other about 1.29 million years ago, western and central chimpanzees about 0.51 million years ago, and eastern and central chimpanzees at least 50,000 years ago. We find that the chimpanzee population has fluctuated significantly in size over the past half million years, with the central chimpanzee population size expanding dramatically, and the western chimpanzee population size contracting. Surprisingly, we also find that there are widespread regions of the genome where chimpanzees and bonobos are less closely related to each other than any of them are to humans. In these regions, chimpanzees and bonobos share a common genetic ancestor dating back to speciation from humans, providing a new source of information about that evolutionary event.
PMCID: PMC2278377  PMID: 18421364

