Results 1-10 (10)
 

1.  The First Myriapod Genome Sequence Reveals Conservative Arthropod Gene Content and Genome Organisation in the Centipede Strigamia maritima 
Chipman, Ariel D. | Ferrier, David E. K. | Brena, Carlo | Qu, Jiaxin | Hughes, Daniel S. T. | Schröder, Reinhard | Torres-Oliva, Montserrat | Znassi, Nadia | Jiang, Huaiyang | Almeida, Francisca C. | Alonso, Claudio R. | Apostolou, Zivkos | Aqrawi, Peshtewani | Arthur, Wallace | Barna, Jennifer C. J. | Blankenburg, Kerstin P. | Brites, Daniela | Capella-Gutiérrez, Salvador | Coyle, Marcus | Dearden, Peter K. | Du Pasquier, Louis | Duncan, Elizabeth J. | Ebert, Dieter | Eibner, Cornelius | Erikson, Galina | Evans, Peter D. | Extavour, Cassandra G. | Francisco, Liezl | Gabaldón, Toni | Gillis, William J. | Goodwin-Horn, Elizabeth A. | Green, Jack E. | Griffiths-Jones, Sam | Grimmelikhuijzen, Cornelis J. P. | Gubbala, Sai | Guigó, Roderic | Han, Yi | Hauser, Frank | Havlak, Paul | Hayden, Luke | Helbing, Sophie | Holder, Michael | Hui, Jerome H. L. | Hunn, Julia P. | Hunnekuhl, Vera S. | Jackson, LaRonda | Javaid, Mehwish | Jhangiani, Shalini N. | Jiggins, Francis M. | Jones, Tamsin E. | Kaiser, Tobias S. | Kalra, Divya | Kenny, Nathan J. | Korchina, Viktoriya | Kovar, Christie L. | Kraus, F. Bernhard | Lapraz, François | Lee, Sandra L. | Lv, Jie | Mandapat, Christigale | Manning, Gerard | Mariotti, Marco | Mata, Robert | Mathew, Tittu | Neumann, Tobias | Newsham, Irene | Ngo, Dinh N. | Ninova, Maria | Okwuonu, Geoffrey | Ongeri, Fiona | Palmer, William J. | Patil, Shobha | Patraquim, Pedro | Pham, Christopher | Pu, Ling-Ling | Putman, Nicholas H. | Rabouille, Catherine | Ramos, Olivia Mendivil | Rhodes, Adelaide C. | Robertson, Helen E. | Robertson, Hugh M. | Ronshaugen, Matthew | Rozas, Julio | Saada, Nehad | Sánchez-Gracia, Alejandro | Scherer, Steven E. | Schurko, Andrew M. | Siggens, Kenneth W. | Simmons, DeNard | Stief, Anna | Stolle, Eckart | Telford, Maximilian J. | Tessmar-Raible, Kristin | Thornton, Rebecca | van der Zee, Maurijn | von Haeseler, Arndt | Williams, James M. | Willis, Judith H. | Wu, Yuanqing | Zou, Xiaoyan | Lawson, Daniel | Muzny, Donna M. | Worley, Kim C. | Gibbs, Richard A. | Akam, Michael | Richards, Stephen
PLoS Biology  2014;12(11):e1002005.
Myriapods (e.g., centipedes and millipedes) display a simple homonomous body plan relative to other arthropods. All members of the class are terrestrial, but they attained terrestriality independently of insects. Myriapoda is the only arthropod class not represented by a sequenced genome. We present an analysis of the genome of the centipede Strigamia maritima. It retains a compact genome that has undergone less gene loss and shuffling than previously sequenced arthropods, and many orthologues of genes conserved from the bilaterian ancestor that have been lost in insects. Our analysis locates many genes in conserved macro-synteny contexts, and many small-scale examples of gene clustering. We describe several examples where S. maritima shows different solutions from insects to similar problems. The insect olfactory receptor gene family is absent from S. maritima, and olfaction in air is likely effected by expansion of other receptor gene families. For some genes S. maritima has evolved paralogues to generate coding sequence diversity, where insects use alternate splicing. This is most striking for the Dscam gene, which in Drosophila generates more than 100,000 alternate splice forms, but in S. maritima is encoded by over 100 paralogues. We see an intriguing linkage between the absence of any known photosensory proteins in a blind organism and the additional absence of canonical circadian clock genes. The phylogenetic position of myriapods allows us to identify where in arthropod phylogeny several particular molecular mechanisms and traits emerged. For example, we conclude that juvenile hormone signalling evolved with the emergence of the exoskeleton in the arthropods and that RR-1 containing cuticle proteins evolved in the lineage leading to Mandibulata. We also identify when various gene expansions and losses occurred. The genome of S. maritima offers us a unique glimpse into the ancestral arthropod genome, while also displaying many adaptations to its specific life history.
Author Summary
Arthropods are the most abundant animals on earth. Among them, insects clearly dominate on land, whereas crustaceans hold the title for the most diverse invertebrates in the oceans. Much is known about the biology of these groups, not least because of genomic studies of the fruit fly Drosophila, the water flea Daphnia, and other species used in research. Here we report the first genome sequence from a species belonging to a lineage that has previously received very little attention: the myriapods. Myriapods were among the first arthropods to invade the land over 400 million years ago, and survive today as the herbivorous millipedes and venomous centipedes, one of which, Strigamia maritima, we have sequenced here. We find that the genome of this centipede retains more characteristics of the presumed arthropod ancestor than the insect genomes sequenced to date. The genome provides access to many aspects of myriapod biology that have not been studied before, suggesting, for example, that they have diversified receptors for smell that are quite different from those used by insects. In addition, it shows specific consequences of the largely subterranean life of this particular species, which seems to have lost the genes for all known light-sensing molecules, even though it still avoids light.
doi:10.1371/journal.pbio.1002005
PMCID: PMC4244043  PMID: 25423365
2.  Evolutionary profiling reveals the heterogeneous origins of classes of human disease genes: implications for modeling disease genetics in animals 
BMC Evolutionary Biology  2014;14(1):212.
Background
The recent expansion of whole-genome sequence data available from diverse animal lineages provides an opportunity to investigate the evolutionary origins of specific classes of human disease genes. Previous studies have observed that human disease genes are of particularly ancient origin. While this suggests that many animal species have the potential to serve as feasible models for research on genes responsible for human disease, it is unclear whether this pattern has meaningful implications and whether it prevails for every class of human disease.
Results
We used a comparative genomics approach encompassing a broad phylogenetic range of animals with sequenced genomes to determine the evolutionary patterns exhibited by human genes associated with different classes of disease. Our results support previous claims that most human disease genes are of ancient origin but, more importantly, we also demonstrate that several specific disease classes have a significantly large proportion of genes that emerged relatively recently within the metazoans and/or vertebrates. An independent assessment of the synonymous to non-synonymous substitution rates of human disease genes found in mammals reveals that disease classes that arose more recently also display unexpected rates of purifying selection between their mammalian and human counterparts.
Conclusions
Our results reveal the heterogeneity underlying the evolutionary origins of (and selective pressures on) different classes of human disease genes. For example, some disease gene classes appear to be of uncommonly recent (i.e., vertebrate-specific) origin and, as a whole, have been evolving at a faster rate within mammals than the majority of disease classes having more ancient origins. The novel patterns that we have identified may provide new insight into cases where studies using traditional animal models were unable to produce results that translated to humans. Conversely, we note that the larger set of disease classes do have ancient origins, suggesting that many non-traditional animal models have the potential to be useful for studying many human disease genes. Taken together, these findings emphasize why model organism selection should be done on a disease-by-disease basis, with evolutionary profiles in mind.
Electronic supplementary material
The online version of this article (doi:10.1186/s12862-014-0212-1) contains supplementary material, which is available to authorized users.
doi:10.1186/s12862-014-0212-1
PMCID: PMC4219131  PMID: 25281000
Model organism selection; Human disease genes; Evolutionary genetics; Comparative genomics
3.  Insights into bilaterian evolution from three spiralian genomes 
Nature  2012;493(7433):526-531.
Current genomic perspectives on animal diversity neglect two prominent phyla, the molluscs and annelids, that together account for nearly one-third of known marine species and are important both ecologically and as experimental systems in classical embryology [1–3]. Here we describe the draft genomes of the owl limpet (Lottia gigantea), a marine polychaete (Capitella teleta) and a freshwater leech (Helobdella robusta), and compare them with other animal genomes to investigate the origin and diversification of bilaterians from a genomic perspective. We find that the genome organization, gene structure and functional content of these species are more similar to those of some invertebrate deuterostome genomes (for example, amphioxus and sea urchin) than those of other protostomes that have been sequenced to date (flies, nematodes and flatworms). The conservation of these genomic features enables us to expand the inventory of genes present in the last common bilaterian ancestor, establish the tripartite diversification of bilaterians using multiple genomic characteristics and identify ancient conserved long- and short-range genetic linkages across metazoans. Superimposed on this broadly conserved pan-bilaterian background we find examples of lineage-specific genome evolution, including varying rates of rearrangement, intron gain and loss, expansions and contractions of gene families, and the evolution of clade-specific genes that produce the unique content of each genome.
doi:10.1038/nature11696
PMCID: PMC4085046  PMID: 23254933
4.  The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution 
Science (New York, N.Y.)  2013;342(6164):1242592.
An understanding of ctenophore biology is critical for reconstructing events that occurred early in animal evolution. Towards this goal, we have sequenced, assembled, and annotated the genome of the ctenophore Mnemiopsis leidyi. Our phylogenomic analyses of both amino acid positions and gene content suggest that ctenophores rather than sponges are the sister lineage to all other animals. Mnemiopsis lacks many of the genes found in bilaterian mesodermal cell types, suggesting that these cell types evolved independently. The set of neural genes in Mnemiopsis is similar to that of sponges, indicating that sponges may have lost a nervous system. These results present a new view of early animal evolution that accounts for major losses and/or gains of sophisticated cell types, including nerve and muscle cells.
doi:10.1126/science.1242592
PMCID: PMC3920664  PMID: 24337300
5.  Joint assembly and genetic mapping of the Atlantic horseshoe crab genome reveals ancient whole genome duplication 
GigaScience  2014;3:9.
Background
Horseshoe crabs are marine arthropods with a fossil record extending back approximately 450 million years. They exhibit remarkable morphological stability over their long evolutionary history, retaining a number of ancestral arthropod traits, and are often cited as examples of “living fossils.” As arthropods, they belong to the Ecdysozoa, an ancient super-phylum whose sequenced genomes (including insects and nematodes) have thus far shown more divergence from the ancestral pattern of eumetazoan genome organization than cnidarians, deuterostomes and lophotrochozoans. However, much of ecdysozoan diversity remains unrepresented in comparative genomic analyses.
Results
Here we apply a new strategy of combined de novo assembly and genetic mapping to examine the chromosome-scale genome organization of the Atlantic horseshoe crab, Limulus polyphemus. We constructed a genetic linkage map of this 2.7 Gbp genome by sequencing the nuclear DNA of 34 wild-collected, full-sibling embryos and their parents at a mean redundancy of 1.1x per sample. The map includes 84,307 sequence markers grouped into 1,876 distinct genetic intervals and 5,775 candidate conserved protein coding genes.
Conclusions
Comparison with other metazoan genomes shows that the L. polyphemus genome preserves ancestral bilaterian linkage groups, and that a common ancestor of modern horseshoe crabs underwent one or more ancient whole genome duplications 300 million years ago, followed by extensive chromosome fusion. These results provide a counter-example to the often noted correlation between whole genome duplication and evolutionary radiations. The new, low-cost genetic mapping method for obtaining a chromosome-scale view of non-model organism genomes that we demonstrate here does not require laboratory culture, and is potentially applicable to a broad range of other species.
doi:10.1186/2047-217X-3-9
PMCID: PMC4066314  PMID: 24987520
Genotyping-by-sequencing (GBS); Genetic linkage mapping; Genome evolution; Limulus polyphemus
6.  Whole Genome Sequencing of Mutation Accumulation Lines Reveals a Low Mutation Rate in the Social Amoeba Dictyostelium discoideum 
PLoS ONE  2012;7(10):e46759.
Spontaneous mutations play a central role in evolution. Despite their importance, mutation rates are some of the most elusive parameters to measure in evolutionary biology. The combination of mutation accumulation (MA) experiments and whole-genome sequencing now makes it possible to estimate mutation rates by directly observing new mutations at the molecular level across the whole genome. We performed an MA experiment with the social amoeba Dictyostelium discoideum and sequenced the genomes of three randomly chosen lines using high-throughput sequencing to estimate the spontaneous mutation rate in this model organism. The mitochondrial mutation rate of 6.76×10⁻⁹, with a Poisson confidence interval of 4.1×10⁻⁹ to 9.5×10⁻⁹, per nucleotide per generation is slightly lower than estimates for other taxa. The mutation rate estimate for the nuclear DNA of 2.9×10⁻¹¹, with a Poisson confidence interval ranging from 7.4×10⁻¹³ to 1.6×10⁻¹⁰, is the lowest reported for any eukaryote. These results are consistent with low microsatellite mutation rates previously observed in D. discoideum and low levels of genetic variation observed in wild D. discoideum populations. In addition, D. discoideum has been shown to be quite resistant to DNA damage, which suggests an efficient DNA-repair mechanism that could be an adaptation to life in soil and frequent exposure to intracellular and extracellular mutagenic compounds. The social aspect of the life cycle of D. discoideum and a large portion of the genome under relaxed selection during vegetative growth could also select for a low mutation rate. This hypothesis is supported by a significantly lower mutation rate per cell division in multicellular eukaryotes compared with unicellular eukaryotes.
doi:10.1371/journal.pone.0046759
PMCID: PMC3466296  PMID: 23056439
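The "Poisson confidence interval" quoted in the abstract above can be illustrated with a minimal sketch. It computes an exact (Garwood-style) two-sided interval for a Poisson count by bisection on the Poisson CDF, then scales it by the number of site-generations surveyed. The function names, the mutation count of 1, and the denominator of 1e10 site-generations are hypothetical values chosen for illustration; the abstract does not report the counts or denominators the authors used, and their exact procedure may differ.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def bisect(f, lo, hi, tol=1e-12):
    """Root of a monotone function f on [lo, hi], found by bisection."""
    f_lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_ci(k, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for the mean of a Poisson count k."""
    bracket = 10 * (k + 1) + 50  # generous upper bracket for the search
    if k == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= k) equals alpha / 2.
        lower = bisect(lambda lam: (1 - poisson_cdf(k - 1, lam)) - alpha / 2,
                       0.0, bracket)
    # Upper limit: the mean at which P(X <= k) equals alpha / 2.
    upper = bisect(lambda lam: poisson_cdf(k, lam) - alpha / 2, 0.0, bracket)
    return lower, upper

# Hypothetical inputs (NOT taken from the paper): one observed mutation
# across 1e10 surveyed site-generations.
observed_mutations = 1
site_generations = 1e10

lo_count, hi_count = exact_poisson_ci(observed_mutations)
rate = observed_mutations / site_generations
rate_ci = (lo_count / site_generations, hi_count / site_generations)
print(f"rate = {rate:.2e}, 95% CI = ({rate_ci[0]:.2e}, {rate_ci[1]:.2e})")
```

With very small counts the interval is strongly asymmetric around the point estimate, which is why an exact interval is usually preferred over a normal approximation in this setting.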
7.  Constraints on genes shape long-term conservation of macro-synteny in metazoan genomes 
BMC Bioinformatics  2011;12(Suppl 9):S11.
Background
Many metazoan genomes conserve chromosome-scale gene linkage relationships (“macro-synteny”) from the common ancestor of multicellular animal life [1-4], but the biological explanation for this conservation is still unknown. Double cut and join (DCJ) is a simple, well-studied model of neutral genome evolution amenable to both simulation and mathematical analysis [5], but as we show here, it is not sufficient to explain long-term macro-synteny conservation.
Results
We examine a family of simple (one-parameter) extensions of DCJ to identify models and choices of parameters consistent with the levels of macro- and micro-synteny conservation observed among animal genomes. Our software implements a flexible strategy for incorporating various types of genomic context into the DCJ model (“DCJ-[C]”), and is available as open source software from http://github.com/putnamlab/dcj-c.
Conclusions
A simple model of genome evolution, in which DCJ moves are allowed only if they maintain chromosomal linkage among a set of constrained genes, can simultaneously account for the level of macro-synteny conservation and for correlated conservation among multiple pairs of species. Simulations under this model indicate that a constraint on approximately 7% of metazoan genes is sufficient to constrain genome rearrangement to an average rate of 25 inversions and 1.7 translocations per million years.
doi:10.1186/1471-2105-12-S9-S11
PMCID: PMC3283319  PMID: 22151646
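The constrained model summarized in the conclusions above (a rearrangement is accepted only if it keeps a set of constrained genes linked on the same chromosomes) can be sketched roughly as follows. This is a simplified illustration, not the authors' DCJ-[C] software: it covers only inversions and reciprocal translocations on linear chromosomes, which are special cases of DCJ moves, and every function name and the toy genome are hypothetical.

```python
def inversion(chrom, i, j):
    """Reverse the segment chrom[i:j] and flip the orientation of its genes."""
    return chrom[:i] + [-g for g in reversed(chrom[i:j])] + chrom[j:]

def translocation(a, b, i, j):
    """Reciprocal translocation: exchange the tail of a (from i) with the tail of b (from j)."""
    return a[:i] + b[j:], b[:j] + a[i:]

def linkage_groups(genome, constrained):
    """Set of constrained-gene groups, one group per chromosome that carries any."""
    groups = set()
    for chrom in genome:
        members = frozenset(abs(g) for g in chrom if abs(g) in constrained)
        if members:
            groups.add(members)
    return groups

def try_translocation(genome, x, y, i, j, constrained):
    """Apply a translocation between chromosomes x and y only if it
    preserves chromosomal linkage among the constrained genes."""
    new_x, new_y = translocation(genome[x], genome[y], i, j)
    candidate = list(genome)
    candidate[x], candidate[y] = new_x, new_y
    if linkage_groups(candidate, constrained) == linkage_groups(genome, constrained):
        return candidate, True
    return genome, False  # move rejected; genome unchanged

# Toy genome: two chromosomes of signed gene identifiers (hypothetical data).
genome = [[1, 2, 3, 4], [5, 6, 7, 8]]

# Genes 1 and 4 are constrained to stay together, as are 5 and 8:
# swapping tails at the midpoints would separate them, so the move is rejected.
_, accepted = try_translocation(genome, 0, 1, 2, 2, {1, 4, 5, 8})
print(accepted)

# An inversion never moves genes between chromosomes, so under this
# constraint it is always permitted.
print(inversion(genome[0], 1, 3))
```

In a simulation, proposed moves would be drawn at random and passed through a check like `try_translocation`; the fraction of constrained genes then acts as the single tunable parameter.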
8.  Bos taurus genome assembly 
BMC Genomics  2009;10:180.
Background
We present here the assembly of the bovine genome. The assembly method combines the BAC plus WGS local assembly used for the rat and sea urchin with the whole genome shotgun (WGS) only assembly used for many other animal genomes including the rhesus macaque.
Results
The assembly process consisted of multiple phases: First, BACs were assembled with BAC generated sequence, then subsequently in combination with the individual overlapping WGS reads. Different assembly parameters were tested to separately optimize the performance for each BAC assembly of the BAC and WGS reads. In parallel, a second assembly was produced using only the WGS sequences and a global whole genome assembly method. The two assemblies were combined to create a more complete genome representation that retained the high quality BAC-based local assembly information, but with gaps between BACs filled in with the WGS-only assembly. Finally, the entire assembly was placed on chromosomes using the available map information.
Over 90% of the assembly is now placed on chromosomes. The estimated genome size is 2.87 Gb which represents a high degree of completeness, with 95% of the available EST sequences found in assembled contigs. The quality of the assembly was evaluated by comparison to 73 finished BACs, where the draft assembly covers between 92.5 and 100% (average 98.5%) of the finished BACs. The assembly contigs and scaffolds align linearly to the finished BACs, suggesting that misassemblies are rare. Genotyping and genetic mapping of 17,482 SNPs revealed that more than 99.2% were correctly positioned within the Btau_4.0 assembly, confirming the accuracy of the assembly.
Conclusion
The biological analysis of this bovine genome assembly is being published, and the sequence data is available to support future bovine research.
doi:10.1186/1471-2164-10-180
PMCID: PMC2686734  PMID: 19393050
9.  The DNA sequence of the human X chromosome 
Ross, Mark T. | Grafham, Darren V. | Coffey, Alison J. | Scherer, Steven | McLay, Kirsten | Muzny, Donna | Platzer, Matthias | Howell, Gareth R. | Burrows, Christine | Bird, Christine P. | Frankish, Adam | Lovell, Frances L. | Howe, Kevin L. | Ashurst, Jennifer L. | Fulton, Robert S. | Sudbrak, Ralf | Wen, Gaiping | Jones, Matthew C. | Hurles, Matthew E. | Andrews, T. Daniel | Scott, Carol E. | Searle, Stephen | Ramser, Juliane | Whittaker, Adam | Deadman, Rebecca | Carter, Nigel P. | Hunt, Sarah E. | Chen, Rui | Cree, Andrew | Gunaratne, Preethi | Havlak, Paul | Hodgson, Anne | Metzker, Michael L. | Richards, Stephen | Scott, Graham | Steffen, David | Sodergren, Erica | Wheeler, David A. | Worley, Kim C. | Ainscough, Rachael | Ambrose, Kerrie D. | Ansari-Lari, M. Ali | Aradhya, Swaroop | Ashwell, Robert I. S. | Babbage, Anne K. | Bagguley, Claire L. | Ballabio, Andrea | Banerjee, Ruby | Barker, Gary E. | Barlow, Karen F. | Barrett, Ian P. | Bates, Karen N. | Beare, David M. | Beasley, Helen | Beasley, Oliver | Beck, Alfred | Bethel, Graeme | Blechschmidt, Karin | Brady, Nicola | Bray-Allen, Sarah | Bridgeman, Anne M. | Brown, Andrew J. | Brown, Mary J. | Bonnin, David | Bruford, Elspeth A. | Buhay, Christian | Burch, Paula | Burford, Deborah | Burgess, Joanne | Burrill, Wayne | Burton, John | Bye, Jackie M. | Carder, Carol | Carrel, Laura | Chako, Joseph | Chapman, Joanne C. | Chavez, Dean | Chen, Ellson | Chen, Guan | Chen, Yuan | Chen, Zhijian | Chinault, Craig | Ciccodicola, Alfredo | Clark, Sue Y. | Clarke, Graham | Clee, Chris M. | Clegg, Sheila | Clerc-Blankenburg, Kerstin | Clifford, Karen | Cobley, Vicky | Cole, Charlotte G. | Conquer, Jen S. | Corby, Nicole | Connor, Richard E. | David, Robert | Davies, Joy | Davis, Clay | Davis, John | Delgado, Oliver | DeShazo, Denise | Dhami, Pawandeep | Ding, Yan | Dinh, Huyen | Dodsworth, Steve | Draper, Heather | Dugan-Rocha, Shannon | Dunham, Andrew | Dunn, Matthew | Durbin, K. James | Dutta, Ireena | Eades, Tamsin | Ellwood, Matthew | Emery-Cohen, Alexandra | Errington, Helen | Evans, Kathryn L. | Faulkner, Louisa | Francis, Fiona | Frankland, John | Fraser, Audrey E. | Galgoczy, Petra | Gilbert, James | Gill, Rachel | Glöckner, Gernot | Gregory, Simon G. | Gribble, Susan | Griffiths, Coline | Grocock, Russell | Gu, Yanghong | Gwilliam, Rhian | Hamilton, Cerissa | Hart, Elizabeth A. | Hawes, Alicia | Heath, Paul D. | Heitmann, Katja | Hennig, Steffen | Hernandez, Judith | Hinzmann, Bernd | Ho, Sarah | Hoffs, Michael | Howden, Phillip J. | Huckle, Elizabeth J. | Hume, Jennifer | Hunt, Paul J. | Hunt, Adrienne R. | Isherwood, Judith | Jacob, Leni | Johnson, David | Jones, Sally | de Jong, Pieter J. | Joseph, Shirin S. | Keenan, Stephen | Kelly, Susan | Kershaw, Joanne K. | Khan, Ziad | Kioschis, Petra | Klages, Sven | Knights, Andrew J. | Kosiura, Anna | Kovar-Smith, Christie | Laird, Gavin K. | Langford, Cordelia | Lawlor, Stephanie | Leversha, Margaret | Lewis, Lora | Liu, Wen | Lloyd, Christine | Lloyd, David M. | Loulseged, Hermela | Loveland, Jane E. | Lovell, Jamieson D. | Lozado, Ryan | Lu, Jing | Lyne, Rachael | Ma, Jie | Maheshwari, Manjula | Matthews, Lucy H. | McDowall, Jennifer | McLaren, Stuart | McMurray, Amanda | Meidl, Patrick | Meitinger, Thomas | Milne, Sarah | Miner, George | Mistry, Shailesh L. | Morgan, Margaret | Morris, Sidney | Müller, Ines | Mullikin, James C. | Nguyen, Ngoc | Nordsiek, Gabriele | Nyakatura, Gerald | O’Dell, Christopher N. | Okwuonu, Geoffery | Palmer, Sophie | Pandian, Richard | Parker, David | Parrish, Julia | Pasternak, Shiran | Patel, Dina | Pearce, Alex V. | Pearson, Danita M. | Pelan, Sarah E. | Perez, Lesette | Porter, Keith M. | Ramsey, Yvonne | Reichwald, Kathrin | Rhodes, Susan | Ridler, Kerry A. | Schlessinger, David | Schueler, Mary G. | Sehra, Harminder K. | Shaw-Smith, Charles | Shen, Hua | Sheridan, Elizabeth M. | Shownkeen, Ratna | Skuce, Carl D. | Smith, Michelle L. | Sotheran, Elizabeth C. | Steingruber, Helen E. | Steward, Charles A. | Storey, Roy | Swann, R. Mark | Swarbreck, David | Tabor, Paul E. | Taudien, Stefan | Taylor, Tineace | Teague, Brian | Thomas, Karen | Thorpe, Andrea | Timms, Kirsten | Tracey, Alan | Trevanion, Steve | Tromans, Anthony C. | d’Urso, Michele | Verduzco, Daniel | Villasana, Donna | Waldron, Lenee | Wall, Melanie | Wang, Qiaoyan | Warren, James | Warry, Georgina L. | Wei, Xuehong | West, Anthony | Whitehead, Siobhan L. | Whiteley, Mathew N. | Wilkinson, Jane E. | Willey, David L. | Williams, Gabrielle | Williams, Leanne | Williamson, Angela | Williamson, Helen | Wilming, Laurens | Woodmansey, Rebecca L. | Wray, Paul W. | Yen, Jennifer | Zhang, Jingkun | Zhou, Jianling | Zoghbi, Huda | Zorilla, Sara | Buck, David | Reinhardt, Richard | Poustka, Annemarie | Rosenthal, André | Lehrach, Hans | Meindl, Alfons | Minx, Patrick J. | Hillier, LaDeana W. | Willard, Huntington F. | Wilson, Richard K. | Waterston, Robert H. | Rice, Catherine M. | Vaudin, Mark | Coulson, Alan | Nelson, David L. | Weinstock, George | Sulston, John E. | Durbin, Richard | Hubbard, Tim | Gibbs, Richard A. | Beck, Stephan | Rogers, Jane | Bentley, David R.
Nature  2005;434(7031):325-337.
The human X chromosome has a unique biology that was shaped by its evolution as the sex chromosome shared by males and females. We have determined 99.3% of the euchromatic sequence of the X chromosome. Our analysis illustrates the autosomal origin of the mammalian sex chromosomes, the stepwise process that led to the progressive loss of recombination between X and Y, and the extent of subsequent degradation of the Y chromosome. LINE1 repeat elements cover one-third of the X chromosome, with a distribution that is consistent with their proposed role as way stations in the process of X-chromosome inactivation. We found 1,098 genes in the sequence, of which 99 encode proteins expressed in testis and in various tumour types. A disproportionately high number of mendelian diseases are documented for the X chromosome. Of this number, 168 have been explained by mutations in 113 X-linked genes, which in many cases were characterized with the aid of the DNA sequence.
doi:10.1038/nature03440
PMCID: PMC2665286  PMID: 15772651
10.  Improving Phrap-Based Assembly of the Rat Using “Reliable” Overlaps 
PLoS ONE  2008;3(3):e1836.
The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of “reliable” overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version PhrapUMD. Integrating PhrapUMD and our “reliable-overlap” algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the Rattus norvegicus genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at http://www.genome.umd.edu. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps.
doi:10.1371/journal.pone.0001836
PMCID: PMC2266800  PMID: 18350171
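The "seven times better" figure in the abstract above follows from its stated quality measure, the inverse of the product of error rate and sequence missed. A short worked check, using only the numbers quoted in the abstract and assuming "sequence missed" means 100% minus the coverage of finished sequence (the function and variable names are illustrative):

```python
def quality(coverage_pct, errors_per_10k):
    """Quality as the inverse of (error rate x percent of finished sequence missed)."""
    missed_pct = 100.0 - coverage_pct
    return 1.0 / (errors_per_10k * missed_pct)

# Figures quoted in the abstract for the 21 finished BACs.
atlas_quality = quality(coverage_pct=93.4, errors_per_10k=4.5)  # Nov. 2002 Atlas assembly
new_quality = quality(coverage_pct=96.3, errors_per_10k=1.1)    # assembly using reliable overlaps

# Ratio of the two quality scores: (4.5 * 6.6) / (1.1 * 3.7) = 29.7 / 4.07
improvement = new_quality / atlas_quality
print(f"improvement factor: {improvement:.1f}")
```

The ratio comes out at roughly 7.3, consistent with the abstract's rounded claim of a sevenfold improvement.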
