Results 1-8 (8)
 

1.  Analysis of the African coelacanth genome sheds light on tetrapod evolution 
Amemiya, Chris T. | Alföldi, Jessica | Lee, Alison P. | Fan, Shaohua | Philippe, Hervé | MacCallum, Iain | Braasch, Ingo | Manousaki, Tereza | Schneider, Igor | Rohner, Nicolas | Organ, Chris | Chalopin, Domitille | Smith, Jeramiah J. | Robinson, Mark | Dorrington, Rosemary A. | Gerdol, Marco | Aken, Bronwen | Biscotti, Maria Assunta | Barucca, Marco | Baurain, Denis | Berlin, Aaron M. | Blatch, Gregory L. | Buonocore, Francesco | Burmester, Thorsten | Campbell, Michael S. | Canapa, Adriana | Cannon, John P. | Christoffels, Alan | De Moro, Gianluca | Edkins, Adrienne L. | Fan, Lin | Fausto, Anna Maria | Feiner, Nathalie | Forconi, Mariko | Gamieldien, Junaid | Gnerre, Sante | Gnirke, Andreas | Goldstone, Jared V. | Haerty, Wilfried | Hahn, Mark E. | Hesse, Uljana | Hoffmann, Steve | Johnson, Jeremy | Karchner, Sibel I. | Kuraku, Shigehiro | Lara, Marcia | Levin, Joshua Z. | Litman, Gary W. | Mauceli, Evan | Miyake, Tsutomu | Mueller, M. Gail | Nelson, David R. | Nitsche, Anne | Olmo, Ettore | Ota, Tatsuya | Pallavicini, Alberto | Panji, Sumir | Picone, Barbara | Ponting, Chris P. | Prohaska, Sonja J. | Przybylski, Dariusz | Saha, Nil Ratan | Ravi, Vydianathan | Ribeiro, Filipe J. | Sauka-Spengler, Tatjana | Scapigliati, Giuseppe | Searle, Stephen M. J. | Sharpe, Ted | Simakov, Oleg | Stadler, Peter F. | Stegeman, John J. | Sumiyama, Kenta | Tabbaa, Diana | Tafer, Hakim | Turner-Maier, Jason | van Heusden, Peter | White, Simon | Williams, Louise | Yandell, Mark | Brinkmann, Henner | Volff, Jean-Nicolas | Tabin, Clifford J. | Shubin, Neil | Schartl, Manfred | Jaffe, David | Postlethwait, John H. | Venkatesh, Byrappa | Di Palma, Federica | Lander, Eric S. | Meyer, Axel | Lindblad-Toh, Kerstin
Nature  2013;496(7445):311-316.
It was a zoological sensation when a living specimen of the coelacanth was first discovered in 1938, as this lineage of lobe-finned fish was thought to have gone extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain, and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues demonstrate the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.
doi:10.1038/nature12027
PMCID: PMC3633110  PMID: 23598338
2.  Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species 
Bradnam, Keith R | Fass, Joseph N | Alexandrov, Anton | Baranay, Paul | Bechner, Michael | Birol, Inanç | Boisvert, Sébastien | Chapman, Jarrod A | Chapuis, Guillaume | Chikhi, Rayan | Chitsaz, Hamidreza | Chou, Wen-Chi | Corbeil, Jacques | Del Fabbro, Cristian | Docking, T Roderick | Durbin, Richard | Earl, Dent | Emrich, Scott | Fedotov, Pavel | Fonseca, Nuno A | Ganapathy, Ganeshkumar | Gibbs, Richard A | Gnerre, Sante | Godzaridis, Élénie | Goldstein, Steve | Haimel, Matthias | Hall, Giles | Haussler, David | Hiatt, Joseph B | Ho, Isaac Y | Howard, Jason | Hunt, Martin | Jackman, Shaun D | Jaffe, David B | Jarvis, Erich D | Jiang, Huaiyang | Kazakov, Sergey | Kersey, Paul J | Kitzman, Jacob O | Knight, James R | Koren, Sergey | Lam, Tak-Wah | Lavenier, Dominique | Laviolette, François | Li, Yingrui | Li, Zhenyu | Liu, Binghang | Liu, Yue | Luo, Ruibang | MacCallum, Iain | MacManes, Matthew D | Maillet, Nicolas | Melnikov, Sergey | Naquin, Delphine | Ning, Zemin | Otto, Thomas D | Paten, Benedict | Paulo, Octávio S | Phillippy, Adam M | Pina-Martins, Francisco | Place, Michael | Przybylski, Dariusz | Qin, Xiang | Qu, Carson | Ribeiro, Filipe J | Richards, Stephen | Rokhsar, Daniel S | Ruby, J Graham | Scalabrin, Simone | Schatz, Michael C | Schwartz, David C | Sergushichev, Alexey | Sharpe, Ted | Shaw, Timothy I | Shendure, Jay | Shi, Yujian | Simpson, Jared T | Song, Henry | Tsarev, Fedor | Vezzi, Francesco | Vicedomini, Riccardo | Vieira, Bruno M | Wang, Jun | Worley, Kim C | Yin, Shuangye | Yiu, Siu-Ming | Yuan, Jianying | Zhang, Guojie | Zhang, Hao | Zhou, Shiguo | Korf, Ian F
GigaScience  2013;2:10.
Background
The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly.
Results
In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies.
Conclusions
Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
doi:10.1186/2047-217X-2-10
PMCID: PMC3844414  PMID: 23870653
Genome assembly; N50; Scaffolds; Assessment; Heterozygosity; COMPASS
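N50 appears among the keywords of this record but is not defined in the abstract. As a minimal sketch, assuming the commonly used definition (the contig or scaffold length at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly length), it could be computed as follows. The function name `n50` and the example lengths are illustrative and are not taken from the paper, which may use variants of this statistic.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the cumulative sum first reaches at least half
    of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (in kb), purely for illustration.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

A single summary value like this hides a great deal, which is consistent with the abstract's point that many different measures were needed to assess assembly quality.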
3.  ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads 
Genome Biology  2009;10(10):R103.
ALLPATHS2 is a method for accurately assembling small genomes with high continuity using short paired reads.
We demonstrate that genome sequences approaching finished quality can be generated from short paired reads. Using 36 base (fragment) and 26 base (jumping) reads from five microbial genomes of varied GC composition and sizes up to 40 Mb, ALLPATHS2 generated assemblies with long, accurate contigs and scaffolds. Velvet and EULER-SR were less accurate. For example, for Escherichia coli, the fraction of 10-kb stretches that were perfect was 99.8% (ALLPATHS2), 68.7% (Velvet), and 42.1% (EULER-SR).
doi:10.1186/gb-2009-10-10-r103
PMCID: PMC2784318  PMID: 19796385
4.  Powerful fusion: PSI-BLAST and consensus sequences 
Bioinformatics (Oxford, England)  2008;24(18):1987-1993.
Motivation
A typical PSI-BLAST search consists of iterative scanning and alignment of a large sequence database during which a scoring profile is progressively built and refined. Such a profile can also be stored and used to search against a different database of sequences. Using it to search against a database of consensus rather than native sequences is a simple add-on that boosts performance surprisingly well. The improvement comes at a price: we hypothesized that random alignment score statistics would differ between native and consensus sequences. Thus PSI-BLAST-based profile searches against consensus sequences might incorrectly estimate statistical significance of alignment scores. In addition, iterative searches against consensus databases may fail. Here, we addressed these challenges in an attempt to harness the full power of the combination of PSI-BLAST and consensus sequences.
Results
We studied alignment score statistics for various types of consensus sequences. In general, the score distribution parameters of profile-based consensus sequence alignments differed significantly from those derived for the native sequences. PSI-BLAST partially compensated for the parameter variation. We have identified a protocol for building specialized consensus sequences that significantly improved search sensitivity and preserved score distribution parameters. As a result, PSI-BLAST profiles can be used to search specialized consensus sequences without sacrificing estimates of statistical significance. We also provided results indicating that iterative PSI-BLAST searches against consensus sequences could work very well. Overall, we showed how a widely popular and effective method could be used to identify significantly more relevant similarities among protein sequences.
Availability
http://www.rostlab.org/services/consensus/
Contact:
dsp23@columbia.edu
doi:10.1093/bioinformatics/btn384
PMCID: PMC2577777  PMID: 18678588
5.  Powerful fusion: PSI-BLAST and consensus sequences 
Bioinformatics  2008;24(18):1987-1993.
Motivation: A typical PSI-BLAST search consists of iterative scanning and alignment of a large sequence database during which a scoring profile is progressively built and refined. Such a profile can also be stored and used to search against a different database of sequences. Using it to search against a database of consensus rather than native sequences is a simple add-on that boosts performance surprisingly well. The improvement comes at a price: we hypothesized that random alignment score statistics would differ between native and consensus sequences. Thus PSI-BLAST-based profile searches against consensus sequences might incorrectly estimate statistical significance of alignment scores. In addition, iterative searches against consensus databases may fail. Here, we addressed these challenges in an attempt to harness the full power of the combination of PSI-BLAST and consensus sequences.
Results: We studied alignment score statistics for various types of consensus sequences. In general, the score distribution parameters of profile-based consensus sequence alignments differed significantly from those derived for the native sequences. PSI-BLAST partially compensated for the parameter variation. We have identified a protocol for building specialized consensus sequences that significantly improved search sensitivity and preserved score distribution parameters. As a result, PSI-BLAST profiles can be used to search specialized consensus sequences without sacrificing estimates of statistical significance. We also provided results indicating that iterative PSI-BLAST searches against consensus sequences could work very well. Overall, we showed how a very popular and effective method could be used to identify significantly more relevant similarities among protein sequences.
Availability: http://www.rostlab.org/services/consensus/
Contact: dariusz@mit.edu
doi:10.1093/bioinformatics/btn384
PMCID: PMC2577777  PMID: 18678588
6.  Consensus sequences improve PSI-BLAST through mimicking profile–profile alignments 
Nucleic Acids Research  2007;35(7):2238-2246.
Sequence alignments may be the most fundamental computational resource for molecular biology. The best methods that identify sequence relatedness through profile–profile comparisons are much slower and more complex than sequence–sequence and sequence–profile comparisons such as, respectively, BLAST and PSI-BLAST. Families of related genes and gene products (proteins) can be represented by consensus sequences that list the nucleic/amino acid most frequent at each sequence position in that family. Here, we propose a novel approach for consensus-sequence-based comparisons. This approach improved searches and alignments as a standard add-on to PSI-BLAST without any changes of code. Improvements were particularly significant for more difficult tasks such as the identification of distant structural relations between proteins and their corresponding alignments. Despite the fact that the improvements were higher for more divergent relations, they were consistent even at high accuracy/low error rates for non-trivially related proteins. The improvements were very easy to achieve; no parameter used by PSI-BLAST was altered and no single line of code changed. Furthermore, the consensus sequence add-on required relatively little additional CPU time. We discuss how advanced users of PSI-BLAST can immediately benefit from using consensus sequences on their local computers. We have also made the method available through the Internet (http://www.rostlab.org/services/consensus/).
doi:10.1093/nar/gkm107
PMCID: PMC1874647  PMID: 17369271
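The abstract above describes a consensus sequence as listing the most frequent nucleic/amino acid at each position in a family. A minimal sketch of that idea, assuming the family is already given as equal-length aligned sequences, might look like the following. The function name `consensus_sequence`, the gap handling, and the tie-breaking rule are assumptions made for illustration; this is not the authors' protocol and it does not involve PSI-BLAST itself.

```python
from collections import Counter


def consensus_sequence(aligned_seqs, gap="-"):
    """Return the most frequent non-gap residue at each alignment column.

    Assumptions (not from the paper): gaps are ignored when counting,
    a column containing only gaps yields a gap, and ties are broken
    alphabetically so the result is deterministic.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    width = len(aligned_seqs[0])
    if any(len(seq) != width for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(res for res in column if res != gap)
        if not counts:
            consensus.append(gap)
            continue
        # Highest count first; alphabetical order breaks ties.
        best = min(counts.items(), key=lambda item: (-item[1], item[0]))[0]
        consensus.append(best)
    return "".join(consensus)


# Hypothetical aligned protein family, purely for illustration.
family = ["MKTAYIA", "MRTA-IA", "MKSAYLA", "MKTAYIG"]
print(consensus_sequence(family))
```

In the setting the abstract describes, sequences of this kind would make up the database that a PSI-BLAST profile is searched against, in place of the native sequences.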
7.  Predicting transmembrane beta-barrels in proteomes 
Nucleic Acids Research  2004;32(8):2566-2577.
Very few methods address the problem of predicting beta-barrel membrane proteins directly from sequence. One reason is that only very few high-resolution structures for transmembrane beta-barrel (TMB) proteins have been determined thus far. Here we introduced the design, statistics and results of a novel profile-based hidden Markov model for the prediction and discrimination of TMBs. The method carefully attempts to avoid over-fitting the sparse experimental data. While our model training and scoring procedures were very similar to a recently published work, the architecture and structure-based labelling were significantly different. In particular, we introduced a new definition of beta-hairpin motifs, explicit state modelling of transmembrane strands, and a log-odds whole-protein discrimination score. The resulting method reached an overall four-state (up-, down-strand, periplasmic-, outer-loop) accuracy as high as 86%. Furthermore, it accurately discriminated TMB from non-TMB proteins (45% coverage at 100% accuracy). This high precision enabled the application to 72 entirely sequenced Gram-negative bacteria. We found over 164 previously uncharacterized TMB proteins at high confidence. Database searches did not implicate any of these proteins with membranes. We contend that the vast majority of our 164 predictions will eventually be verified experimentally. All proteome predictions and the PROFtmb prediction method are available at http://www.rostlab.org/services/PROFtmb/.
doi:10.1093/nar/gkh580
PMCID: PMC419468  PMID: 15141026
8.  EVA: evaluation of protein structure prediction servers 
Nucleic Acids Research  2003;31(13):3311-3315.
EVA (http://cubic.bioc.columbia.edu/eva/) is a web server for evaluation of the accuracy of automated protein structure prediction methods. The evaluation is updated automatically each week, to cope with the large number of existing prediction servers and the constant changes in the prediction methods. EVA currently assesses servers for secondary structure prediction, contact prediction, comparative protein structure modelling and threading/fold recognition. Every day, sequences of newly available protein structures in the Protein Data Bank (PDB) are sent to the servers and their predictions are collected. The predictions are then compared to the experimental structures once a week; the results are published on the EVA web pages. Over time, EVA has accumulated prediction results for a large number of proteins, ranging from hundreds to thousands, depending on the prediction method. This large sample ensures that methods are compared reliably. As a result, EVA provides useful information to developers as well as users of prediction methods.
PMCID: PMC169025  PMID: 12824315
