Search tips
Search criteria

Results 1-4 (4)

Clipboard (0)
more »
Year of Publication
Document Types
1.  Conceptual framework and pilot study to benchmark phylogenomic databases based on reference gene trees 
Briefings in Bioinformatics  2011;12(5):423-435.
Phylogenomic databases provide orthology predictions for species with fully sequenced genomes. Although the goal seems well-defined, the content of these databases differs greatly. Seven ortholog databases (Ensembl Compara, eggNOG, HOGENOM, InParanoid, OMA, OrthoDB, Panther) were compared on the basis of reference trees. For three well-conserved protein families, we observed a generally high specificity of orthology assignments for these databases. We show that differences in the completeness of predicted gene relationships and in the phylogenetic information are, for the great majority, not due to the methods used, but to differences in the underlying database concepts. According to our metrics, none of the databases provides a fully correct and comprehensive protein classification. Our results provide a framework for meaningful and systematic comparisons of phylogenomic databases. In the future, a sustainable set of ‘Gold standard’ phylogenetic trees could provide a robust method for phylogenomic databases to assess their current quality status, measure changes following new database releases and diagnose improvements subsequent to an upgrade of the analysis procedure.
PMCID: PMC3178055  PMID: 21737420
conceptual comparison; phylogenomic databases; quality assessment; reference gene trees
2.  Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes) 
Briefings in Bioinformatics  2011;12(5):474-484.
Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.
PMCID: PMC3178057  PMID: 21712341
Chondrichthyes; trained gene prediction; next generation sequencing; genome assembly; orthology
3.  The what, where, how and why of gene ontology—a primer for bioinformaticians 
Briefings in Bioinformatics  2011;12(6):723-735.
With high-throughput technologies providing vast amounts of data, it has become more important to provide systematic, quality annotations. The Gene Ontology (GO) project is the largest resource for cataloguing gene function. Nonetheless, its use is not yet ubiquitous and is still fraught with pitfalls. In this review, we provide a short primer to the GO for bioinformaticians. We summarize important aspects of the structure of the ontology, describe sources and types of functional annotations, survey measures of GO annotation similarity, review typical uses of GO and discuss other important considerations pertaining to the use of GO in bioinformatics applications.
PMCID: PMC3220872  PMID: 21330331
gene ontology; gene annotation; semantic similarity; gene function; function prediction
4.  Base-calling for next-generation sequencing platforms 
Briefings in Bioinformatics  2011;12(5):489-497.
Next-generation sequencing platforms are dramatically reducing the cost of DNA sequencing. With these technologies, bases are inferred from light intensity signals, a process commonly referred to as base-calling. Thus, understanding and improving the quality of sequence data generated using these approaches are of high interest. Recently, a number of papers have characterized the biases associated with base-calling and proposed methodological improvements. In this review, we summarize recent development of base-calling approaches for the Illumina and Roche 454 sequencing platforms.
PMCID: PMC3178052  PMID: 21245079
Base-calling; next generation sequencing; deep sequencing; illumina/solexa; roche/454; bustard

Results 1-4 (4)