Results 1-25 (105)

1.  Teaching Bioinformatics at the Secondary School Level 
PLoS Computational Biology  2011;7(10):e1002242.
PMCID: PMC3203059  PMID: 22046116
2.  Frontiers of biomedical text mining: current progress 
Briefings in Bioinformatics  2007;8(5):358-375.
It is now almost 15 years since the publication of the first paper on text mining in the genomics domain, and decades since the first paper on text mining in the medical domain. Enormous progress has been made in the areas of information retrieval, evaluation methodologies and resource construction. Some problems, such as abbreviation-handling, can essentially be considered solved problems, and others, such as identification of gene mentions in text, seem likely to be solved soon. However, a number of problems at the frontiers of biomedical text mining continue to present interesting challenges and opportunities for great improvements and interesting research. In this article we review the current state of the art in biomedical text mining or ‘BioNLP’ in general, focusing primarily on papers published within the past year.
PMCID: PMC2516302  PMID: 17977867
text mining; natural language processing; information extraction; text summarization; image mining; question answering; literature-based discovery; evaluation; user orientation
3.  Specialty Grand Challenge – Genetic Disorders 
PMCID: PMC3864254  PMID: 24400282
genetic disorders; medical disorders; genome; geneticist; adolescence
4.  Genomics software: The view from 10,000 feet 
Human Genomics  2009;4(1):56-58.
The rate of change in genomics, and 'omics generally, shows no signs of slowing down. Related analysis software is struggling to keep pace. This paper provides a brief review of the field.
PMCID: PMC3500188  PMID: 19951894
genomics; software; systems genetics; 'omics; genome-wide association studies; SNP annotation; networks
5.  Tutorial videos of bioinformatics resources: online distribution trial in Japan named TogoTV 
Briefings in Bioinformatics  2011;13(2):258-268.
In recent years, biological web resources such as databases and tools have become more complex because of the enormous amounts of data generated in the field of life sciences. Traditional methods of distributing tutorials include publishing textbooks and posting web documents, but these static contents cannot adequately describe recent dynamic web services. Due to improvements in computer technology, it is now possible to create dynamic content such as video with minimal effort and low cost on most modern computers. The ease of creating and distributing video tutorials instead of static content improves accessibility for researchers, annotators and curators. This article focuses on online video repositories for educational and tutorial videos provided by resource developers and users. It also describes a project in Japan named TogoTV and discusses the production and distribution of high-quality tutorial videos that would be useful to viewers, with examples. This article intends to stimulate and encourage researchers who develop and use databases and tools to distribute how-to videos as a tool to enhance product usability.
PMCID: PMC3294242  PMID: 21803786
screencast; vodcast; tutorial; YouTube; QuickTime; Flash
6.  Application of next generation sequencing to human gene fusion detection: computational tools, features and perspectives 
Briefings in Bioinformatics  2012;14(4):506-519.
Gene fusions are important genomic events in human cancer because their fusion gene products can drive the development of cancer and thus are potential prognostic tools or therapeutic targets in anti-cancer treatment. Major advancements have been made in computational approaches for fusion gene discovery over the past 3 years due to improvements and widespread applications of high-throughput next generation sequencing (NGS) technologies. To identify fusions from NGS data, existing methods typically leverage the strengths of both sequencing technologies and computational strategies. In this article, we review the NGS and computational features of existing methods for fusion gene detection and suggest directions for future development.
PMCID: PMC3713712  PMID: 22877769
gene fusion; next generation sequencing; cancer; whole genome sequencing; transcriptome sequencing; computational tools
8.  A comparative analysis of biclustering algorithms for gene expression data 
Briefings in Bioinformatics  2012;14(3):279-292.
The need to analyze high-dimension biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibit similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared only to a small number of algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they are quickly outdated. In this article we partially address this problem of evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare 12 algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic data sets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters and overlapping biclusters. The algorithms were also tested on eight large gene expression data sets obtained from the Gene Expression Omnibus. Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on the desired model, whether that model allows overlapping biclusters, and its robustness to noise. In addition, we observe that the biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters.
PMCID: PMC3659300  PMID: 22772837
biclustering; microarray; gene expression; clustering
9.  Mining the pharmacogenomics literature—a survey of the state of the art 
Briefings in Bioinformatics  2012;13(4):460-494.
This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research.
PMCID: PMC3404399  PMID: 22833496
text mining; information extraction; knowledge discovery from texts; text analytics; biomedical natural language processing; pharmacogenomics; pharmacogenetics
10.  Reproducible probe-level analysis of the Affymetrix Exon 1.0 ST array with R/Bioconductor 
Briefings in Bioinformatics  2013;15(4):519-533.
The presence of different transcripts of a gene across samples can be analysed by whole-transcriptome microarrays. Reproducing results from published microarray data represents a challenge owing to the vast amounts of data and the large variety of preprocessing and filtering steps used before the actual analysis is carried out. To guarantee a firm basis for methodological development where results with new methods are compared with previous results, it is crucial to ensure that all analyses are completely reproducible for other researchers. We here give a detailed workflow on how to perform reproducible analysis of the GeneChip® Human Exon 1.0 ST Array at probe and probeset level solely in R/Bioconductor, choosing packages based on their simplicity of use. To exemplify the use of the proposed workflow, we analyse differential splicing and differential gene expression in a publicly available dataset using various statistical methods. We believe this study will provide other researchers with an easy way of accessing gene expression data at different annotation levels and with sufficient detail for developing their own tools for reproducible analysis of the GeneChip® Human Exon 1.0 ST Array.
PMCID: PMC4103539  PMID: 23603090
reproducible research; exon array; differential splicing; ANOSVA; FIRMA; probe-level analysis
12.  Data management strategies for multinational large-scale systems biology projects 
Briefings in Bioinformatics  2012;15(1):65-78.
Good accessibility of publicly funded research data is essential to secure an open scientific system and eventually becomes mandatory [Wellcome Trust will Penalise Scientists Who Don’t Embrace Open Access. The Guardian 2012]. By the use of high-throughput methods in many research areas from physics to systems biology, large data collections are increasingly important as raw material for research. Here, we present strategies worked out by international and national institutions targeting open access to publicly funded research data via incentives or obligations to share data. Funding organizations such as the British Wellcome Trust therefore have developed data sharing policies and request commitment to data management and sharing in grant applications. Increased citation rates are a profound argument for sharing publication data. Pre-publication sharing might be rewarded by a data citation credit system via digital object identifiers (DOIs) which have initially been in use for data objects. Besides policies and incentives, good practice in data management is indispensable. However, appropriate systems for data management of large-scale projects for example in systems biology are hard to find. Here, we give an overview of a selection of open-source data management systems proved to be employed successfully in large-scale projects.
PMCID: PMC3896927  PMID: 23047157
data management; data sharing; open access; data citation; systems biology
13.  DNA chip-assisted diagnosis for ocular toxoplasmosis: A comment 
PMCID: PMC3339092  PMID: 22446926
14.  Obituary: Walter Fitch and the orthology paradigm 
Briefings in Bioinformatics  2011;12(5):377-378.
PMCID: PMC3178060  PMID: 21949265
15.  OpenHelix: bioinformatics education outside of a different box 
Briefings in Bioinformatics  2010;11(6):598-609.
The amount of biological data is increasing rapidly, and will continue to increase as new rapid technologies are developed. Professionals in every area of bioscience will have data management needs that require publicly available bioinformatics resources. Not all scientists desire a formal bioinformatics education but would benefit from more informal educational sources of learning. Effective bioinformatics education formats will address a broad range of scientific needs, will be aimed at a variety of user skill levels, and will be delivered in a number of different formats to address different learning styles. Informal sources of bioinformatics education that are effective are available, and will be explored in this review.
PMCID: PMC2984537  PMID: 20798181
bioinformatics education; training and learning; outreach; genomics; data management; computational biology resources
16.  Fast and efficient searching of biological data resources—using EB-eye 
Briefings in Bioinformatics  2010;11(4):375-384.
The EB-eye is a fast and efficient search engine that provides easy and uniform access to the biological data resources hosted at the EMBL-EBI. Currently, users can access information from more than 62 distinct datasets covering some 400 million entries. The data resources represented in the EB-eye include: nucleotide and protein sequences at both the genomic and proteomic levels, structures ranging from chemicals to macro-molecular complexes, gene-expression experiments, binary level molecular interactions as well as reaction maps and pathway models, functional classifications, biological ontologies, and comprehensive literature libraries covering the biomedical sciences and related intellectual property. The EB-eye can be accessed over the web or programmatically using a SOAP Web Services interface. This allows its search and retrieval capabilities to be exploited in workflows and analytical pipelines. The EB-eye is a novel alternative to existing biological search and retrieval engines. In this article we describe in detail how to exploit its powerful capabilities.
PMCID: PMC2905521  PMID: 20150321
text search; biological databases; integration; interoperability; web services; Apache Lucene
17.  Saccharomyces genome database: Underlying principles and organisation 
Briefings in Bioinformatics  2004;5(1):9-22.
A scientific database can be a powerful tool for biologists in an era where large-scale genomic analysis, combined with smaller-scale scientific results, provides new insights into the roles of genes and their products in the cell. However, the collection and assimilation of data is, in itself, not enough to make a database useful. The data must be incorporated into the database and presented to the user in an intuitive and biologically significant manner. Most importantly, this presentation must be driven by the user’s point of view; that is, from a biological perspective. The success of a scientific database can therefore be measured by the response of its users – statistically, by usage numbers and, in a less quantifiable way, by its relationship with the community it serves and its ability to serve as a model for similar projects. Since its inception ten years ago, the Saccharomyces Genome Database (SGD) has seen a dramatic increase in its usage, has developed and maintained a positive working relationship with the yeast research community, and has served as a template for at least one other database. The success of SGD, as measured by these criteria, is due in large part to philosophies that have guided its mission and organisation since it was established in 1993. This paper aims to detail these philosophies and how they shape the organisation and presentation of the database.
PMCID: PMC3037832  PMID: 15153302
S. cerevisiae; database; genome-wide analysis; bioinformatics; yeast
18.  A survey of available tools and web servers for analysis of protein–protein interactions and interfaces 
Briefings in Bioinformatics  2009;10(3):217-232.
The unanimous agreement that cellular processes are (largely) governed by interactions between proteins has led to enormous community efforts culminating in overwhelming information relating to these proteins; to the regulation of their interactions, to the way in which they interact and to the function which is determined by these interactions. These data have been organized in databases and servers. However, to make these really useful, it is essential not only to be aware of these, but in particular to have a working knowledge of which tools to use for a given problem; what are the tool advantages and drawbacks; and no less important how to combine these for a particular goal since usually it is not one tool, but some combination of tool-modules that is needed. This is the goal of this review.
PMCID: PMC2671387  PMID: 19240123
protein–protein interactions; protein–protein interfaces; binding site prediction; docking; web servers; databases
19.  Where is the weak linkage in the globin chain? 
Hemoglobinopathies are important inherited disorders with high prevalence in many tropical countries. Prediction of protein nanostructure and function is a great challenge in proteomics and structural genomics. Identifying the point vulnerable to mutation is a new trend in research on disorders at the genomic and proteomic level. A bioinformatics analysis was performed to determine the positions that tend to correspond with peptide motifs in the amino acid sequence of alpha and beta globin chains. To identify the weak linkage in alpha globin and beta globin chains, a new bioinformatics tool, GlobPlot, was used. For the alpha globin chain, 22 positions were identified: the disorders were found at positions 3–8, 38–42, 46–51, and 75–79. For the beta globin chain, 46 positions were identified: the disorders were found at positions 61–146. The study showed that weak linkages in alpha globin and beta globin chains can be identified and provide good information for predicting possible new mutations that could lead to new hemoglobinopathies.
PMCID: PMC2426759  PMID: 17722269
globin; hemoglobinopathy; protein structure; weak linkage
20.  Medical informatics and bioinformatics: a bibliometric study 
This paper reports on an analysis of the bioinformatics and medical informatics literature with the objective of identifying upcoming trends shared by both research fields, so that both can benefit from potential collaborative initiatives in the future. Our results present the main characteristics of the two fields and show that these domains are still relatively separated.
PMCID: PMC2191144  PMID: 17521073
Computational Biology/classification/statistics & numerical data/trends; Databases, Bibliographic/trends; Internationality; MEDLINE; Medical Informatics/classification/statistics & numerical data/trends; Natural Language Processing; Periodicals as Topic/statistics & numerical data/trends; Vocabulary, Controlled; medicine; informatics; biology; bioinformatics; correspondence analysis; bibliometrics; Principal Component Analysis; PCA; MCA
21.  Draft Genome Sequence of Lactobacillus casei W56 
Journal of Bacteriology  2012;194(23):6638.
We announce the draft genome sequence of Lactobacillus casei W56 in one contig. This strain shows immunomodulatory and probiotic properties. The strain is also an ingredient of commercially available probiotic products.
PMCID: PMC3497524  PMID: 23144392
22.  Draft Genome Sequence of the Flocculating Zymomonas mobilis Strain ZM401 (ATCC 31822) 
Journal of Bacteriology  2012;194(24):7008-7009.
Zymomonas mobilis ZM401 is a flocculating strain which can be self-immobilized within fermentors for a high-cell-density culture to improve ethanol productivity, as well as high-gravity fermentation to increase ethanol titer, due to its improved ethanol tolerance associated with the morphological change. Here, we report its draft genome sequence.
PMCID: PMC3510618  PMID: 23209250
23.  Genome Sequence of the Filamentous Bacterium Fibrisoma limi BUZ 3T 
Journal of Bacteriology  2012;194(16):4445.
Fibrisoma limi strain BUZ 3T, a Gram-negative bacterium, was isolated from coastal mud from the North Sea (Fedderwardersiel, Germany) and characterized using a polyphasic approach in 2011. The genome consists of a chromosome of about 7.5 Mb and three plasmids.
PMCID: PMC3416256  PMID: 22843583
24.  Genome Sequence of Fibrella aestuarina BUZ 2T, a Filamentous Marine Bacterium 
Journal of Bacteriology  2012;194(13):3555.
Fibrella aestuarina BUZ 2T is the type strain of the recently characterized genus Fibrella. Here we report the draft genome sequence of this strain, which consists of a single scaffold representing the chromosome (with 11 gaps) and a 161-kb circular plasmid.
PMCID: PMC3434725  PMID: 22689241
25.  Genomic Comparison between a Virulent Type A1 Strain of Francisella tularensis and Its Attenuated O-Antigen Mutant 
Journal of Bacteriology  2012;194(10):2775-2776.
We report the complete genome sequences of TI0902, a highly virulent type A1 strain, and TIGB03, a related, attenuated chemical mutant strain. Compared to the wild type, the mutant strain had 45 point mutations and a 75.9-kb duplicated region that had not been previously observed in Francisella species.
PMCID: PMC3347185  PMID: 22535949
