Related Articles

1.  Combining Quantitative Genetic Footprinting and Trait Enrichment Analysis to Identify Fitness Determinants of a Bacterial Pathogen 
PLoS Genetics  2013;9(8):e1003716.
Strains of Extraintestinal Pathogenic Escherichia coli (ExPEC) exhibit an array of virulence strategies and are a major cause of urinary tract infections, sepsis and meningitis. Efforts to understand ExPEC pathogenesis are challenged by the high degree of genetic and phenotypic variation that exists among isolates. Determining which virulence traits are widespread and which are strain-specific will greatly benefit the design of more effective therapies. Towards this goal, we utilized a quantitative genetic footprinting technique known as transposon insertion sequencing (Tn-seq) in conjunction with comparative pathogenomics to functionally dissect the genetic repertoire of a reference ExPEC isolate. Using Tn-seq and high-throughput zebrafish infection models, we tracked changes in the abundance of ExPEC variants within saturated transposon mutant libraries following selection within distinct host niches. Nine hundred and seventy bacterial genes (18% of the genome) were found to promote pathogen fitness in either a niche-dependent or independent manner. To identify genes with the highest therapeutic and diagnostic potential, a novel Trait Enrichment Analysis (TEA) algorithm was developed to ascertain the phylogenetic distribution of candidate genes. TEA revealed that a significant portion of the 970 genes identified by Tn-seq have homologues more often contained within the genomes of ExPEC and other known pathogens, which, as suggested by the first axiom of molecular Koch's postulates, is considered to be a key feature of true virulence determinants. Three of these Tn-seq-derived pathogen-associated genes—a transcriptional repressor, a putative metalloendopeptidase toxin and a hypothetical DNA binding protein—were deleted and shown to independently affect ExPEC fitness in zebrafish and mouse models of infection. 
Together, the approaches and observations reported herein provide a resource for future pathogenomics-based research and highlight the diversity of factors required by a single ExPEC isolate to survive within varying host environments.
Author Summary
Antibiotic resistance is an increasingly serious problem, especially among pathogenic strains of Escherichia coli that cause urinary tract infections, sepsis and meningitis. It is important to obtain a more comprehensive genome-wide understanding of bacterial virulence because it has the potential to uncover novel and alternative therapeutic targets. Therefore, we probed the genome of a pathogenic E. coli isolate using transposon mutagenesis, deep sequencing and comparative pathogenomics in an effort to define its virulence gene repertoire. Using this multilayered approach in combination with high-throughput zebrafish infection models, we identified hundreds of genes that affect pathogen fitness during localized and/or blood-borne infections. We also developed a bioinformatics-based method to systematically sift through our datasets for genes that are broadly conserved among an assortment of pathogenic species. Follow-up analysis of several pathogen-associated candidate genes using zebrafish and mouse infection models highlighted the capacity of our approach to identify novel fitness determinants. The results from this study are available via an interactive online data viewer ( so that investigators can more effectively search and utilize these findings.
PMCID: PMC3749937  PMID: 23990803
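The Tn-seq idea described in entry 1, tracking how the abundance of transposon mutants changes after selection in a host niche, can be illustrated with a minimal sketch. This is not the authors' pipeline: the gene names, read counts, pseudocount, and the log2 threshold of -1 are all hypothetical values chosen for illustration, and a real analysis would add replicate handling and statistical testing.

```python
import math


def log2_fold_changes(input_counts, output_counts, pseudocount=1.0):
    """Per-gene log2(output frequency / input frequency).

    Counts are normalized to library size; a pseudocount (an assumption
    of this sketch) avoids division by zero for genes with no reads.
    """
    genes = sorted(set(input_counts) | set(output_counts))
    in_total = sum(input_counts.values()) + pseudocount * len(genes)
    out_total = sum(output_counts.values()) + pseudocount * len(genes)
    result = {}
    for gene in genes:
        in_freq = (input_counts.get(gene, 0) + pseudocount) / in_total
        out_freq = (output_counts.get(gene, 0) + pseudocount) / out_total
        result[gene] = math.log2(out_freq / in_freq)
    return result


def fitness_candidates(fold_changes, threshold=-1.0):
    """Genes whose mutants were depleted at least 2-fold (hypothetical cutoff)."""
    return sorted(g for g, fc in fold_changes.items() if fc <= threshold)


# Hypothetical insertion read counts before and after selection in the host.
input_pool = {"geneA": 100, "geneB": 100, "geneC": 100}
output_pool = {"geneA": 100, "geneB": 10, "geneC": 190}

fc = log2_fold_changes(input_pool, output_pool)
candidates = fitness_candidates(fc)
print(candidates)
```

In this toy data set only mutants in `geneB` are depleted after selection, so it is the only gene flagged as a candidate fitness determinant.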
2.  CoryneBase: Corynebacterium Genomic Resources and Analysis Tools at Your Fingertips 
PLoS ONE  2014;9(1):e86318.
Corynebacteria are used for a wide variety of industrial purposes but some species are associated with human diseases. With an increasing number of corynebacterial genomes having been sequenced, comparative analysis of these strains may provide better understanding of their biology, phylogeny, virulence and taxonomy that may lead to the discovery of beneficial industrial strains or contribute to better management of diseases. To facilitate the ongoing research of corynebacteria, a specialized central repository and analysis platform for the corynebacterial research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. Here we present CoryneBase, a genomic database for Corynebacterium with diverse functionality for the analysis of genomes aimed to provide: (1) annotated genome sequences of Corynebacterium where 165,918 coding sequences and 4,180 RNAs can be found in 27 species; (2) access to comprehensive Corynebacterium data through the use of advanced web technologies for interactive web interfaces; and (3) advanced bioinformatic analysis tools consisting of standard BLAST for homology search, VFDB BLAST for sequence homology search against the Virulence Factor Database (VFDB), Pairwise Genome Comparison (PGC) tool for comparative genomic analysis, and a newly designed Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomic analysis. CoryneBase offers access to a range of Corynebacterium genomic resources as well as analysis tools for comparative genomics and pathogenomics. It is publicly available at
PMCID: PMC3895029  PMID: 24466021
3.  VibrioBase: A Model for Next-Generation Genome and Annotation Database Development 
The Scientific World Journal  2014;2014:569324.
To facilitate the ongoing research of Vibrio spp., a dedicated platform for the Vibrio research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. We present VibrioBase, a useful resource platform, providing all basic features of a sequence database with the addition of unique analysis tools which could be valuable for the Vibrio research community. VibrioBase currently houses a total of 252 Vibrio genomes, presented in a user-friendly manner to enable the analysis of these genomic data, particularly in the field of comparative genomics. Besides general data browsing features, VibrioBase offers analysis tools such as BLAST interfaces and the JBrowse genome browser. Other important features of this platform include our newly developed in-house tools, the pairwise genome comparison (PGC) tool, and pathogenomics profiling tool (PathoProT). The PGC tool is useful in the identification and comparative analysis of two genomes, whereas PathoProT is designed for comparative pathogenomics analysis of Vibrio strains. Both of these tools will enable researchers with little experience in bioinformatics to get meaningful information from Vibrio genomes with ease. We have tested the validity and suitability of these tools and features for use in the next-generation database development.
PMCID: PMC4138799  PMID: 25243218
4.  Leptospiral Pathogenomics 
Pathogens  2014;3(2):280-308.
Leptospirosis, caused by pathogenic spirochetes belonging to the genus Leptospira, is a zoonosis with important impacts on human and animal health worldwide. Research on the mechanisms of Leptospira pathogenesis has been hindered due to slow growth of infectious strains, poor transformability, and a paucity of genetic tools. As a result of second generation sequencing technologies, there has been an acceleration of leptospiral genome sequencing efforts in the past decade, which has enabled a concomitant increase in functional genomics analyses of Leptospira pathogenesis. A pathogenomics approach, by coupling of pan-genomic analysis of multiple isolates with sequencing of experimentally attenuated highly pathogenic Leptospira, has resulted in the functional inference of virulence factors. The global Leptospira Genome Project supported by the U.S. National Institute of Allergy and Infectious Diseases to which key scientific contributions have been made from the international leptospirosis research community has provided a new roadmap for comprehensive studies of Leptospira and leptospirosis well into the future. This review describes functional genomics approaches to apply the data generated by the Leptospira Genome Project towards deepening our knowledge of virulence factors of Leptospira using the emerging discipline of pathogenomics.
PMCID: PMC4243447  PMID: 25437801
Leptospira; pathogenomics; virulence; genomics; evolution; taxonomy; molecular epidemiology; systems biology
5.  HelicoBase: a Helicobacter genomic resource and analysis platform 
BMC Genomics  2014;15(1):600.
Helicobacter is a genus of Gram-negative bacteria, possessing a characteristic helical shape that has been associated with a wide spectrum of human diseases. Although much research has been done on Helicobacter and many genomes have been sequenced, currently there is no specialized Helicobacter genomic resource and analysis platform to facilitate analysis of these genomes. With the increasing number of Helicobacter genomes being sequenced, comparative genomic analysis on members of this species will provide further insights on their taxonomy, phylogeny, pathogenicity and other information that may contribute to better management of diseases caused by Helicobacter pathogens.
To facilitate the ongoing research on Helicobacter, a specialized central repository and analysis platform for the Helicobacter research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data, particularly comparative analysis. Here we present HelicoBase, a user-friendly Helicobacter resource platform with diverse functionality for the analysis of Helicobacter genomic data for the Helicobacter research communities. HelicoBase hosts a total of 13 species and 166 genome sequences of Helicobacter spp. Genome annotations such as gene/protein sequences, protein function and sub-cellular localisation are also included. Our web implementation supports diverse query types and seamless searching of annotations using an AJAX-based real-time searching system. JBrowse is also incorporated to allow rapid and seamless browsing of Helicobacter genomes and annotations. Advanced bioinformatics analysis tools consisting of standard BLAST for similarity search, VFDB BLAST for sequence similarity search against the Virulence Factor Database (VFDB), Pairwise Genome Comparison (PGC) tool for comparative genomic analysis, and a newly designed Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomic analysis are also included to facilitate the analysis of Helicobacter genomic data.
HelicoBase offers access to a range of genomic resources as well as tools for the analysis of Helicobacter genome data. HelicoBase can be accessed at
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-600) contains supplementary material, which is available to authorized users.
PMCID: PMC4108788  PMID: 25030426
HelicoBase; Helicobacter; Genomic resources; Pairwise genome comparison tool; Pathogenomics profiling tool; Comparative analysis
6.  VFDB 2008 release: an enhanced web-based resource for comparative pathogenomics 
Nucleic Acids Research  2007;36(Database issue):D539-D542.
Virulence factor database (VFDB) was set up in 2004 to provide current knowledge of virulence factors (VFs) from various medically significant bacterial pathogens and to facilitate pathogenomic research. Nowadays, complete genome sequences of almost all the major pathogenic microbes have been determined, which makes comparative genomics a powerful approach for uncovering novel virulence determinants and hidden aspects of pathogenesis. VFDB was therefore upgraded to present the enormous diversity of bacterial genomes in terms of virulence genes and their organization. The VFDB 2008 release includes the following new features: (i) detailed tabular comparison of virulence composition of a given genome with other genomes of the same genus, (ii) multiple alignments and statistical analysis of homologous VFs and (iii) graphical comparison of genomic organizations of virulence genes. Comparative analysis of the numerous VFs will improve our understanding of the nature and evolution of virulence, as well as the development of new therapeutic and preventive strategies. VFDB 2008 release offers more user-friendly tools for comparative pathogenomics and it is publicly accessible at
PMCID: PMC2238871  PMID: 17984080
7.  Pathogenomic Inference of Virulence-Associated Genes in Leptospira interrogans 
Leptospirosis is a globally important, neglected zoonotic infection caused by spirochetes of the genus Leptospira. Since genetic transformation remains technically limited for pathogenic Leptospira, a systems biology pathogenomic approach was used to infer leptospiral virulence genes by whole genome comparison of culture-attenuated Leptospira interrogans serovar Lai with its virulent, isogenic parent. Among the 11 pathogen-specific protein-coding genes in which non-synonymous mutations were found, a putative soluble adenylate cyclase with host cell cAMP-elevating activity, and two members of a previously unstudied ∼15 member paralogous gene family of unknown function were identified. This gene family was also uniquely found in the alpha-proteobacteria Bartonella bacilliformis and Bartonella australis that are geographically restricted to the Andes and Australia, respectively. How the pathogenic Leptospira and these two Bartonella species came to share this expanded gene family remains an evolutionary mystery. In vivo expression analyses demonstrated up-regulation of 10/11 Leptospira genes identified in the attenuation screen, and profound in vivo, tissue-specific up-regulation by members of the paralogous gene family, suggesting a direct role in virulence and host-pathogen interactions. The pathogenomic experimental design here is generalizable as a functional systems biology approach to studying bacterial pathogenesis and virulence and should encourage similar experimental studies of other pathogens.
Author Summary
Leptospirosis is one of the most common diseases transmitted by animals worldwide. It is important because it causes an often lethal febrile illness in tropical and subtropical areas associated with poor sanitation and agriculture. Leptospirosis may be epidemic, associated with natural disasters and flooding, or endemic in tropical regions. It is unknown how Leptospira cause disease and why different strains cause different severity of illness. In this study we attenuated (weakened) a highly virulent strain of L. interrogans by culturing it in vitro over several months. Comparison of the whole genome sequence before and after the attenuation process revealed a small set of genes that were mutated, and therefore associated with virulence. We discovered a putative soluble adenylate cyclase with host cell cAMP elevating activity, with implications for immune evasion, and a new gene family that is upregulated in vivo during acute hamster infection. Interestingly, both Bartonella bacilliformis and Bartonella australis also have this unique gene family we describe in pathogenic Leptospira. This information aids in our understanding of Leptospira evolution and pathogenesis.
PMCID: PMC3789758  PMID: 24098822
8.  ArrayPipe: a flexible processing pipeline for microarray data 
Nucleic Acids Research  2004;32(Web Server issue):W457-W459.
A number of microarray analysis software packages exist already; however, none combines the user-friendly features of a web-based interface with potential ability to analyse multiple arrays at once using flexible analysis steps. The ArrayPipe web server (freely available at allows the automated application of complex analyses to microarray data which can range from single slides to large data sets including replicates and dye-swaps. It handles output from most commonly used quantification software packages for dual-labelled arrays. Application features range from quality assessment of slides through various data visualizations to multi-step analyses including normalization, detection of differentially expressed genes, and comparison and highlighting of gene lists. A highly customizable action set-up facilitates unrestricted arrangement of functions, which can be stored as action profiles. A unique combination of web-based and command-line functionality enables comfortable configuration of processes that can be repeatedly applied to large data sets in high throughput. The output consists of reports formatted as standard web pages and tab-delimited lists of calculated values that can be inserted into other analysis programs. Additional features, such as web-based spreadsheet functionality, auto-parallelization and password protection make this a powerful tool in microarray research for individuals and large groups alike.
PMCID: PMC441584  PMID: 15215429
9.  FusoBase: an online Fusobacterium comparative genomic analysis platform 
Fusobacterium are anaerobic gram-negative bacteria that have been associated with a wide spectrum of human infections and diseases. As the biology of Fusobacterium is still not well understood, comparative genomic analysis on members of this species will provide further insights on their taxonomy, phylogeny, pathogenicity and other information that may contribute to better management of infections and diseases. To facilitate the ongoing genomic research on Fusobacterium, a specialized database with easy-to-use analysis tools is necessary. Here we present FusoBase, an online database providing access to genome-wide annotated sequences of Fusobacterium strains as well as bioinformatics tools, to support the expanding scientific community. Using our custom-developed Pairwise Genome Comparison tool, we demonstrate how differences between two user-defined genomes and how insertion of putative prophages can be identified. In addition, Pathogenomics Profiling Tool is capable of clustering predicted genes across Fusobacterium strains and visualizing the results in the form of a heat map with dendrogram.
Database URL:
PMCID: PMC4141642  PMID: 25149689
10.  Enhancing the Role of Veterinary Vaccines Reducing Zoonotic Diseases of Humans: Linking Systems Biology with Vaccine Development 
Vaccine  2011;29(41):7197-7206.
The aim of research on infectious diseases is their prevention, and brucellosis and salmonellosis as such are classic examples of worldwide zoonoses for application of a systems biology approach for enhanced rational vaccine development. When used optimally, vaccines prevent disease manifestations, reduce transmission of disease, decrease the need for pharmaceutical intervention, and improve the health and welfare of animals, as well as indirectly protecting against zoonotic diseases of people. Advances in the last decade or so using comprehensive systems biology approaches linking genomics, proteomics, bioinformatics, and biotechnology with immunology, pathogenesis and vaccine formulation and delivery are expected to enable enhanced approaches to vaccine development. The goal of this paper is to evaluate the role of computational systems biology analysis of host:pathogen interactions (the interactome) as a tool for enhanced rational design of vaccines. Systems biology is bringing a new, more robust approach to veterinary vaccine design based upon a deeper understanding of the host-pathogen interactions and its impact on the host's molecular network of the immune system. A computational systems biology method was utilized to create interactome models of the host responses to Brucella melitensis (BMEL), Mycobacterium avium paratuberculosis (MAP), Salmonella enterica Typhimurium (STM), and a Salmonella mutant (isogenic ΔsipA, sopABDE2) and linked to the basis for rational development of vaccines for brucellosis and salmonellosis as reviewed by Adams and Ficht (Adams et al. 2009; Ficht et al. 2009). A bovine ligated ileal loop biological model was established to capture the host gene expression response at multiple time points post infection. 
New methods based on Dynamic Bayesian Network (DBN) machine learning were employed to conduct a comparative pathogenicity analysis of 219 signaling and metabolic pathways and 1620 Gene Ontology (GO) categories that defined the host's biosignatures to each infectious condition. Through this DBN computational approach, the method identified significantly perturbed pathways and GO category groups of genes that define the pathogenicity signatures of the infectious agent. Our preliminary results provide deeper understanding of the overall complexity of host innate immune response as well as the identification of host gene perturbations that defines a unique host temporal biosignature response to each pathogen. The application of advanced computational methods for developing interactome models based on DBNs has proven to be instrumental in elucidating novel host responses and improved functional biological insight into the host defensive mechanisms. Evaluating the unique differences in pathway and GO perturbations across pathogen conditions allowed the identification of plausible host-pathogen interaction mechanisms. Accordingly, a systems biology approach to study molecular pathway gene expression profiles of host cellular responses to microbial pathogens holds great promise as a methodology to identify, model and predict the overall dynamics of the host-pathogen interactome. Thus, we propose that such an approach has immediate application to the rational design of brucellosis and salmonellosis vaccines.
PMCID: PMC3170448  PMID: 21651944
11.  Promoting synergistic research and education in genomics and bioinformatics 
BMC Genomics  2008;9(Suppl 1):I1.
Bioinformatics and Genomics are closely related disciplines that hold great promise for the advancement of research and development in complex biomedical systems, as well as public health, drug design, comparative genomics, personalized medicine and so on. Research and development in these two important areas are impacting science and technology.
High throughput sequencing and molecular imaging technologies marked the beginning of a new era for modern translational medicine and personalized healthcare. The impact of having the human sequence and personalized digital images in hand has also created tremendous demand for developing powerful supercomputing, statistical learning and artificial intelligence approaches to handle the massive bioinformatics and personalized healthcare data, which will obviously have a profound effect on how biomedical research will be conducted toward the improvement of human health and prolonging of human life in the future. The International Society of Intelligent Biological Medicine ( and its official journals, the International Journal of Functional Informatics and Personalized Medicine ( and the International Journal of Computational Biology and Drug Design ( in collaboration with International Conference on Bioinformatics and Computational Biology (Biocomp), touch tomorrow's bioinformatics and personalized medicine throughout today's efforts in promoting the research, education and awareness of the upcoming integrated inter/multidisciplinary field. The 2007 international conference on Bioinformatics and Computational Biology (BIOCOMP07) was held in Las Vegas, the United States of America on June 25-28, 2007. The conference attracted over 400 papers, covering broad research areas in the genomics, biomedicine and bioinformatics. The Biocomp 2007 provides a common platform for the cross fertilization of ideas, and to help shape knowledge and scientific achievements by bridging these two very important disciplines into an interactive and attractive forum. Keeping this objective in mind, Biocomp 2007 aims to promote interdisciplinary and multidisciplinary education and research. 25 high quality peer-reviewed papers were selected from 400+ submissions for this supplementary issue of BMC Genomics. 
Those papers contributed to a wide-range of important research fields including gene expression data analysis and applications, high-throughput genome mapping, sequence analysis, gene regulation, protein structure prediction, disease prediction by machine learning techniques, systems biology, database and biological software development. We always encourage participants submitting proposals for genomics sessions, special interest research sessions, workshops and tutorials to Professor Hamid R. Arabnia ( in order to ensure that Biocomp continuously plays the leadership role in promoting inter/multidisciplinary research and education in the fields. Biocomp received top conference ranking with a high score of 0.95/1.00. Biocomp is academically co-sponsored by the International Society of Intelligent Biological Medicine and the Research Laboratories and Centers of Harvard University – Massachusetts Institute of Technology, Indiana University - Purdue University, Georgia Tech – Emory University, UIUC, UCLA, Columbia University, University of Texas at Austin and University of Iowa etc. Biocomp - Worldcomp brings leading scientists together across the nation and all over the world and aims to promote synergistic components such as keynote lectures, special interest sessions, workshops and tutorials in response to the advances of cutting-edge research.
PMCID: PMC3226105  PMID: 18366597
12.  Absolute quantification of microbial proteomes at different states by directed mass spectrometry 
The developed, directed mass spectrometry workflow allows the generation of consistent and system-wide quantitative maps of microbial proteomes in a single analysis. Application to the human pathogen L. interrogans revealed mechanistic proteome changes over time involved in pathogenic progression and antibiotic defense, and new insights about the regulation of absolute protein abundances within operons.
The developed, directed proteomic approach allowed consistent detection and absolute quantification of 1680 proteins of the human pathogen L. interrogans in a single LC–MS/MS experiment. The comparison of 25 extensive, consistent and quantitative proteome maps revealed new insights about the proteome changes involved in pathogenic progression and antibiotic defense of L. interrogans, and about the regulation of protein abundances within operons. The generated time-resolved data sets are compatible with pattern analysis algorithms developed for transcriptomics, including hierarchical clustering and functional enrichment analysis of the detected profile clusters. This is the first study that describes the absolute quantitative behavior of any proteome over multiple states and represents the most comprehensive proteome abundance pattern comparison for any organism to date.
Over the last decade, mass spectrometry (MS)-based proteomics has evolved as the method of choice for system-wide proteome studies and now allows for the characterization of several thousands of proteins in a single sample. Despite these great advances, redundant monitoring of protein levels over large sample numbers in a high-throughput manner remains a challenging task. New directed MS strategies have been shown to overcome some of the current limitations, thereby enabling the acquisition of consistent and system-wide data sets of proteomes with low-to-moderate complexity at high throughput.
In this study, we applied this integrated, two-stage MS strategy to investigate global proteome changes in the human pathogen L. interrogans. In the initial discovery phase, 1680 proteins (out of around 3600 gene products) could be identified (Schmidt et al, 2008) and, by focusing precious MS-sequencing time on the most dominant, specific peptides per protein, all proteins could be accurately and consistently monitored over 25 different samples within a few days of instrument time in the following scoring phase (Figure 1). Additionally, the co-analysis of heavy reference peptides enabled us to obtain absolute protein concentration estimates for all identified proteins in each perturbation (Malmström et al, 2009). The detected proteins did not show any biases against functional groups or protein classes, including membrane proteins, and span an abundance range of more than three orders of magnitude, a range that is expected to cover most of the L. interrogans proteome (Malmström et al, 2009).
To elucidate mechanistic proteome changes over time involved in pathogenic progression and antibiotic defense of L. interrogans, we generated time-resolved proteome maps of cells perturbed with serum and three different antibiotics at sublethal concentrations that are currently used to treat Leptospirosis. This yielded an information-rich proteomic data set that describes, for the first time, the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date. Using this unique property of the data set, we could quantify protein components of entire pathways across several time points and subject the data sets to cluster analysis, a tool that was previously limited to the transcript level due to incomplete sampling on protein level (Figure 4). Based on these analyses, we could demonstrate that Leptospira cells adjust the cellular abundance of a certain subset of proteins and pathways as a general response to stress while other parts of the proteome respond highly specifically. The cells furthermore react to individual treatments by 'fine tuning' the abundance of certain proteins and pathways in order to cope with the specific cause of stress. Intriguingly, the most specific and significant expression changes were observed for proteins involved in motility, tissue penetration and virulence after serum treatment where we tried to simulate the host environment. While many of the detected protein changes demonstrate good agreement with available transcriptomics data, most proteins showed a poor correlation. This includes potential virulence factors, like Loa22 or OmpL1, with confirmed expression in vivo that were significantly up-regulated on the protein level, but not on the mRNA level, strengthening the importance of proteomic studies. 
The high resolution and coverage of the proteome data set enabled us to further investigate protein abundance changes of co-regulated genes within operons. This suggests that although most proteins within an operon respond to regulation synchronously, bacterial cells seem to have subtle means to adjust the levels of individual proteins or protein groups outside of the general trend, a phenomenon that was recently also observed on the transcript level of other bacteria (Güell et al, 2009).
The method can be implemented with standard high-resolution mass spectrometers and software tools that are readily available in the majority of proteomics laboratories. It is scalable to any proteome of low-to-medium complexity and can be extended to post-translational modifications or peptide-labeling strategies for quantification. We therefore expect the approach outlined here to become a cornerstone for microbial systems biology.
Over the past decade, liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS) has evolved into the main proteome discovery technology. Up to several thousand proteins can now be reliably identified from a sample and the relative abundance of the identified proteins can be determined across samples. However, the remeasurement of substantially similar proteomes, for example those generated by perturbation experiments in systems biology, at high reproducibility and throughput remains challenging. Here, we apply a directed MS strategy to detect and quantify sets of pre-determined peptides in tryptic digests of cells of the human pathogen Leptospira interrogans at 25 different states. We show that in a single LC–MS/MS experiment around 5000 peptides, covering 1680 L. interrogans proteins, can be consistently detected and their absolute expression levels estimated, revealing new insights about the proteome changes involved in pathogenic progression and antibiotic defense of L. interrogans. This is the first study that describes the absolute quantitative behavior of any proteome over multiple states, and represents the most comprehensive proteome abundance pattern comparison for any organism to date.
PMCID: PMC3159967  PMID: 21772258
absolute quantification; directed mass spectrometry; Leptospira interrogans; microbiology; proteomics
13.  OrtholugeDB: a bacterial and archaeal orthology resource for improved comparative genomic analysis 
Nucleic Acids Research  2012;41(Database issue):D366-D376.
Prediction of orthologs (homologous genes that diverged because of speciation) is an integral component of many comparative genomics methods. Although orthologs are more likely to have similar function than paralogs (genes that diverged because of duplication), recent studies have shown that their degree of functional conservation is variable. Also, there are inherent problems with several large-scale ortholog prediction approaches. To address these issues, we previously developed Ortholuge, which uses phylogenetic distance ratios to provide more precise ortholog assessments for a set of predicted orthologs. However, the original version of Ortholuge required manual intervention and was not easily accessible; therefore, we now report the development of OrtholugeDB, available online. OrtholugeDB provides ortholog predictions for completely sequenced bacterial and archaeal genomes from NCBI based on reciprocal best Basic Local Alignment Search Tool hits, supplemented with further evaluation by the more precise Ortholuge method. The OrtholugeDB web interface facilitates user-friendly and flexible ortholog analysis, from single genes to genomes, plus flexible data download options. We compare Ortholuge with similar methods, showing how it may more consistently identify orthologs with conserved features across a wide range of taxonomic distances. OrtholugeDB facilitates rapid, and more accurate, bacterial and archaeal comparative genomic analysis and large-scale ortholog predictions.
PMCID: PMC3531125  PMID: 23203876
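The reciprocal-best-hit step mentioned in the abstract above can be illustrated with a minimal sketch. This is not OrtholugeDB's pipeline: the function names, the gene identifiers and the toy scores are all hypothetical, and real input would come from parsed BLAST output rather than hand-written tuples.

```python
def best_hits(hits):
    """Map each query gene to its single highest-scoring subject gene.

    `hits` is an iterable of (query, subject, score) tuples; this layout
    is an assumption for illustration, not a BLAST output format.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores for two hypothetical genomes A and B.
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0),
          ("a2", "b2", 180.0), ("a3", "b2", 200.0)]
b_vs_a = [("b1", "a1", 245.0), ("b2", "a3", 198.0), ("b2", "a2", 175.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy data a2's best hit is b2, but b2's best hit is a3, so a2 is left without a predicted ortholog. The abstract describes Ortholuge as a further evaluation applied on top of such pairs; that step is not sketched here.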
14.  ProbeLynx: a tool for updating the association of microarray probes to genes 
Nucleic Acids Research  2004;32(Web Server issue):W471-W474.
As genome sequence data and gene prediction improve, probes developed for a given microarray experiment should be continuously re-evaluated for their specificity for given genes. ProbeLynx is a new web service which uses current genomic sequence information to re-examine microarray probe specificity and provide annotation updates relevant to determining which gene(s) and transcript(s) are associated with a given probe. Probe sequences (either oligonucleotide- or cDNA-based) are uploaded in FASTA format and the results returned as a tab-delimited flat file for insertion into a spreadsheet application or database management system for further analysis. ProbeLynx has been initially developed to focus on arrays derived from human, mouse, chicken and bovine genomes, but may be expanded to handle other genomic datasets. ProbeLynx offers microarray users the important ability to continuously assess the potential of a probe to cross-hybridize to paralogous genes and the suitability of a given probe to investigate a transcript of interest. By also including the latest gene function annotation information in the output, ProbeLynx provides the critical first step in updating microarray data annotation.
PMCID: PMC441590  PMID: 15215432
15.  Confero: an integrated contrast data and gene set platform for computational analysis and biological interpretation of omics data 
BMC Genomics  2013;14:514.
High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges due to a number of factors, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, the identification of differentially expressed genes when studying effect(s) or contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) leading to mechanistic insights. Therefore, it is important to systematically store the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (termed “contrast data” for short) in a comparable manner and extract gene sets in order to efficiently support downstream analyses and further leverage data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental, etc.).
To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple and standard format, data transformation to enable cross-study and platform data comparison, and automatic extraction and storage of gene sets to build new a priori knowledge which is leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are currently integrated as an analysis module as well as additional tools to support biological interpretation. Confero is a standalone system that also integrates with Galaxy, an open-source workflow management and data integration system. To illustrate Confero platform functionality we walk through major aspects of the Confero workflow and results using the Bioconductor estrogen package dataset.
Confero provides a unique and flexible platform to support downstream computational analysis facilitating biological interpretation. The system has been designed in order to provide the researcher with a simple, innovative, and extensible solution to store and exploit analyzed data in a sustainable and reproducible manner thereby accelerating knowledge-driven research. Confero source code is freely available from
PMCID: PMC3750322  PMID: 23895370
Gene expression; Contrast data; Gene set; Gene set enrichment; Omics; Microarray; Next-generation sequencing; Reproducible research system; Knowledge acquisition
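Over-Representation Analysis, named in the abstract above, is commonly computed as a one-sided hypergeometric test. The sketch below shows that general technique only. Confero's own implementation is not described in the abstract, so the function name, the argument names and the toy numbers are assumptions.

```python
from math import comb


def ora_p_value(n_universe, n_in_set, n_selected, n_overlap):
    """One-sided hypergeometric p-value for gene set over-representation.

    n_universe: number of genes measured in total
    n_in_set:   number of those genes belonging to the gene set
    n_selected: number of differentially expressed genes
    n_overlap:  number of differentially expressed genes in the gene set
    Returns P(overlap >= n_overlap) under random sampling without replacement.
    """
    upper = min(n_in_set, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_set, i) * comb(n_universe - n_in_set, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 genes measured, a 3-gene set, 3 genes selected, all 3 in the set.
p = ora_p_value(10, 3, 3, 3)
print(p)
```

With many gene sets tested at once, the resulting p-values would normally be corrected for multiple testing, a step this sketch leaves out.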
16.  Life Science Research and Drug Discovery at the Turn of the 21st Century: The Experience of SwissBioGrid 
It is often said that the life sciences are transforming into an information science. As laboratory experiments are starting to yield ever increasing amounts of data and the capacity to deal with those data is catching up, an increasing share of scientific activity is seen to be taking place outside the laboratories, sifting through the data and modelling “in-silico” the processes observed “in-vitro.” The transformation of the life sciences and similar developments in other disciplines have inspired a variety of initiatives around the world to create technical infrastructure to support the new scientific practices that are emerging. The e-Science programme in the United Kingdom and the NSF Office for Cyberinfrastructure are examples of these. In Switzerland there have been no such national initiatives. Yet, this has not prevented scientists from exploring the development of similar types of computing infrastructures. In 2004, a group of researchers in Switzerland established a project, SwissBioGrid, to explore whether Grid computing technologies could be successfully deployed within the life sciences. This paper presents their experiences as a case study of how the life sciences are currently operating as an information science and presents the lessons learned about how existing institutional and technical arrangements facilitate or impede this operation.
SwissBioGrid was established to provide computational support to two pilot projects: one for proteomics data analysis, and the other for high-throughput molecular docking (“virtual screening”) to find new drugs for neglected diseases (specifically, for dengue fever). The proteomics project was an example of a large-scale data management problem, applying many different analysis algorithms to Terabyte-sized datasets from mass spectrometry, involving comparisons with many different reference databases; the virtual screening project was more a purely computational problem, modelling the interactions of millions of small molecules with a limited number of dengue virus protein targets. Both present interesting lessons about how scientific practices are changing when they tackle the problems of large-scale data analysis and data management by means of creating a novel technical infrastructure.
In the experience of SwissBioGrid, data intensive discovery has a lot to gain from close collaboration with industry and harnessing distributed computing power. Yet the diversity in life science research implies only a limited role for generic infrastructure; and the transience of support means that researchers need to integrate their efforts with others if they want to sustain the benefits of their success, which are otherwise lost.
PMCID: PMC2850249  PMID: 19521952
17.  Joint estimation of DNA copy number from multiple platforms 
Bioinformatics  2009;26(2):153-160.
Motivation: DNA copy number variants (CNVs) are gains and losses of segments of chromosomes, and comprise an important class of genetic variation. Recently, various microarray hybridization-based techniques have been developed for high-throughput measurement of DNA copy number. In many studies, multiple technical platforms or different versions of the same platform were used to interrogate the same samples; and it became necessary to pool information across these multiple sources to derive a consensus molecular profile for each sample. An integrated analysis is expected to maximize resolution and accuracy, yet currently there is no well-formulated statistical method to address the between-platform differences in probe coverage, assay methods, sensitivity and analytical complexity.
Results: The conventional approach is to apply one of the CNV detection (‘segmentation’) algorithms to search for DNA segments of altered signal intensity. The results from multiple platforms are combined after segmentation. Here we propose a new method, Multi-Platform Circular Binary Segmentation (MPCBS), which pools statistical evidence across platforms during segmentation, and does not require pre-standardization of different data sources. It involves a weighted sum of t-statistics, which arises naturally from the generalized log-likelihood ratio of a multi-platform model. We show by comparing the integrated analysis of Affymetrix and Illumina SNP array data with Agilent and fosmid clone end-sequencing results on eight HapMap samples that MPCBS achieves improved spatial resolution, detection power and provides a natural consensus across platforms. We also apply the new method to analyze multi-platform data for tumor samples.
Availability: The R package for MPCBS is registered on R-Forge under the project name MPCBS.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2852203  PMID: 19933593
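The idea of pooling evidence as a weighted sum of per-platform t-statistics, mentioned in the abstract above, can be illustrated generically. The abstract does not give the weights that MPCBS derives from its likelihood model, so the weights and the normalisation below are placeholder assumptions, and the function is not the MPCBS statistic itself.

```python
import math


def weighted_t_sum(t_stats, weights):
    """Combine per-platform t-statistics into one normalised weighted sum.

    The result is sum(w_k * t_k) / sqrt(sum(w_k ** 2)). Both the weights
    and this normalisation are illustrative assumptions.
    """
    if len(t_stats) != len(weights):
        raise ValueError("need exactly one weight per t-statistic")
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(w * t for w, t in zip(weights, t_stats)) / norm


# Two hypothetical platforms reporting t = 3.0 and t = 4.0 for one candidate segment.
combined = weighted_t_sum([3.0, 4.0], [1.0, 1.0])
print(combined)
```

With equal weights the combined value here is 7 divided by the square root of 2, which is larger than either input: two moderate signals pointing the same way reinforce each other.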
18.  BμG@Sbase—a microbial gene expression and comparative genomic database 
Nucleic Acids Research  2011;40(Database issue):D605-D609.
The reducing cost of high-throughput functional genomic technologies is creating a deluge of high volume, complex data, placing the burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and, while serving as a hub of a global network of microbial research groups, has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface, integrating intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore, the modular design of the database will provide a robust platform for integrating other data types beyond microarrays into a more systems-analysis-based future.
PMCID: PMC3245117  PMID: 21948792
19.  Laboratory markers in ulcerative colitis: Current insights and future advances 
Ulcerative colitis (UC) and Crohn’s disease (CD) are the major forms of inflammatory bowel diseases (IBD) in man. Despite some common features, these forms can be distinguished by different genetic predisposition, risk factors and clinical, endoscopic and histological characteristics. The aetiology of both CD and UC remains unknown, but several lines of evidence suggest that CD and perhaps UC are due to an excessive immune response directed against normal constituents of the intestinal bacterial flora. Tests, sometimes invasive, are routine for the diagnosis and care of patients with IBD. Diagnosis of UC is based on clinical symptoms combined with radiological and endoscopic investigations. The employment of non-invasive biomarkers is needed. These biomarkers have the potential to avoid invasive diagnostic tests that may result in discomfort and potential complications. The ability to determine the type, severity, prognosis and response to therapy of UC using biomarkers has long been a goal of clinical researchers. We describe the biomarkers assessed in UC, with special reference to acute-phase proteins and serologic markers, and thereafter we describe the new biological markers and the biological markers that could be developed in the future: (1) serum markers of acute phase response: The laboratory tests most used to measure the acute-phase proteins in clinical practice are the serum concentration of C-reactive protein and the erythrocyte sedimentation rate. Other biomarkers of inflammation in UC include platelet count, leukocyte count, and serum albumin and serum orosomucoid concentrations; (2) serologic markers/antibodies: In the last decades serological and immunologic biomarkers have been studied extensively in immunology and have been used in clinical practice to detect specific pathologies.
In UC, the presence of these antibodies can serve as a surrogate marker for the aberrant host immune response; and (3) future biomarkers: The development of biomarkers in UC will be very important in the future. The progress of molecular biology tools (microarrays, proteomics and nanotechnology) has revolutionised the field of biomarker discovery. The advances in bioinformatics coupled with cross-disciplinary collaborations have greatly enhanced our ability to retrieve, characterize and analyse large amounts of data generated by the technological advances. The techniques available for biomarker development are genomics (single nucleotide polymorphism genotyping, pharmacogenetics and gene expression analyses) and proteomics. In the future, the addition of new serological markers will add significant benefit. Correlating serologic markers with genotypes and clinical phenotypes should enhance our understanding of the pathophysiology of UC.
PMCID: PMC4325297
Inflammatory bowel diseases; Ulcerative colitis; Crohn’s disease; Serologic markers; Acute phase response
20.  Discovery of small molecule cancer drugs: Successes, challenges and opportunities 
Molecular Oncology  2012;6(2):155-176.
The discovery and development of small molecule cancer drugs has been revolutionised over the last decade. Most notably, we have moved from a one-size-fits-all approach that emphasized cytotoxic chemotherapy to a personalised medicine strategy that focuses on the discovery and development of molecularly targeted drugs that exploit the particular genetic addictions, dependencies and vulnerabilities of cancer cells. These exploitable characteristics are increasingly being revealed by our expanding understanding of the abnormal biology and genetics of cancer cells, accelerated by cancer genome sequencing and other high-throughput genome-wide campaigns, including functional screens using RNA interference. In this review we provide an overview of contemporary approaches to the discovery of small molecule cancer drugs, highlighting successes, current challenges and future opportunities. We focus in particular on four key steps: Target validation and selection; chemical hit and lead generation; lead optimization to identify a clinical drug candidate; and finally hypothesis-driven, biomarker-led clinical trials. Although all of these steps are critical, we view target validation and selection and the conduct of biology-directed clinical trials as especially important areas upon which to focus to speed progress from gene to drug and to reduce the unacceptably high attrition rate during clinical development. Other challenges include expanding the envelope of druggability for less tractable targets, understanding and overcoming drug resistance, and designing intelligent and effective drug combinations. We discuss not only scientific and technical challenges, but also the assessment and mitigation of risks as well as organizational, cultural and funding problems for cancer drug discovery and development, together with solutions to overcome the ‘Valley of Death’ between basic research and approved medicines. 
We envisage a future in which addressing these challenges will enhance our rapid progress towards truly personalised medicine for cancer patients.
► Here we review small molecule cancer drug discovery and development. ► We focus on Target selection, hit identification, lead optimization and clinical trials. ► A particular emphasis of this article is personalized medicine.
PMCID: PMC3476506  PMID: 22440008
Small molecule cancer drug discovery and development; Target and validation selection; Hit identification; Lead optimization and clinical trials; Personalized medicine
21.  LXtoo: an integrated live Linux distribution for the bioinformatics community 
BMC Research Notes  2012;5:360.
Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis.
Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing.
LXtoo aims to provide a well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at
PMCID: PMC3461469  PMID: 22813356
Bioinformatics; Software; Linux; Operating system
22.  Overview of Protein Microarrays 
The protein microarray is an emerging technology that provides a versatile platform for characterization of hundreds of thousands of proteins in a highly parallel and high-throughput way. Two major classes of protein microarrays are defined to describe their applications: analytical and functional protein microarrays. In addition, tissue or cell lysates can also be fractionated and spotted on a slide to form a reverse-phase protein microarray. While the fabrication technology is maturing, applications of protein microarrays, especially functional protein microarrays, have flourished during the past decade. Here, we will first review recent advances in the protein microarray technologies, and then present a series of examples to illustrate the applications of analytical and functional protein microarrays in both basic and clinical research. The research areas will include detection of various binding properties of proteins, study of protein posttranslational modifications, analysis of host-microbe interactions, profiling antibody specificity, and identification of biomarkers in autoimmune diseases. As a powerful technology platform, it would not be surprising if protein microarrays become one of the leading technologies in proteomic and diagnostic fields in the next decade.
PMCID: PMC3680110  PMID: 23546620
protein microarrays; PTM; biomarker; network; systems biology
23.  Applications of Functional Protein Microarrays in Basic and Clinical Research 
Advances in genetics  2012;79:123-155.
The protein microarray technology provides a versatile platform for characterization of hundreds of thousands of proteins in a highly parallel and high-throughput manner. It is viewed as a new tool that overcomes the limitations of DNA microarrays. On the basis of their application, protein microarrays fall into two major classes: analytical and functional protein microarrays. In addition, tissue or cell lysates can also be directly spotted on a slide to form the so-called “reverse-phase” protein microarray. In the last decade, applications of functional protein microarrays in particular have flourished in studying protein function and constructing networks and pathways. In this chapter, we will review the recent advancements in the protein microarray technology, and then present a series of examples to illustrate the power and versatility of protein microarrays in both basic and clinical research. As a powerful technology platform, it would not be surprising if protein microarrays become one of the leading technologies in proteomic and diagnostic fields in the next decade.
PMCID: PMC3790149  PMID: 22989767
24.  Unlocking the Transcriptomes of Two Carcinogenic Parasites, Clonorchis sinensis and Opisthorchis viverrini 
The two parasitic trematodes, Clonorchis sinensis and Opisthorchis viverrini, have a major impact on the health of tens of millions of humans throughout Asia. The greatest impact is through the malignant cancer (cholangiocarcinoma) that these parasites induce in chronically infected people. Therefore, both C. sinensis and O. viverrini have been classified by the World Health Organization (WHO) as Group 1 carcinogens. Despite their impact, little is known about these parasites and their interplay with the host at the molecular level. Recent advances in genomics and bioinformatics provide unique opportunities to gain improved insights into the biology of parasites as well as their relationships with their hosts at the molecular level. The present study elucidates the transcriptomes of C. sinensis and O. viverrini using a platform based on next-generation (high throughput) sequencing and advanced in silico analyses. From 500,000 sequences, >50,000 sequences were assembled for each species and categorized as biologically relevant based on homology searches, gene ontology and/or pathway mapping. The results of the present study could assist in defining molecules that are essential for the development, reproduction and survival of liver flukes and/or that are linked to the development of cholangiocarcinoma. This study also lays a foundation for future genomic and proteomic research of C. sinensis and O. viverrini and the cancers that they are known to induce, as well as novel intervention strategies.
Author Summary
The parasitic worms, Clonorchis sinensis and Opisthorchis viverrini, have a serious impact on the health of tens of millions of people throughout Asia. The greatest impact, however, is through the malignant, untreatable cancer (cholangiocarcinoma) that these parasites induce in chronically infected people. These liver flukes are officially classified by the World Health Organization (WHO) as Group 1 carcinogens. In spite of their massive impact on human health, little is known about these parasites and their relationship with the host at the molecular level. Here, we provide the first detailed insight into the transcriptomes of these flukes, providing a solid foundation for all of the molecular/-omic work required to understand their biology, but, more importantly, to elucidate key aspects of the induction of cholangiocarcinoma. Although our focus has been on the parasites, the implications will extend far beyond the study of parasitic disease. Importantly, insights into the pathogenesis of the infection are likely to have major implications for the study and understanding of other cancers.
PMCID: PMC2889816  PMID: 20582164
25.  Expression profiling of drug response - from genes to pathways  
Understanding individual response to a drug, that is, what determines its efficacy and tolerability, is the major bottleneck in current drug development and clinical trials. Intracellular response and metabolism, for example through cytochrome P-450 enzymes, may either enhance or decrease the effect of different drugs, depending on the genetic variant. Microarrays offer the potential to screen the genetic composition of the individual patient. However, experiments are “noisy” and must be accompanied by solid and robust data analysis. Furthermore, recent research aims at the combination of high-throughput data with methods of mathematical modeling, enabling problem-oriented assistance in the drug discovery process. This article will discuss state-of-the-art DNA array technology platforms and the basic elements of data analysis and bioinformatics research in drug discovery. Extending single-gene analysis, we will present a new method for interpreting gene expression changes in the context of entire pathways. Furthermore, we will introduce the concept of systems biology as a new paradigm for drug development and highlight our recent research: the development of a modeling and simulation platform for biomedical applications. We discuss the potential of systems biology for modeling the drug response of the individual patient.
PMCID: PMC3181826  PMID: 17117610
drug discovery; functional genomics; microarray; bioinformatics; data integration; database; systems biology
