Summary: Bacterial plasmids are self-replicating, extrachromosomal elements that are key agents of change in microbial populations. They promote the dissemination of a variety of traits, including virulence, enhanced fitness, resistance to antimicrobial agents, and the ability to metabolize rare substances. Escherichia coli, perhaps the most studied of all microorganisms, has been found to carry a variety of plasmid types, including plasmids associated with virulence. Several types of E. coli virulence plasmids exist, including those essential for the virulence of enterotoxigenic, enteroinvasive, enteropathogenic, enterohemorrhagic, enteroaggregative, and extraintestinal pathogenic E. coli. Despite their diversity, these plasmids are built on a small number of plasmid backbones that are conserved and syntenic. Recent research, including sequence analysis of several representative plasmid genomes and molecular pathogenesis studies, has improved our understanding of how these virulence plasmids evolved and what their acquisition means for E. coli. Here, work on each E. coli virulence plasmid type is summarized, and the available plasmid genome sequences from several E. coli pathotypes are compared in an effort to understand the evolution of these plasmid types and to define their core and accessory components.
The Virulence Factor Database (VFDB) was established in 2004 to provide current knowledge of virulence factors (VFs) from various medically significant bacterial pathogens and thereby facilitate pathogenomic research. Complete genome sequences of almost all the major pathogenic microbes have now been determined, which makes comparative genomics a powerful approach for uncovering novel virulence determinants and hidden aspects of pathogenesis. VFDB was therefore upgraded to present the enormous diversity of bacterial genomes in terms of virulence genes and their organization. The VFDB 2008 release includes the following new features: (i) a detailed tabular comparison of the virulence composition of a given genome with that of other genomes of the same genus, (ii) multiple alignments and statistical analysis of homologous VFs, and (iii) graphical comparison of the genomic organization of virulence genes. Comparative analysis of the numerous VFs will improve our understanding of the nature and evolution of virulence and support the development of new therapeutic and preventive strategies. The VFDB 2008 release offers more user-friendly tools for comparative pathogenomics and is publicly accessible at http://www.mgc.ac.cn/VFs/.
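The first new feature, a tabular comparison of virulence composition across genomes of one genus, boils down to a presence/absence matrix. The sketch below is a minimal illustration under stated assumptions: it assumes VF calls per genome are already available as sets of VF labels, and the function names, genome names and VF labels (`vfA`, `vfB`, ...) are hypothetical placeholders, not part of VFDB.

```python
from typing import Dict, List, Set, Tuple


def presence_absence_table(
    vf_sets: Dict[str, Set[str]],
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return (column order, genome -> 0/1 row) for a group of genomes."""
    # Columns are the union of all VFs seen in any genome, sorted for stability.
    columns = sorted(set().union(*vf_sets.values()))
    rows = {
        genome: [1 if vf in vfs else 0 for vf in columns]
        for genome, vfs in vf_sets.items()
    }
    return columns, rows


def unique_vfs(query: str, vf_sets: Dict[str, Set[str]]) -> Set[str]:
    """VFs present in the query genome but absent from every other genome."""
    others = set().union(*(vfs for g, vfs in vf_sets.items() if g != query))
    return vf_sets[query] - others


if __name__ == "__main__":
    # Hypothetical genomes and VF labels, for illustration only.
    genomes = {
        "genome_1": {"vfA", "vfB", "vfC"},
        "genome_2": {"vfA", "vfC"},
        "genome_3": {"vfA", "vfD"},
    }
    columns, rows = presence_absence_table(genomes)
    print(columns)  # ['vfA', 'vfB', 'vfC', 'vfD']
    for genome, row in rows.items():
        print(genome, row)
    print(unique_vfs("genome_1", genomes))  # {'vfB'}
```

Sorting the columns keeps the table layout reproducible between runs, which matters when tables from different queries are compared side by side.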
Comparative analyses of pathogen genomes provide new insights into how pathogens have evolved common and divergent virulence strategies to invade related plant species. Fusarium crown and root rots are important diseases of wheat and barley worldwide. In Australia, these diseases are primarily caused by the fungal pathogen Fusarium pseudograminearum. Comparative genomic analyses showed that the F. pseudograminearum genome encodes proteins that are present in other fungal pathogens of cereals but absent in non-cereal pathogens. In some cases, these cereal-pathogen-specific genes were also found in bacteria associated with plants. Phylogenetic analysis of selected F. pseudograminearum genes supported the hypothesis of horizontal gene transfer into diverse cereal pathogens. Two horizontally acquired genes with no previously known role in fungal pathogenesis were studied functionally by gene knockout and shown to significantly affect the virulence of F. pseudograminearum on the cereal hosts wheat and barley. Our results indicate that using comparative genomics to identify genes specific to pathogens of related hosts can reveal novel virulence genes, and they illustrate the importance of horizontal gene transfer in the evolution of plant-infecting fungal pathogens.
Cereals are our most important staple crops and are subject to attack from a diverse range of fungal pathogens. A major goal of molecular plant pathology research is to understand how pathogens infect plants in order to develop durable plant protection measures. Comparing the genomes of different cereal pathogens, and contrasting them with non-cereal pathogen genomes, allows the identification of genes important for pathogenicity toward these crops. In this study, we sequenced the genome of F. pseudograminearum, the wheat and barley pathogen responsible for crown and root rot diseases, and compared it with a broad range of previously sequenced genomes from fungal pathogens of cereal and non-cereal hosts. These analyses revealed that the F. pseudograminearum genome contains a number of genes found only in fungi pathogenic on cereals. Some of these genes appear to have been horizontally acquired from other fungi and, in some cases, from plant-associated bacteria. The functions of two of these genes were tested by creating strains that lacked them. Both genes had important roles in causing disease on cereals. This work has important implications for our understanding of pathogen specialization during the evolution of fungal pathogens infecting cereal crops.
The Phytophthora Genome Initiative (PGI) is a distributed collaboration to study the genome and evolution of a particularly destructive group of plant-pathogenic oomycetes, with the goal of understanding the mechanisms of infection and resistance. NCGR provides informatics support for the collaboration as well as a centralized data repository. In the pilot phase of the project, several investigators prepared Phytophthora infestans and Phytophthora sojae EST libraries and Phytophthora sojae BAC libraries and sent them to another laboratory for sequencing. Data from the sequencing reactions were transferred to NCGR for analysis and curation. An analysis pipeline transforms the raw data by performing simple analyses (vector removal and similarity searching), whose results are stored and can be retrieved by investigators using a web browser. Here we describe the database and access tools, provide an overview of the data therein and outline future plans. This resource has provided a unique opportunity for the distributed, collaborative study of a genus for which relatively little sequence data are available. The results may provide insight into how to better control these pathogens. The PGI homepage can be accessed at http://www.ncgr.org/pgi, with database access through the database access hyperlink.
Fish living in the wild as well as those reared in aquaculture facilities are susceptible to infectious diseases caused by a phylogenetically diverse collection of bacterial pathogens. Control and treatment options using vaccines and drugs are inadequate, inefficient, or impracticable. The classical approach to studying fish bacterial pathogens has been to examine one or a few virulence factors at a time. Recently, genome sequencing of a number of bacterial fish pathogens has greatly increased our understanding of the biology, host adaptation, and virulence factors of these important pathogens. This paper compiles the scattered literature on the genome sequences of fish-pathogenic bacteria published to date. Genome sequencing has uncovered several complex adaptive evolutionary strategies operating in fish pathogens, mediated by horizontal gene transfer, insertion sequence elements, mutations and prophage sequences, and has shown how their genomes evolved from generalist environmental strains into highly virulent obligate pathogens. In addition, comparative genomics has allowed the identification of unique pathogen-specific gene clusters. The paper focuses on the comparative analysis of the virulogenomes of important fish bacterial pathogens and of the genes involved in their evolutionary adaptation to different ecological niches. The paper also proposes some new directions for finding novel vaccine and chemotherapeutic targets in the genomes of bacterial pathogens of fish.
Leptospirosis is a globally important, neglected zoonotic infection caused by spirochetes of the genus Leptospira. Because genetic transformation remains technically limited for pathogenic Leptospira, a systems biology pathogenomic approach was used to infer leptospiral virulence genes by whole-genome comparison of culture-attenuated Leptospira interrogans serovar Lai with its virulent, isogenic parent. Among the 11 pathogen-specific protein-coding genes in which non-synonymous mutations were found, we identified a putative soluble adenylate cyclase with host cell cAMP-elevating activity and two members of a previously unstudied paralogous gene family of ∼15 members and unknown function. This gene family was also found, uniquely, in the alpha-proteobacteria Bartonella bacilliformis and Bartonella australis, which are geographically restricted to the Andes and Australia, respectively. How the pathogenic Leptospira and these two Bartonella species came to share this expanded gene family remains an evolutionary mystery. In vivo expression analyses demonstrated up-regulation of 10 of the 11 Leptospira genes identified in the attenuation screen, and profound in vivo, tissue-specific up-regulation of members of the paralogous gene family, suggesting a direct role in virulence and host-pathogen interactions. The pathogenomic experimental design used here is generalizable as a functional systems biology approach to studying bacterial pathogenesis and virulence and should encourage similar experimental studies of other pathogens.
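The attenuation screen hinges on separating non-synonymous from synonymous mutations. The abstract does not describe the variant-calling pipeline, so the following is only a generic sketch of that one classification step, assuming the variant is a single-base substitution at a known 0-based position inside an in-frame coding sequence and that the standard genetic code applies. The function name `classify_snv` and the toy sequence are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based `pos` of an in-frame coding sequence.

    Returns "synonymous" if the encoded amino acid is unchanged, otherwise
    "non-synonymous" (a gained stop codon is counted as non-synonymous here).
    """
    cds, alt = cds.upper(), alt.upper()
    if len(alt) != 1 or alt not in "ACGT":
        raise ValueError("alt must be a single base A, C, G or T")
    if not 0 <= pos < len(cds) - len(cds) % 3:
        raise ValueError("pos must fall inside a complete codon")
    codon_start = pos - pos % 3  # first base of the affected codon
    offset = pos % 3  # position of the change within that codon
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"


if __name__ == "__main__":
    cds = "ATGGCTTTA"  # toy sequence encoding Met-Ala-Leu
    print(classify_snv(cds, 5, "C"))  # GCT -> GCC, still Ala
    print(classify_snv(cds, 4, "A"))  # GCT -> GAT, Ala -> Asp
```

Real bacterial genomes can use alternative start codons and may carry indels or multi-base changes, none of which this sketch handles.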
Leptospirosis is one of the most common diseases transmitted by animals worldwide. It is important because it causes an often lethal febrile illness in tropical and subtropical areas associated with poor sanitation and agriculture. Leptospirosis may be epidemic, associated with natural disasters and flooding, or endemic in tropical regions. It is unknown how Leptospira cause disease and why different strains cause illness of different severity. In this study we attenuated (weakened) a highly virulent strain of L. interrogans by culturing it in vitro over several months. Comparison of the whole-genome sequences before and after the attenuation process revealed a small set of mutated genes, which are therefore candidate virulence genes. We discovered a putative soluble adenylate cyclase with host cell cAMP-elevating activity, with implications for immune evasion, and a new gene family that is up-regulated in vivo during acute hamster infection. Interestingly, Bartonella bacilliformis and Bartonella australis also carry this unique gene family that we describe in pathogenic Leptospira. This information aids our understanding of Leptospira evolution and pathogenesis.
The mosquito Culex quinquefasciatus poses a significant threat to human and veterinary health as a primary vector of West Nile virus (WNV), the filarial worm Wuchereria bancrofti, and an avian malaria parasite. Comparative phylogenomics revealed an expanded canonical C. quinquefasciatus immune gene repertoire compared with those of Aedes aegypti and Anopheles gambiae. Transcriptomic analysis of C. quinquefasciatus genes responsive to WNV, W. bancrofti and non-native bacteria facilitated an unprecedented meta-analysis of 25 vector-pathogen interactions involving arboviruses, filarial worms, bacteria and malaria parasites, revealing common and distinct responses to these pathogen types in three mosquito genera. Our findings provide support for the hypothesis that mosquito-borne pathogens have evolved to evade innate immune responses in three vector mosquito species of major medical importance.
Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis are closely related gram-positive, spore-forming bacteria of the B. cereus sensu lato group. While independently derived strains of B. anthracis show conspicuous sequence homogeneity, environmental isolates of B. cereus and B. thuringiensis exhibit extensive genetic diversity. Here we report the sequencing and comparative analysis of the genomes of two members of the B. cereus group: B. thuringiensis 97-27 subsp. konkukian serotype H34, isolated from a necrotic human wound, and B. cereus E33L, isolated from a swab of a zebra carcass in Namibia. When analyzed by amplified fragment length polymorphism within a collection of over 300 B. cereus, B. thuringiensis, and B. anthracis isolates, these two strains appear closely related to B. anthracis. The B. cereus E33L isolate appears to be the nearest relative of B. anthracis identified thus far. Whole-genome sequencing of B. thuringiensis 97-27 and B. cereus E33L was undertaken to identify shared and unique genes among these isolates in comparison with the genomes of the pathogenic strains B. anthracis Ames and B. cereus G9241 and the nonpathogenic strains B. cereus ATCC 10987 and B. cereus ATCC 14579. Comparison of these genomes revealed differences in virulence, metabolic competence, structural components, and regulatory mechanisms.
Salmonella enterica serovar Typhi is a human pathogen that causes typhoid fever, predominantly in developing countries. In this article, we describe the whole genome sequence of S. Typhi strain CR0044, isolated from a typhoid fever carrier in Kelantan, Malaysia. These data will further enhance the understanding of its host persistence and adaptive mechanisms.
Pathogenicity islands (PAIs) are genetic elements whose products are essential to the process of disease development. They have been horizontally (laterally) transferred from other microbes and are important in the evolution of pathogenesis. In this study, a comprehensive database and search engines specialized for PAIs were established. The pathogenicity island database (PAIDB) is a comprehensive relational database of all reported PAIs and of potential PAI regions predicted by a method that combines feature-based analysis and similarity-based analysis. In addition, the PAI Finder search application can analyze a multi-sequence query onsite for the presence of potential PAIs. As of April 2006, PAIDB contains 112 types of PAIs and 889 GenBank accessions containing either partial or complete PAI loci previously reported in the literature, which are present in 497 strains of pathogenic bacteria. The database also offers 310 candidate PAIs predicted from 118 sequenced prokaryotic genomes. Given the increasing number of prokaryotic genomes without functional inference, and of sequenced genetic regions suspected to be involved in disease, this web-based, user-friendly resource has the potential to be of significant use in pathogenomics. PAIDB is freely accessible at .
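The abstract does not list which features PAIDB's feature-based analysis uses. As an illustration only, the sketch below assumes one commonly cited signal of horizontally acquired DNA, a local GC content that deviates from the genome average, and scans a sequence in fixed windows. The window size, step and threshold are arbitrary assumptions, and `flag_gc_outliers` is a hypothetical name; a real predictor would combine several features with similarity searches.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """GC fraction over unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def flag_gc_outliers(
    genome: str, window: int, step: int, threshold: float
) -> List[Tuple[int, int, float]]:
    """Return (start, end, window GC) for windows whose GC content differs from
    the whole-genome GC content by at least `threshold` (absolute difference)."""
    genome_gc = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        window_gc = gc_fraction(genome[start:start + window])
        if abs(window_gc - genome_gc) >= threshold:
            flagged.append((start, start + window, window_gc))
    return flagged


if __name__ == "__main__":
    # Toy AT-rich "genome" with a GC-rich insert at positions 100-119.
    genome = "AT" * 50 + "GC" * 10 + "AT" * 50
    print(flag_gc_outliers(genome, window=20, step=20, threshold=0.3))
```

Adjacent flagged windows would normally be merged into a single candidate region before comparison with known PAI loci.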
Recent advances in high-throughput technologies have dramatically increased biological data generation. However, many research groups lack computing facilities and specialists, an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, that provides a flexible computing platform for bioinformatics analysis.
Unlike most existing live Linux distributions for bioinformatics, whose use is limited to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics data, protein-protein interaction analysis, and computationally complex tasks such as molecular dynamics. Moreover, most of the programs have been configured and optimized for high-performance computing.
LXtoo aims to provide a well-supported computing environment tailored for bioinformatics research, reducing duplication of effort in building computing infrastructure. LXtoo is distributed as a Live DVD and is freely available at http://bioinformatics.jnu.edu.cn/LXtoo.
Bioinformatics; Software; Linux; Operating system
Protein microarray technology provides a versatile platform for characterizing hundreds of thousands of proteins in a highly parallel, high-throughput manner. It is viewed as a new tool that overcomes the limitations of DNA microarrays. Based on their applications, protein microarrays fall into two major classes: analytical and functional protein microarrays. In addition, tissue or cell lysates can be spotted directly on a slide to form the so-called “reverse-phase” protein microarray. In the last decade, applications of functional protein microarrays in particular have flourished in studying protein function and constructing networks and pathways. In this chapter, we review recent advances in protein microarray technology and then present a series of examples illustrating the power and versatility of protein microarrays in both basic and clinical research. Given the power of this technology platform, it would not be surprising if protein microarrays became one of the leading technologies in the proteomic and diagnostic fields in the next decade.
Asthma is characterized by lung inflammation caused by a complex interaction between the immune system and environmental factors such as allergens and inorganic pollutants. Recent research in this field has focused on discovering new biomarkers associated with asthma pathogenesis. This review presents recent research on biomarkers of allergic asthma and their potential use in the systems biology of the disease. We focus on biomolecules with altered expression that may serve as inflammatory, diagnostic and therapeutic biomarkers of asthma, discovered in humans or in experimental asthma models using genomic, proteomic and epigenomic approaches to gene and protein expression profiling. These approaches include high-throughput technologies such as state-of-the-art microarray and mass spectrometry (MS)-based proteomics platforms. Emerging concepts of molecular interactions and pathways may provide new insights in the search for clinical biomarkers. By analyzing the compiled biomarkers, we summarize pathways with significant links to asthma pathophysiology. Systems approaches applied to these data can identify regulatory networks, which will eventually reveal the key biomarkers to be used for diagnostics and drug discovery.
allergic asthma; biomarker; DAAB; TH-2 cytokines and ROS pathway
Immunology research has been transformed in the post-genomics era, with high-throughput molecular biology and information technologies taking an increasingly central role. This has led to the development of a new area of science termed "Immunomics", which encompasses genomic, high-throughput and bioinformatic approaches to immunology. In recognition of the increasing importance of this field, Immunome Research is a new Open Access online journal that will publish cutting-edge research across the field of Immunomics. Immunome Research will publish a wide range of article types, including specialty immunology databases, immunology database tools, immunome epitope research, epitope analysis tools, high-throughput technologies (gene sequencing, microarrays, proteomics), white papers, mathematical and theoretical models, and prediction tools. Immunome Research is the official journal of the International Immunomics Society (IIMMS).
Since the advent of the proteomics era more than a decade ago, large-scale protein profiling studies have been used to identify distinctive molecular signatures in a wide array of biological systems, spanning basic biological research, various disease states, and biomarker discovery directed toward therapeutic applications. Recent advances in protein separation and identification techniques have significantly improved proteomic approaches, increasing the depth and breadth of proteome coverage. Proteomic signatures specific to invasive lung cancer and preinvasive lesions have begun to emerge. In this review we critically assess recent advances in proteomic approaches and the biological lessons they have yielded, with specific emphasis on the discovery of biomarker signatures for the early detection of lung cancer.
proteomics; biomarker; early detection; lung cancer
The demand for biomarkers for the diagnosis and prognosis of diseases has never been greater. With the availability of genome data and the increasing availability of proteome data, the discovery of biomarkers has become increasingly feasible. However, the task is daunting and requires collaboration among researchers working in transplantation, immunology, genetics, molecular biology, biostatistics, and bioinformatics. With the advancement of high-throughput omic techniques such as genomics and proteomics (collectively known as proteogenomics), efforts have been made to develop diagnostic tools from new and yet-to-be-discovered biomarkers. Yet biomarker validation, particularly in organ transplantation, remains challenging because of the lack of a true gold standard for diagnostic categories and because of analytical bottlenecks in deconvolving high-throughput data. Although microarray technology is relatively mature, proteomics is still developing with regard to data normalization and analysis methods. Study design, sample selection, and rigorous data analysis are the critical issues for biomarker discovery using high-throughput proteogenomic technologies that combine the strengths of both genomics and proteomics. In this review, we examine the current status and latest developments in biomarker discovery using genomics and proteomics in organ transplantation, with an emphasis on the evolution of proteomic technologies.
Biomarker discovery; proteogenomics; genomics; proteomics; microarray; transplantation; acute rejection; peptidomics
The EMBnet Conference 2008, focusing on 'Leading Applications and Technologies in Bioinformatics', was organized by the European Molecular Biology network (EMBnet) to celebrate its 20th anniversary. Since its foundation in 1988, EMBnet has worked to promote the collaborative development of bioinformatics services and tools serving the European community of molecular biology laboratories. This conference was the first meeting organized by the network that was open to the international scientific community outside EMBnet. The conference covered a broad range of research topics in bioinformatics, with a main focus on new achievements and trends in emerging technologies supporting genomics, transcriptomics and proteomics analyses, such as high-throughput sequencing and data management, text and data mining, ontologies and Grid technologies. The papers selected for publication in this supplement to BMC Bioinformatics cover a broad range of the topics treated and also provide an overview of the main bioinformatics research fields in which the EMBnet community is involved.
Archaea display remarkable physiological properties whose molecular basis is of great interest, including the ability to thrive under extreme environmental conditions, the presence of novel metabolic pathways (e.g., methanogenesis, the methylaspartate cycle) and the use of eukaryotic-like protein machineries for basic cellular functions. Coupling traditional genetic and biochemical approaches with advanced technologies such as genomics and proteomics gives scientists an avenue for discovering new aspects of the molecular physiology of archaea. This review emphasizes the unusual properties of archaeal proteomes and how high-throughput and specialized mass spectrometry-based proteomic studies have provided insight into the molecular properties of archaeal cells.
Group B Streptococcus commonly colonises healthy adults without symptoms, yet under certain circumstances displays the ability to invade host tissues, evade immune detection and cause serious invasive disease. Consequently, Group B Streptococcus remains a leading cause of neonatal pneumonia, sepsis and meningitis. Here we review recent information on the bacterial factors and mechanisms that direct host–pathogen interactions involved in the pathogenesis of Group B Streptococcus infection. New research on host signalling and inflammatory responses to Group B Streptococcus infection is summarised. An understanding of the complex interplay between Group B Streptococcus and host provides valuable insight into pathogen evolution and highlights molecular targets for therapeutic intervention.
The mountains of data emerging from modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management, manipulation, mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying theoretical and practical concepts. Providing bioinformatics training that empowers life scientists to handle and analyse their data efficiently, and to progress their research, is a challenge across the globe. Good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, selecting suitable trainees and trainers, developing and maintaining training skills, and evaluating training quality. Adherence to these criteria may not only guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also improve the training experience for life scientists.
bioinformatics; training; bioinformatics courses; training life scientists; train the trainers
High-throughput sequencing is now fast and cheap enough to be considered part of the toolbox for investigating bacteria, and thousands of bacterial genome sequences are available for comparison in the public domain. Bacterial genome analysis is increasingly being performed by diverse groups in research, clinical and public health labs alike, who are interested in a wide array of topics related to bacterial genetics and evolution, such as outbreak analysis and the study of pathogenicity and antimicrobial resistance. In this beginner’s guide, we aim to provide an entry point for individuals with a biology background who want to perform their own bioinformatics analysis of bacterial genome data to answer their own research questions. We assume readers are familiar with genetics and the basic nature of sequence data, but we do not assume any computer programming skills. The main topics covered are assembly, ordering of contigs, annotation, genome comparison and extracting common typing information. Each section includes worked examples using publicly available E. coli data and free software tools, all of which can be run on a desktop computer.
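One widely used form of typing information for E. coli is multilocus sequence typing (MLST), in which the allele numbers at a fixed set of housekeeping loci are combined into a profile that maps to a sequence type (ST). The guide's own tools and workflow are not reproduced here; this is a minimal sketch of the final lookup step only, assuming allele calling against the scheme's allele sequences has already been done. The loci follow the seven-gene Achtman scheme, while the profile table and the `ST_X` label are hypothetical toy values, not real scheme data.

```python
from typing import Dict, Optional, Tuple

# Loci of the seven-gene (Achtman) E. coli MLST scheme, in profile order.
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")


def assign_st(
    alleles: Dict[str, int],
    profiles: Dict[Tuple[int, ...], str],
) -> Optional[str]:
    """Look up the sequence type for a complete allele profile.

    Returns None if any locus is missing from `alleles` or if the profile
    does not match any entry in `profiles`.
    """
    if any(locus not in alleles for locus in LOCI):
        return None
    key = tuple(alleles[locus] for locus in LOCI)
    return profiles.get(key)


if __name__ == "__main__":
    # Hypothetical profile table; real tables come from the scheme's database.
    profiles = {(1, 2, 3, 4, 5, 6, 7): "ST_X"}
    called = dict(zip(LOCI, (1, 2, 3, 4, 5, 6, 7)))
    print(assign_st(called, profiles))  # ST_X
```

Keeping missing loci and unmatched profiles as `None` makes incomplete assemblies easy to spot rather than silently assigning a type.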
Bacterial; Microbial; Comparative; Genomics; Next generation sequencing; Analysis; Methods
Understanding an individual's response to a drug, that is, what determines its efficacy and tolerability, is the major bottleneck in current drug development and clinical trials. Intracellular response and metabolism, for example through cytochrome P-450 enzymes, may either enhance or decrease the effect of different drugs, depending on the genetic variant. Microarrays offer the potential to screen the genetic composition of the individual patient. However, experiments are noisy and must be accompanied by solid and robust data analysis. Furthermore, recent research aims to combine high-throughput data with mathematical modeling methods, enabling problem-oriented assistance in the drug discovery process. This article discusses state-of-the-art DNA array technology platforms and the basic elements of data analysis and bioinformatics research in drug discovery. Going beyond single-gene analysis, we present a new method for interpreting gene expression changes in the context of entire pathways. We also introduce the concept of systems biology as a new paradigm for drug development and highlight our recent research: the development of a modeling and simulation platform for biomedical applications. Finally, we discuss the potential of systems biology for modeling the drug response of the individual patient.
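The abstract does not describe the authors' pathway-level method, so the sketch below is not that method. It shows the classic over-representation analysis that pathway-level interpretation usually starts from: a one-sided hypergeometric test of whether the changed genes fall into a pathway more often than chance would predict. It assumes a fixed gene universe and a list of changed genes; gene and pathway names are hypothetical, and multiple-testing correction is deliberately omitted.

```python
from math import comb
from typing import Dict, Iterable, Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def pathway_enrichment(
    changed: Iterable[str],
    pathways: Dict[str, Iterable[str]],
    universe: Set[str],
) -> Dict[str, Tuple[int, float]]:
    """Over-representation p-value of changed genes in each pathway.

    Returns {pathway: (overlap, p_value)}, ordered from most to least significant.
    """
    changed_set = set(changed) & universe
    results = {}
    for name, members in pathways.items():
        member_set = set(members) & universe  # only genes measured on the array
        overlap = len(changed_set & member_set)
        p = hypergeom_sf(overlap, len(universe), len(member_set), len(changed_set))
        results[name] = (overlap, p)
    return dict(sorted(results.items(), key=lambda item: item[1][1]))


if __name__ == "__main__":
    universe = {f"g{i}" for i in range(20)}  # hypothetical measured genes
    changed = ["g0", "g1", "g2", "g3"]
    pathways = {"P1": ["g0", "g1", "g2", "g5"], "P2": ["g10", "g11", "g12", "g13"]}
    for name, (overlap, p) in pathway_enrichment(changed, pathways, universe).items():
        print(name, overlap, round(p, 4))
```

Restricting pathways to the measured universe matters: counting genes that were never on the array would inflate the apparent enrichment.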
drug discovery; functional genomics; microarray; bioinformatics; data integration; database; systems biology
The recent development of microarray technology has provided unprecedented opportunities to understand the genetic basis of aging. Many microarray studies have already addressed aging-related expression patterns in multiple organisms and under different conditions, and the number of relevant studies continues to increase rapidly. However, efficient exploitation of these vast data is hampered by the lack of an integrated data mining platform or other unifying bioinformatic resource that enables convenient cross-laboratory searches of array signals. To facilitate the integrative analysis of microarray data on aging, we developed a web database and analysis platform, ‘Gene Aging Nexus’ (GAN), that is freely accessible to the research community for querying, analyzing and visualizing cross-platform and cross-species microarray data on aging. By enabling integrative microarray analysis, GAN should be useful in building a systems-biology understanding of aging. GAN is accessible at .
Support for molecular biology researchers has been limited to traditional library resources and services in most academic health sciences libraries. The University of Washington Health Sciences Libraries have been providing specialized services to this user community since 1995. The library recruited a Ph.D. biologist to assess the molecular biological information needs of researchers and design strategies to enhance library resources and services. A survey of laboratory research groups identified areas of greatest need and led to the development of a three-pronged program: consultation, education, and resource development. Outcomes of this program include bioinformatics consultation services, library-based and graduate level courses, networking of sequence analysis tools, and a biological research Web site. Bioinformatics clients are drawn from diverse departments and include clinical researchers in need of tools that are not readily available outside of basic sciences laboratories. Evaluation and usage statistics indicate that researchers, regardless of departmental affiliation or position, require support to access molecular biology and genetics resources. Centralizing such services in the library is a natural synergy of interests and enhances the provision of traditional library resources. Successful implementation of a library-based bioinformatics program requires both subject-specific and library and information technology expertise.
An important task in biomedical research is identifying biomarkers that correlate with patient clinical data; these biomarkers then provide a critical foundation for the diagnosis and treatment of disease. Conventionally, such an analysis is based on individual genes, but the results are often noisy and difficult to interpret. Using a biological network as the search platform, network-based biomarkers are expected to be more robust and to provide deeper insights into the molecular mechanisms of disease. We have developed SurvNet, a novel bioinformatics web server for identifying the network-based biomarkers that correlate most strongly with patient survival data. The web server takes three input files: a biological network file, representing a gene regulatory or protein interaction network; a molecular profiling file, containing any type of gene- or protein-centred high-throughput biological data (e.g. microarray expression data or DNA methylation data); and a patient survival data file (e.g. patients’ progression-free survival data). Given user-defined parameters, SurvNet automatically searches for the subnetworks that correlate most strongly with the observed patient survival data. As output, SurvNet generates a list of network biomarkers and displays them through a user-friendly interface. SurvNet can be accessed at http://bioinformatics.mdanderson.org/main/SurvNet.
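SurvNet's own search algorithm is not described in this abstract. As a hedged illustration of what "searching for subnetworks that correlate with survival" can look like, the sketch below assumes each gene already has a survival-association score `z` (for example, a Cox regression z-statistic computed from the profiling and survival files; that step is not shown) and grows a connected module greedily from a seed gene, scoring a k-gene module as sum(z) / sqrt(k), a common aggregate score in active-module searches. All function names, genes and scores are hypothetical.

```python
from math import sqrt
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list (e.g. a protein interaction network)."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def module_score(nodes: Set[str], z: Dict[str, float]) -> float:
    """Aggregate score sum(z) / sqrt(k) for a k-node subnetwork."""
    return sum(z[n] for n in nodes) / sqrt(len(nodes))


def greedy_module(
    seed: str, adj: Dict[str, Set[str]], z: Dict[str, float], max_size: int = 10
) -> Tuple[Set[str], float]:
    """Grow a connected module from `seed`, adding the neighbour that most
    improves the score, until no neighbour helps or `max_size` is reached."""
    module = {seed}
    best = module_score(module, z)
    while len(module) < max_size:
        # Scored neighbours of the current module, sorted so ties break deterministically.
        frontier = sorted(
            {nb for n in module for nb in adj.get(n, ()) if nb not in module and nb in z}
        )
        if not frontier:
            break
        candidate = max(frontier, key=lambda nb: module_score(module | {nb}, z))
        candidate_score = module_score(module | {candidate}, z)
        if candidate_score <= best:
            break  # no neighbour improves the score: stop growing
        module.add(candidate)
        best = candidate_score
    return module, best


if __name__ == "__main__":
    adj = build_adjacency([("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")])
    z = {"A": 2.0, "B": 1.5, "C": 0.5, "D": -1.0, "E": -0.5}  # hypothetical scores
    module, score = greedy_module("A", adj, z)
    print(sorted(module), round(score, 3))
```

Greedy growth is fast but can miss better modules; in practice one would run it from many seeds and assess the resulting scores against permuted survival data.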