
Results 1-25 (45)

1.  Assessing Proteinase K Resistance of Fish Prion Proteins in a Scrapie-Infected Mouse Neuroblastoma Cell Line 
Viruses  2014;6(11):4398-4421.
The key event in prion pathogenesis is the structural conversion of the normal cellular protein, PrPC, into an aberrant and partially proteinase K resistant isoform, PrPSc. Since the minimum requirement for a prion disease phenotype is the expression of endogenous PrP in the host, species carrying orthologue prion genes, such as fish, could in theory support prion pathogenesis. Our previous work has demonstrated the development of abnormal protein deposition in sea bream brain, following oral challenge of the fish with natural prion infectious material. In this study, we used a prion-infected mouse neuroblastoma cell line for the expression of three different mature fish PrP proteins and the evaluation of the resistance of the exogenously expressed proteins to proteinase K treatment (PK), as an indicator of a possible prion conversion. No evidence of resistance to PK was detected for any of the studied recombinant proteins. Although not indicative of an absolute inability of the fish PrPs to structurally convert to pathogenic isoforms, the absence of PK-resistance may be due to supramolecular and conformational differences between the mammalian and piscine PrPs.
PMCID: PMC4246229  PMID: 25402173
prion; fish; cross-species transmission; cell culture; ScN2a
2.  Genomic Encyclopedia of Bacteria and Archaea: Sequencing a Myriad of Type Strains 
PLoS Biology  2014;12(8):e1001920.
This manuscript calls for an international effort to generate a comprehensive catalog from genome sequences of all the archaeal and bacterial type strains.
Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ~11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.
PMCID: PMC4122341  PMID: 25093819
3.  Functional Genomics Evidence Unearths New Moonlighting Roles of Outer Ring Coat Nucleoporins 
Scientific Reports  2014;4:4655.
There is growing evidence for the involvement of Y-complex nucleoporins (Y-Nups) in cellular processes beyond the inner core of nuclear pores of eukaryotes. To comprehensively assess the range of possible functions of Y-Nups, we delimit their structural and functional properties by high-specificity sequence profiles and tissue-specific expression patterns. Our analysis establishes the presence of Y-Nups across eukaryotes with novel composite domain architectures, supporting new moonlighting functions in DNA repair, RNA processing, signaling and mitotic control. Y-Nups associated with a select subset of the discovered domains are found to be under tight coordinated regulation across diverse human and mouse cell types and tissues, strongly implying that they function in conjunction with the nuclear pore. Collectively, our results unearth an expanded network of Y-Nup interactions, thus supporting the emerging view of the Y-complex as a dynamic protein assembly with diverse functional roles in the cell.
PMCID: PMC3983603  PMID: 24722254
5.  Taxonomic Identification of Mediterranean Pines and Their Hybrids Based on the High Resolution Melting (HRM) and trnL Approaches: From Cytoplasmic Inheritance to Timber Tracing 
PLoS ONE  2013;8(4):e60945.
Fast and accurate detection of plant species and their hybrids using molecular tools will facilitate the assessment and monitoring of local biodiversity in an era of climate and environmental change. Herein, we evaluate the utility of the plastid trnL marker for species identification applied to Mediterranean pines (Pinus spp.). Our results indicate that trnL is a very sensitive marker for delimiting species biodiversity. Furthermore, High Resolution Melting (HRM) analysis was exploited as a molecular fingerprint for fast and accurate discrimination of Pinus spp. DNA sequence variants. The trnL approach and the HRM analyses were extended to wood samples of two species (Pinus nigra and Pinus sylvestris) with excellent results, congruent with those obtained using leaf tissue. Both analyses demonstrate that hybrids from the P. brutia (maternal parent) × P. halepensis (paternal parent) cross exhibit the P. halepensis profile, confirming paternal plastid inheritance in Group Halepensis pines. Our study indicates that a single one-step reaction method and DNA marker are sufficient for the identification of Mediterranean pines, their hybrids and the origin of pine wood. Furthermore, our results underline the potential for certain DNA regions to be used as novel biological information markers combined with existing morphological characters and suggest a relatively reliable and open taxonomic system that can link DNA variation to phenotype-based species or hybrid assignment status and direct taxa identification from recalcitrant tissues such as wood samples.
PMCID: PMC3618329  PMID: 23577179
6.  Detection of Genomic Idiosyncrasies Using Fuzzy Phylogenetic Profiles 
PLoS ONE  2013;8(1):e52854.
Phylogenetic profiles express the presence or absence of genes and their homologs across a number of reference genomes. They have emerged as an elegant representation framework for comparative genomics and have been used for the genome-wide inference and discovery of functionally linked genes or metabolic pathways. As the number of reference genomes grows, there is an acute need for faster and more accurate methods for phylogenetic profile analysis with increased performance in speed and quality. We propose a novel, efficient method for the detection of genomic idiosyncrasies, i.e. sets of genes found in a specific genome with peculiar phylogenetic properties, such as intra-genome correlations or inter-genome relationships. Our algorithm is a four-step process where genome profiles are first defined as fuzzy vectors, then discretized to binary vectors, followed by a de-noising step, and finally a comparison step to generate intra- and inter-genome distances for each gene profile. The method is validated with a carefully selected benchmark set of five reference genomes, using a range of approaches regarding similarity metrics and pre-processing stages for noise reduction. We demonstrate that the fuzzy profile method consistently identifies the actual phylogenetic relationship and origin of the genes under consideration for the majority of the cases, while the detected outliers are found to be particular genes with peculiar phylogenetic patterns. The proposed method provides a time-efficient and highly scalable approach for phylogenetic stratification, with the detected groups of genes being either similar to their own genome profile or different from it, thus revealing atypical evolutionary histories.
PMCID: PMC3544837  PMID: 23341912
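The four-step process described in the abstract of entry 6 (fuzzy vectors, discretization, de-noising, comparison) can be illustrated with a minimal sketch. This is not the authors' implementation: the 0.5 threshold, the de-noising rule (dropping reference-genome columns that are identical for every gene), the majority-vote genome profile, and the normalized Hamming distance are all assumptions chosen for illustration, as are the function names.

```python
from typing import Dict, List


def discretize(fuzzy: List[float], threshold: float = 0.5) -> List[int]:
    """Step 2: turn a fuzzy profile (scores in [0, 1]) into a binary one.
    The threshold value is an assumption."""
    return [1 if v >= threshold else 0 for v in fuzzy]


def denoise(profiles: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Step 3 (assumed rule): drop reference-genome columns that carry
    the same value for every gene, since they cannot separate genes."""
    n_cols = len(next(iter(profiles.values())))
    keep = [j for j in range(n_cols)
            if len({p[j] for p in profiles.values()}) > 1]
    return {g: [p[j] for j in keep] for g, p in profiles.items()}


def genome_profile(profiles: Dict[str, List[int]]) -> List[int]:
    """Majority vote per column: a hypothetical stand-in for the
    genome-level profile that each gene is compared against."""
    n_cols = len(next(iter(profiles.values())))
    n_genes = len(profiles)
    return [1 if 2 * sum(p[j] for p in profiles.values()) >= n_genes else 0
            for j in range(n_cols)]


def hamming(a: List[int], b: List[int]) -> float:
    """Step 4: normalized Hamming distance between two binary profiles."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def profile_distances(fuzzy_profiles: Dict[str, List[float]],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Run steps 2 to 4 on fuzzy profiles (step 1 is the input itself)."""
    binary = {g: discretize(v, threshold) for g, v in fuzzy_profiles.items()}
    clean = denoise(binary)
    reference = genome_profile(clean)
    return {g: hamming(p, reference) for g, p in clean.items()}


# Toy data: three genes scored against four reference genomes.
example = {
    "geneA": [0.9, 0.8, 0.1, 0.7],
    "geneB": [0.95, 0.7, 0.2, 0.6],
    "geneC": [0.9, 0.1, 0.9, 0.05],
}
distances = profile_distances(example)
```

In this toy input, geneC ends up far from the majority profile, which is the kind of outlier the abstract calls a gene with a peculiar phylogenetic pattern.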
7.  Experimental evidence validating the computational inference of functional associations from gene fusion events: a critical survey 
Briefings in Bioinformatics  2012;15(3):443-454.
More than a decade ago, a number of methods were proposed for the inference of protein interactions, using whole-genome information from gene clusters, gene fusions and phylogenetic profiles. This structural and evolutionary view of entire genomes has provided a valuable approach for the functional characterization of proteins, especially those without sequence similarity to proteins of known function. Furthermore, this view has raised the real possibility of detecting functional associations of genes and their corresponding proteins for any entire genome sequence. Yet, despite these exciting developments, there have been relatively few cases of real use of these methods outside the computational biology field, as reflected by citation analysis. These methods have the potential to be used in high-throughput experimental settings in functional genomics and proteomics to validate results with very high accuracy and good coverage. In this critical survey, we provide a comprehensive overview of the 30 most prominent examples of single pairwise protein interaction cases in small-scale studies, where protein interactions have either been detected by gene fusion or yielded additional, corroborating evidence from biochemical observations. Our conclusion is that with the derivation of a validated gold-standard corpus and better data integration with big experiments, gene fusion detection can truly become a valuable tool for large-scale experimental biology.
PMCID: PMC4017328  PMID: 23220349
genome analysis; comparative genomics; gene fusion; protein interactions; proteomics; validation study
8.  Transcriptome classification reveals molecular subtypes in psoriasis 
BMC Genomics  2012;13:472.
Psoriasis is an immune-mediated disease characterised by chronically elevated pro-inflammatory cytokine levels, leading to aberrant keratinocyte proliferation and differentiation. Although certain clinical phenotypes, such as plaque psoriasis, are well defined, it is currently unclear whether there are molecular subtypes that might impact on prognosis or treatment outcomes.
We present a pipeline for patient stratification through a comprehensive analysis of gene expression in paired lesional and non-lesional psoriatic tissue samples, compared with controls, to establish differences in RNA expression patterns across all tissue types. Ensembles of decision tree predictors were employed to cluster psoriatic samples on the basis of gene expression patterns and reveal gene expression signatures that best discriminate molecular disease subtypes. This multi-stage procedure was applied to several published psoriasis studies and a comparison of gene expression patterns across datasets was performed.
Overall, classification of psoriasis gene expression patterns revealed distinct molecular sub-groups within the clinical phenotype of plaque psoriasis. Enrichment for TGFb and ErbB signaling pathways, noted in one of the two psoriasis subgroups, suggested that this group may be more amenable to therapies targeting these pathways. Our study highlights the potential biological relevance of using ensemble decision tree predictors to determine molecular disease subtypes, in what may initially appear to be a homogenous clinical group. The R code used in this paper is available upon request.
PMCID: PMC3481433  PMID: 22971201
Disease classification; Molecular grouping; Psoriasis; Decision tree prediction model
9.  The Chlamydiales Pangenome Revisited: Structural Stability and Functional Coherence 
Genes  2012;3(2):291-319.
The entire publicly available set of 37 genome sequences from the bacterial order Chlamydiales has been subjected to comparative analysis in order to reveal the salient features of this pangenome and its evolutionary history. Over 2,000 protein families are detected across multiple species, with a distribution consistent with other studied pangenomes. Of these, there are 180 protein families with multiple members, 312 families with exactly 37 members corresponding to core genes, 428 families with peripheral genes with varying taxonomic distribution and finally 1,125 smaller families. The fact that, even for smaller genomes of Chlamydiales, core genes represent over a quarter of the average protein complement, signifies a certain degree of structural stability, given the wide range of phylogenetic relationships within the group. In addition, the propagation of a corpus of manually curated annotations within the discovered core families reveals key functional properties, reflecting a coherent repertoire of cellular capabilities for Chlamydiales. We further investigate over 2,000 genes without homologs in the pangenome and discover two new protein sequence domains. Our results, supported by the genome-based phylogeny for this group, are fully consistent with previous analyses and current knowledge, and point to future research directions towards a better understanding of the structural and functional properties of Chlamydiales.
PMCID: PMC3899948  PMID: 24704919
comparative genomics; pangenome analysis; Chlamydiales; protein family detection; genome annotation; genome trees
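The abstract of entry 9 describes core families as those with exactly 37 members across 37 genomes, alongside peripheral families with a varying taxonomic distribution. A minimal sketch of that kind of bookkeeping is shown below. The one-member-per-genome rule for "core", the label names, and the `classify_family` function are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import List


def classify_family(member_genomes: List[str], n_genomes: int) -> str:
    """Label a protein family from the genome of origin of each member.

    Assumed rules (hypothetical, for illustration only):
      - "core": present in every genome with exactly one member each
      - "core-with-paralogs": present in every genome, extra copies somewhere
      - "peripheral": missing from at least one genome
    """
    counts = Counter(member_genomes)
    in_all_genomes = len(counts) == n_genomes
    if in_all_genomes and all(c == 1 for c in counts.values()):
        return "core"
    if in_all_genomes:
        return "core-with-paralogs"
    return "peripheral"


# Toy pangenome with three genomes (the study itself used 37).
families = {
    "fam1": ["g1", "g2", "g3"],
    "fam2": ["g1", "g1", "g2", "g3"],
    "fam3": ["g1", "g3"],
}
labels = {name: classify_family(members, n_genomes=3)
          for name, members in families.items()}
```

With 37 genomes, a family labelled "core" under these rules would have exactly 37 members, matching the count quoted in the abstract.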
10.  Rise and Demise of Bioinformatics? Promise and Progress 
PLoS Computational Biology  2012;8(4):e1002487.
The field of bioinformatics and computational biology has gone through a number of transformations during the past 15 years, establishing itself as a key component of new biology. This spectacular growth has been challenged by a number of disruptive changes in science and technology. Despite the apparent fatigue of the linguistic use of the term itself, bioinformatics has grown perhaps to a point beyond recognition. We explore both historical aspects and future trends and argue that as the field expands, key questions remain unanswered and acquire new meaning while at the same time the range of applications is widening to cover an ever increasing number of biological disciplines. These trends appear to be pointing to a redefinition of certain objectives, milestones, and possibly the field itself.
PMCID: PMC3343106  PMID: 22570600
11.  Functional Genomics Assistant (FUGA): a toolbox for the analysis of complex biological networks 
BMC Research Notes  2011;4:462.
Cellular constituents such as proteins, DNA, and RNA form a complex web of interactions that regulate biochemical homeostasis and determine the dynamic cellular response to external stimuli. It follows that detailed understanding of these patterns is critical for the assessment of fundamental processes in cell biology and pathology. Representation and analysis of cellular constituents through network principles is a promising and popular analytical avenue towards a deeper understanding of molecular mechanisms in a system-wide context.
We present Functional Genomics Assistant (FUGA) - an extensible and portable MATLAB toolbox for the inference of biological relationships, graph topology analysis, random network simulation, network clustering, and functional enrichment statistics. In contrast to conventional differential expression analysis of individual genes, FUGA offers a framework for the study of system-wide properties of biological networks and highlights putative molecular targets using concepts of systems biology.
FUGA offers a simple and customizable framework for network analysis in a variety of systems biology applications. It is freely available for individual or academic use at
PMCID: PMC3214203  PMID: 22035155
12.  Complete genome sequence of Mycobacterium sp. strain (Spyr1) and reclassification to Mycobacterium gilvum Spyr1 
Standards in Genomic Sciences  2011;5(1):144-153.
Mycobacterium sp. Spyr1 is a newly isolated strain that occurs in a creosote-contaminated site in Greece. It was isolated by an enrichment method using pyrene as sole carbon and energy source and is capable of degrading a wide range of PAH substrates including pyrene, fluoranthene, fluorene, anthracene and acenaphthene. Here we describe the genomic features of this organism, together with the complete sequence and annotation. The genome consists of a 5,547,747 bp chromosome and two plasmids, a larger and a smaller one with sizes of 211,864 and 23,681 bp, respectively. In total, 5,588 genes were predicted and annotated.
PMCID: PMC3236039  PMID: 22180818
Mycobacterium gilvum; PAH biodegradation; pyrene degradation
13.  Copy Number and Loss of Heterozygosity Detected by SNP Array of Formalin-Fixed Tissues Using Whole-Genome Amplification 
PLoS ONE  2011;6(9):e24503.
The requirement for large amounts of good quality DNA for whole-genome applications prohibits their use for small, laser capture micro-dissected (LCM), and/or rare clinical samples, which are also often formalin-fixed and paraffin-embedded (FFPE). Whole-genome amplification of DNA from these samples could, potentially, overcome these limitations. However, little is known about the artefacts introduced by amplification of FFPE-derived DNA with regard to genotyping, and subsequent copy number and loss of heterozygosity (LOH) analyses. Using a ligation adaptor amplification method, we present data from a total of 22 Affymetrix SNP 6.0 experiments, using matched paired amplified and non-amplified DNA from 10 LCM FFPE normal and dysplastic oral epithelial tissues, and an internal method control. An average of 76.5% of SNPs were called in both matched amplified and non-amplified DNA samples, and concordance was a promising 82.4%. Paired analysis of copy number, LOH, and both combined showed the following: copy number changes were reduced in amplified DNA but were 99.5% concordant when detected; amplifications were the changes most likely to be ‘missed’; only 30% of non-amplified LOH changes were identified in amplified pairs; and, when copy number and LOH were combined, ~50% of gene changes detected in the unamplified DNA were also detected in the amplified DNA, of which 86.5% were concordant for both copy number and LOH status. However, amplification also introduced changes: ~20% of changes in the amplified DNA were not detected in the non-amplified DNA. An integrative network biology approach revealed that changes in amplified DNA of dysplastic oral epithelium localize to topologically critical regions of the human protein-protein interaction network, suggesting their functional implication in the pathobiology of this disease.
Taken together, our results support the use of amplification of FFPE-derived DNA, provided sufficient samples are used to increase power and compensate for increased error rates.
PMCID: PMC3180289  PMID: 21966361
14.  Protein coalitions in a core mammalian biochemical network linked by rapidly evolving proteins 
Cellular ATP levels are generated by glucose-stimulated mitochondrial metabolism and determine metabolic responses, such as glucose-stimulated insulin secretion (GSIS) from the β-cells of pancreatic islets. We describe an analysis of the evolutionary processes affecting the core enzymes involved in glucose-stimulated insulin secretion in mammals. The proteins involved in this system belong to ancient enzymatic pathways: glycolysis, the TCA cycle and oxidative phosphorylation.
We identify two sets of proteins, or protein coalitions, in this group of 77 enzymes with distinct evolutionary patterns. Members of the glycolysis, TCA cycle, metabolite transport, pyruvate and NADH shuttles have low rates of protein sequence evolution, as inferred from a human-mouse comparison, and relatively high rates of evolutionary gene duplication. Respiratory chain and glutathione pathway proteins evolve faster, exhibiting lower rates of gene duplication. A small number of proteins in the system evolve significantly faster than co-pathway members and may serve as rapidly evolving adapters, linking groups of co-evolving genes.
Our results provide insights into the evolution of the involved proteins. We find evidence for two coalitions of proteins and identify a role for co-adaptation in protein evolution, which could be explored in future research within a functional context.
PMCID: PMC3112093  PMID: 21612628
15.  Complete genome sequence of Arthrobacter phenanthrenivorans type strain (Sphe3) 
Standards in Genomic Sciences  2011;4(2):123-130.
Arthrobacter phenanthrenivorans is the type species of the genus, and is able to metabolize phenanthrene as a sole source of carbon and energy. A. phenanthrenivorans is an aerobic, non-motile, Gram-positive bacterium that exhibits a rod-coccus growth cycle; it was originally isolated from a creosote-polluted site in Epirus, Greece. Here we describe the features of this organism, together with the complete genome sequence and annotation.
PMCID: PMC3111998  PMID: 21677849
Arthrobacter; dioxygenases; PAH biodegradation; phenanthrene degradation
16.  A Systems Model for Immune Cell Interactions Unravels the Mechanism of Inflammation in Human Skin 
PLoS Computational Biology  2010;6(12):e1001024.
Inflammation is characterized by altered cytokine levels produced by cell populations in a highly interdependent manner. To elucidate the mechanism of an inflammatory reaction, we have developed a mathematical model for immune cell interactions via the specific, dose-dependent cytokine production rates of cell populations. The model describes the criteria required for normal and pathological immune system responses and suggests that alterations in the cytokine production rates can lead to various stable levels which manifest themselves in different disease phenotypes. The model predicts that pairs of interacting immune cell populations can maintain homeostatic and elevated extracellular cytokine concentration levels, enabling them to operate as an immune system switch. The concept described here is developed in the context of psoriasis, an immune-mediated disease, but it can also offer mechanistic insights into other inflammatory pathologies as it explains how interactions between immune cell populations can lead to disease phenotypes.
Author Summary
A functional immune system requires complex interactions among diverse cell types, mediated by a variety of cytokines. These interactions include phenomena such as positive and negative feedback loops that can be experimentally characterized by dose-dependent cytokine production measurements. However, any experimental approach is not only limited with regard to the number of cell-cell interactions that can be studied at a given time, but also does not have the capacity to assess or predict the overall immune response which is the result of complex interdependent immune cell interactions. Therefore, experimental data need to be viewed from a theoretical perspective allowing the quantitative modeling of immune cell interactions. Here, we propose a strategy for a quantitative description of multiple interactions between immune cell populations based on their cytokine production profiles. The model predicts that the modified feedback loop interactions can result in the appearance of alternative steady-states causing the switch-like immune system effect that is experimentally observed in pathologic phenotypes. Overall, the quantitative description of immune cell interactions via cytokine signaling reported here offers new insights into understanding and predicting normal and pathological immune system responses.
PMCID: PMC2996319  PMID: 21152006
17.  Genome-wide expression patterns in physiological cardiac hypertrophy 
BMC Genomics  2010;11:557.
In this study, the first large-scale analysis of publicly available genome-wide expression data of several in vivo murine models of physiological LVH was carried out using network analysis. On evaluating 3 million gene co-expression patterns across 141 relevant microarray experiments, it was found that physiological adaptation is an evolutionarily conserved process involving preservation of the function of cytochrome c oxidase, induction of autophagy compatible with cell survival, and coordinated regulation of angiogenesis.
This analysis not only identifies known biological pathways involved in physiological LVH, but also offers novel insights into the molecular basis of this phenotype by identifying key networks of co-expressed genes, as well as their topological and functional properties, using relevant high-quality microarray experiments and network inference.
PMCID: PMC3091706  PMID: 20937113
18.  Promoter Complexity and Tissue-Specific Expression of Stress Response Components in Mytilus galloprovincialis, a Sessile Marine Invertebrate Species 
PLoS Computational Biology  2010;6(7):e1000847.
The mechanisms of stress tolerance in sessile animals, such as molluscs, can offer fundamental insights into the adaptation of organisms for a wide range of environmental challenges. One of the best studied processes at the molecular level relevant to stress tolerance is the heat shock response in the genus Mytilus. We focus on the upstream region of Mytilus galloprovincialis Hsp90 genes and their structural and functional associations, using comparative genomics and network inference. Sequence comparison of this region provides novel evidence that the transcription of Hsp90 is regulated via a dense region of transcription factor binding sites, also containing a region with similarity to the Gamera family of LINE-like repetitive sequences and a genus-specific element of unknown function. Furthermore, we infer a set of gene networks from tissue-specific expression data, and specifically extract an Hsp class-associated network, with 174 genes and 2,226 associations, exhibiting a complex pattern of expression across multiple tissue types. Our results (i) suggest that the heat shock response in the genus Mytilus is regulated by an unexpectedly complex upstream region, and (ii) provide new directions for the use of the heat shock process as a biosensor system for environmental monitoring.
Author Summary
Adaptation of sessile animals, such as molluscs, to stress is achieved by a number of molecular mechanisms, few of which are clearly understood. Insights from this research can provide clues about stress tolerance both for sessile and mobile organisms. The Mediterranean mussel, of the genus Mytilus, is a model organism for the study of stress at the molecular level, with sufficient gene structure and function data available. We have thus investigated a key stress response gene, Hsp90, and in particular its upstream region, using a combination of sequence and expression analysis approaches. We demonstrate that this region, responsible for the regulation of heat shock-associated gene expression, exhibits an unparalleled structural and functional complexity compared to other model organisms, as well as subtle gene expression patterns across multiple tissues. These results form the basis upon which the heat shock response can be used as a molecular biosensor for environmental monitoring in the future.
PMCID: PMC2900285  PMID: 20628614
19.  Tumorigenic Properties of Iron Regulatory Protein 2 (IRP2) Mediated by Its Specific 73-Amino Acids Insert 
PLoS ONE  2010;5(4):e10163.
Iron regulatory proteins, IRP1 and IRP2, bind to mRNAs harboring iron responsive elements and control their expression. IRPs may also perform additional functions. Thus, IRP1 exhibited apparent tumor suppressor properties in a tumor xenograft model. Here we examined the effects of IRP2 in a similar setting. Human H1299 lung cancer cells or clones engineered for tetracycline-inducible expression of wild type IRP2, or the deletion mutant IRP2Δ73 (lacking a specific insert of 73 amino acids), were injected subcutaneously into nude mice. The induction of IRP2 profoundly stimulated the growth of tumor xenografts, and this response was blunted by addition of tetracycline in the drinking water of the animals, to turn off the IRP2 transgene. Interestingly, IRP2Δ73 failed to promote tumor growth above control levels. As expected, xenografts expressing the IRP2 transgene exhibited high levels of transferrin receptor 1 (TfR1); however, the expression of other known IRP targets was not affected. Moreover, these xenografts manifested increased c-MYC levels and ERK1/2 phosphorylation. A microarray analysis identified distinct gene expression patterns between control and tumors containing IRP2 or IRP1 transgenes. By contrast, gene expression profiles of control and IRP2Δ73-related tumors were more similar, consistent with their growth phenotype. Collectively, these data demonstrate an apparent pro-oncogenic activity of IRP2 that depends on its specific 73-amino-acid insert, and provide further evidence for a link between IRPs and cancer biology.
PMCID: PMC2854138  PMID: 20405006
20.  Sequence-based feature prediction and annotation of proteins 
Genome Biology  2009;10(2):206.
The combination of prediction tools in complex workflows and pipelines facilitates prediction of protein features from sequence.
A recent trend in computational methods for annotation of protein function is that many prediction tools are combined in complex workflows and pipelines to facilitate the analysis of feature combinations, for example, the entire repertoire of kinase-binding motifs in the human proteome.
PMCID: PMC2688272  PMID: 19226438
21.  Stratification of co-evolving genomic groups using ranked phylogenetic profiles 
BMC Bioinformatics  2009;10:355.
Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database.
The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples.
Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples.
PMCID: PMC2775751  PMID: 19860884
22.  Emergence, development and diversification of the TGF-β signalling pathway within the animal kingdom 
The question of how genomic processes, such as gene duplication, give rise to co-ordinated organismal properties, such as emergence of new body plans, organs and lifestyles, is of importance in developmental and evolutionary biology. Herein, we focus on the diversification of the transforming growth factor-β (TGF-β) pathway – one of the fundamental and versatile metazoan signal transduction engines.
After an investigation of 33 genomes, we show that the emergence of the TGF-β pathway coincided with appearance of the first known animal species. The primordial pathway repertoire consisted of four Smads and four receptors, similar to those observed in the extant genome of the early diverging tablet animal (Trichoplax adhaerens). We subsequently retrace duplications in ancestral genomes on the lineage leading to humans, as well as lineage-specific duplications, such as those which gave rise to novel Smads and receptors in teleost fishes. We conclude that the diversification of the TGF-β pathway can be parsimoniously explained according to the 2R model, with additional rounds of duplications in teleost fishes. Finally, we investigate duplications followed by accelerated evolution which gave rise to an atypical TGF-β pathway in free-living bacterial feeding nematodes of the genus Rhabditis.
Our results challenge the view of well-conserved developmental pathways. The TGF-β signal transduction engine has expanded through gene duplication, continually adopting new functions, as animals grew in anatomical complexity, colonized new environments, and developed an active immune system.
PMCID: PMC2657120  PMID: 19192293
23.  Metabolic innovations towards the human lineage 
We describe a function-driven approach to the analysis of metabolism which takes into account the phylogenetic origin of biochemical reactions to reveal subtle lineage-specific metabolic innovations, undetectable by more traditional methods based on sequence comparison. The origins of reactions and thus entire pathways are inferred using a simple taxonomic classification scheme that describes the evolutionary course of events towards the lineage of interest. We investigate the evolutionary history of the human metabolic network extracted from a metabolic database, construct a network of interconnected pathways and classify this network according to the taxonomic categories representing eukaryotes, metazoa and vertebrates.
It is demonstrated that lineage-specific innovations correspond to reactions and pathways associated with key phenotypic changes during evolution, such as the emergence of cellular organelles in eukaryotes, cell adhesion cascades in metazoa and the biosynthesis of complex cell-specific biomolecules in vertebrates.
This phylogenetic view of metabolic networks puts gene innovations within an evolutionary context, demonstrating how the emergence of a phenotype in a lineage provides a platform for the development of specialized traits.
PMCID: PMC2553087  PMID: 18782449
24.  Denoising inferred functional association networks obtained by gene fusion analysis 
BMC Genomics  2007;8:460.
Gene fusion detection – also known as the 'Rosetta Stone' method – involves the identification of fused composite genes in a set of reference genomes, which indicate potential interactions between their un-fused counterpart genes in query genomes. The precision of this method typically improves with an ever-increasing number of reference genomes.
In order to explore the usefulness and scope of this approach for protein interaction prediction and generate a high-quality, non-redundant set of interacting pairs of proteins across a wide taxonomic range, we have exhaustively performed gene fusion analysis for 184 genomes using an efficient variant of a previously developed protocol. By analyzing interaction graphs and applying a threshold that limits the maximum number of possible interactions within the largest graph components, we show that we can reduce the number of implausible interactions due to the detection of promiscuous domains. With this generally applicable approach, we generate a robust set of over 2 million distinct and testable interactions encompassing 696,894 proteins in 184 species or strains, most of which have never been the subject of high-throughput experimental proteomics. We investigate the cumulative effect of increasing numbers of genomes on the fidelity and quantity of predictions, and show that, for large numbers of genomes, predictions do not become saturated but continue to grow linearly, for the majority of the species. We also examine the percentage of component (and composite) proteins with relation to the number of genes and further validate the functional categories that are highly represented in this robust set of detected genome-wide interactions.
We illustrate the phylogenetic and functional diversity of gene fusion events across genomes, and their usefulness for accurate prediction of protein interaction and function.
PMCID: PMC2248599  PMID: 18081932
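The gene fusion ('Rosetta Stone') idea summarized in entry 24 can be illustrated with a minimal sketch. It is not the authors' protocol: the hit format, the function name `find_fusion_links` and the `max_overlap` parameter are hypothetical, and the sketch omits both the check that the two component proteins are not homologous to each other and the graph-based threshold the abstract mentions for filtering promiscuous domains.

```python
from itertools import combinations

def find_fusion_links(hits, max_overlap=10):
    """Link component proteins that align to distinct regions of one composite.

    hits: iterable of (component_id, composite_id, start, end) tuples, an
    assumed format describing where a query-genome protein aligns on a
    reference-genome protein (1-based, inclusive coordinates).
    Returns a dict mapping a sorted (component_a, component_b) pair to the
    set of composite proteins that support the predicted link.
    """
    # Group alignments by the composite (candidate fused) protein they hit.
    by_composite = {}
    for component, composite, start, end in hits:
        by_composite.setdefault(composite, []).append((component, start, end))

    links = {}
    for composite, regions in by_composite.items():
        for (a, a_start, a_end), (b, b_start, b_end) in combinations(regions, 2):
            if a == b:
                continue  # same protein aligned twice; not a fusion signal
            # Residues shared by the two aligned regions (negative if disjoint).
            overlap = min(a_end, b_end) - max(a_start, b_start) + 1
            if overlap <= max_overlap:
                pair = tuple(sorted((a, b)))
                links.setdefault(pair, set()).add(composite)
    return links
```

In this sketch, two proteins are linked only when they cover essentially separate stretches of the same composite; a small overlap tolerance is allowed because alignment boundaries are rarely exact.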
25.  CORRIE: enzyme sequence annotation with confidence estimates 
BMC Bioinformatics  2007;8(Suppl 4):S3.
Using a previously developed automated method for enzyme annotation, we report the re-annotation of the ENZYME database and the analysis of local error rates per class. In control experiments, we demonstrate that the method is able to correctly re-annotate 91% of all Enzyme Classification (EC) classes with high coverage (755 out of 827). Only 44 enzyme classes are found to contain false positives, while the remaining 28 enzyme classes are not represented. We also show cases where the re-annotation procedure results in partial overlaps for those few enzyme classes where a certain inconsistency might appear between homologous proteins, mostly due to function specificity. Our results allow the interactive exploration of the EC hierarchy for known enzyme families as well as putative enzyme sequences that may need to be classified within the EC hierarchy. These aspects of our framework have been incorporated into a web-server, called CORRIE, which stands for Correspondence Indicator Estimation and allows the interactive prediction of a functional class for putative enzymes from sequence alone, supported by probabilistic measures in the context of the pre-calculated Correspondence Indicators of known enzymes with the functional classes of the EC hierarchy. The CORRIE server is available at: .
PMCID: PMC1892082  PMID: 17570146
